BIMgent: The AI Agent That Builds 3D Models Like a Pro

Inside the multimodal AI agent that turns sketches and text into fully modeled buildings

Sep 15, 2025

Image showing BIMgent automating BIM modeling from hand-drawn sketches through designing, planning, and executing stages

If you’ve ever modeled a building in Vectorworks, Revit, or ArchiCAD, you know the drill: hundreds of clicks, complex menus, and a high chance that one wrong step sends you backtracking. While AI in AEC has improved conceptual design and analysis, automating the actual in-software modeling process has remained mostly unexplored.

That’s the gap BIMgent aims to fill.

Image showing BIMgent’s layered framework for BIM automation with design input, action planning, and execution workflows

What BIMgent Does

BIMgent is a multimodal LLM-powered agent designed to autonomously build 3D BIM models directly through the graphical user interface of authoring software. Unlike tools that use APIs, BIMgent:

- Interprets text descriptions or 2D sketches into a detailed floorplan aligned with software coordinates.
- Plans modeling steps using a hierarchical approach (high-level architectural sequencing and low-level command execution), supported by official software documentation.
- Executes actions like mouse clicks, keyboard shortcuts, and parameter changes, while verifying and correcting its own work.
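
To make the "execute, verify, correct" idea concrete, here is a minimal Python sketch of such a loop. It is not BIMgent's actual code: the `Action` type, the `execute` and `verify` callables, and the retry limit are all hypothetical names chosen for illustration. In a real agent, `execute` would send input to the GUI and `verify` would inspect a fresh screenshot.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """Hypothetical GUI action; not a type from the BIMgent paper."""
    kind: str       # e.g. "click", "hotkey", "type"
    payload: tuple  # e.g. (x, y) for a click, ("3000",) for typed text


def run_with_verification(
    actions: List[Action],
    execute: Callable[[Action], None],
    verify: Callable[[Action], bool],
    max_retries: int = 2,
) -> List[Tuple[str, int, bool]]:
    """Run actions in order, re-trying any action whose check fails.

    Returns a log of (action kind, attempts used, succeeded).
    Stops at the first action that still fails after all retries,
    because later modeling steps usually depend on earlier ones.
    """
    log = []
    for action in actions:
        for attempt in range(1, max_retries + 2):
            execute(action)
            if verify(action):
                log.append((action.kind, attempt, True))
                break
        else:
            # Every attempt failed: record it and abandon the remaining steps.
            log.append((action.kind, max_retries + 1, False))
            break
    return log
```

The early stop is a design choice for this sketch: a wall that was never drawn makes every later opening placed in it meaningless, so continuing would only compound the error.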

In testing on a custom Mini Building Benchmark of 25 real-world modeling tasks, BIMgent achieved:

- 32% end-to-end task success rate vs. 0% for baseline models.
- 86.58% success in wall creation tasks.
- 95.12% success in creating openings.

Image showing BIMgent low-level planner using floorplan metadata and software documentation to generate detailed BIM automation subtasks

How It Works

BIMgent operates in three core layers:

- Design Layer – Generates or processes a floorplan from multimodal inputs and maps every architectural element to GUI pixel coordinates.
- Action Planning Layer – Splits the modeling process into general steps (e.g., create layers, add external walls) and detailed substeps. Retrieves instructions from the software’s documentation for accuracy.
- Execution Layer – Carries out clicks and keystrokes in two modes:
  - Pure-Action Workflow for straightforward, repeatable actions.
  - Vision-Driven Workflow for complex interface interactions, using screenshot-based grounding to focus on relevant UI regions.
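
The Design Layer's job of mapping architectural elements to GUI pixel coordinates can be pictured as a simple scale-and-offset transform. The sketch below is an assumption about how such a mapping might look, not the paper's implementation: the `Viewport` fields, the function names, and the wall dictionary layout are all hypothetical, and it assumes an unrotated top view with a uniform zoom.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Viewport:
    """Hypothetical description of the drawing canvas on screen."""
    origin_px: Tuple[float, float]  # pixel position of model point (0, 0)
    px_per_meter: float             # current zoom level


def model_to_pixel(point_m: Tuple[float, float], vp: Viewport) -> Tuple[int, int]:
    """Convert a floorplan point in meters to a screen pixel."""
    x_m, y_m = point_m
    origin_x, origin_y = vp.origin_px
    # Screen y grows downward while plan y grows upward, so y is flipped.
    px = origin_x + x_m * vp.px_per_meter
    py = origin_y - y_m * vp.px_per_meter
    return (round(px), round(py))


def wall_to_clicks(wall: Dict[str, Tuple[float, float]], vp: Viewport) -> List[Tuple[int, int]]:
    """A straight wall becomes two click targets: its start and end points."""
    return [model_to_pixel(wall["start"], vp), model_to_pixel(wall["end"], vp)]


vp = Viewport(origin_px=(400.0, 600.0), px_per_meter=50.0)
print(wall_to_clicks({"start": (0.0, 0.0), "end": (4.0, 0.0)}, vp))
# prints [(400, 600), (600, 600)]
```

Once every element has pixel targets like these, a straightforward, repeatable action such as drawing a wall no longer needs the model to look at the screen, which is roughly the situation the Pure-Action Workflow is described as handling.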

Image showing BIMgent dynamic GUI grounding process comparing screenshots, highlighting changes, and generating precise BIM automation actions
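
One simple way to "highlight changes" between two screenshots is plain pixel differencing: compare the before and after images and take the bounding box of everything that changed. The Python sketch below illustrates that idea only. It is not claimed to be BIMgent's grounding method, the function name and threshold are assumptions, and screenshots are represented as nested lists of grayscale values to keep the example dependency-free.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of grayscale pixel values, 0-255


def changed_region(before: Image, after: Image, threshold: int = 10) -> Optional[Tuple[int, int, int, int]]:
    """Return the bounding box (left, top, right, bottom) of changed pixels.

    A pixel counts as changed when its value differs by more than
    `threshold`. Returns None when nothing changed.
    """
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("screenshots must have the same dimensions")

    xs, ys = [], []
    for y, (row_before, row_after) in enumerate(zip(before, after)):
        for x, (p_before, p_after) in enumerate(zip(row_before, row_after)):
            if abs(p_after - p_before) > threshold:
                xs.append(x)
                ys.append(y)

    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

Cropping the screenshot to the returned box (plus some margin) would let a vision model look only at the dialog or panel that just appeared instead of the whole screen.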

Why It Matters for AEC

Most AI design tools stop at generating geometry. BIMgent goes further by performing the actual authoring process—reducing manual effort in repetitive tasks and preserving the architect’s design intent.

Its benchmark results show strong reliability in component-level modeling like wall, slab, and opening creation—tasks that often consume a significant share of modeling time.

Current Limitations

The research notes three main error sources:

- Planning errors – Incorrect step sequences.
- Grounding errors – Misidentifying GUI targets.
- Execution errors – Wrong clicks or incomplete actions.

Future work will focus on improving efficiency (reducing steps and execution time), expanding to other BIM tools, and creating more automated evaluation methods.

This blog post is based on research by Zihan Deng, Changyu Du, Stavros Nousias, and André Borrmann, published in “BIMgent: Towards Autonomous Building Modeling via Computer-use Agents.”

AEC Tech + AI with Mayur Mistry
