BIMgent: The AI Agent That Builds 3D Models Like a Pro
Inside the multimodal AI agent that turns sketches and text into fully modeled buildings
If you’ve ever modeled a building in Vectorworks, Revit, or ArchiCAD, you know the drill: hundreds of clicks, complex menus, and a high chance that one wrong step sends you backtracking. While AI in architecture, engineering, and construction (AEC) has improved conceptual design and analysis, automating the actual in-software modeling process has remained mostly unexplored.
That’s the gap BIMgent aims to fill.
What BIMgent Does
BIMgent is a multimodal LLM-powered agent designed to autonomously build 3D BIM (building information modeling) models directly through the graphical user interface (GUI) of authoring software. Unlike tools that drive the software through its API, BIMgent:
Interprets text descriptions or 2D sketches into a detailed floorplan aligned with software coordinates.
Plans modeling steps using a hierarchical approach—high-level architectural sequencing and low-level command execution—supported by official software documentation.
Executes actions like mouse clicks, keyboard shortcuts, and parameter changes, while verifying and correcting its own work.
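The execute-then-verify behavior in the last item can be pictured as a retry loop. The sketch below is a minimal illustration, not BIMgent's actual code: `Step`, `run_step`, the `max_retries` limit, and the stubbed action and check functions are all hypothetical names. A real agent would replace the stubs with GUI input and screenshot-based checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One modeling step: an action plus a check that it worked (hypothetical)."""
    name: str
    act: Callable[[], None]      # performs the GUI action (stubbed here)
    verify: Callable[[], bool]   # True if the expected result is observed


def run_step(step: Step, max_retries: int = 2) -> bool:
    """Run a step, repeating it up to max_retries times if verification fails."""
    for _ in range(1 + max_retries):
        step.act()
        if step.verify():
            return True
    return False


# Stub environment: the "wall" only appears on the second attempt.
state = {"attempts": 0, "wall_exists": False}


def draw_wall() -> None:
    state["attempts"] += 1
    state["wall_exists"] = state["attempts"] >= 2


def wall_is_visible() -> bool:
    return state["wall_exists"]


ok = run_step(Step("draw external wall", draw_wall, wall_is_visible))
print(ok, state["attempts"])  # True 2
```

Here the first attempt fails verification, so the step is repeated once and then succeeds. Bounding the retries keeps a persistent failure from stalling the whole task.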
In testing on a custom Mini Building Benchmark of 25 real-world modeling tasks, BIMgent achieved:
32% end-to-end task success rate vs. 0% for baseline models.
86.58% success in wall creation tasks.
95.12% success in creating openings.
How It Works
BIMgent operates in three core layers:
Design Layer – Generates or processes a floorplan from multimodal inputs and maps every architectural element to GUI pixel coordinates.
Action Planning Layer – Splits the modeling process into general steps (e.g., create layers, add external walls) and detailed substeps. Retrieves instructions from the software’s documentation for accuracy.
Execution Layer – Carries out clicks and keystrokes in two modes:
Pure-Action Workflow for straightforward, repeatable actions.
Vision-Driven Workflow for complex interface interactions, using screenshot-based grounding to focus on relevant UI regions.
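The Design Layer's mapping from floorplan elements to GUI pixel coordinates can be thought of as a scale-and-offset transform. The following is a sketch under stated assumptions, not the paper's method: the `Viewport` fields, the metres-to-pixels scale, the flipped y-axis, and the `wall_to_clicks` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    """Hypothetical description of the drawing canvas on screen."""
    origin_x: int            # pixel x of the floorplan origin (0, 0)
    origin_y: int            # pixel y of the floorplan origin (0, 0)
    pixels_per_metre: float  # assumed uniform zoom factor


def to_pixels(x_m: float, y_m: float, vp: Viewport) -> tuple[int, int]:
    """Map floorplan metres to screen pixels; screen y grows downward."""
    px = vp.origin_x + round(x_m * vp.pixels_per_metre)
    py = vp.origin_y - round(y_m * vp.pixels_per_metre)
    return px, py


def wall_to_clicks(start_m, end_m, vp):
    """Expand one wall segment into two click actions (start and end point)."""
    return [("click", to_pixels(*start_m, vp)), ("click", to_pixels(*end_m, vp))]


vp = Viewport(origin_x=400, origin_y=900, pixels_per_metre=50.0)

# A 6 m wall along the x-axis, then a 4 m wall going "up" the plan.
actions = (
    wall_to_clicks((0.0, 0.0), (6.0, 0.0), vp)
    + wall_to_clicks((6.0, 0.0), (6.0, 4.0), vp)
)
for action in actions:
    print(action)
```

A list of plain click tuples like this is the kind of input a pure-action workflow could replay without looking at the screen. Anything that depends on the current interface state would go through the vision-driven path instead.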
Why It Matters for AEC
Most AI design tools stop at generating geometry. BIMgent goes further by performing the actual authoring process—reducing manual effort in repetitive tasks and preserving the architect’s design intent.
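Its benchmark results show that component-level modeling, such as wall, slab, and opening creation, is already fairly reliable (86.58% for walls and 95.12% for openings), even though end-to-end success is still 32%. These component tasks often consume a significant share of modeling time.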
Current Limitations
The research notes three main error sources:
Planning errors – Incorrect step sequences.
Grounding errors – Misidentifying GUI targets.
Execution errors – Wrong clicks or incomplete actions.
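To track which of these three error sources dominates, failed runs could be labeled and tallied. This is only a sketch: the `ErrorKind` enum and the failure log are invented for illustration and are not data from the paper.

```python
from collections import Counter
from enum import Enum


class ErrorKind(Enum):
    PLANNING = "planning"    # incorrect step sequence
    GROUNDING = "grounding"  # misidentified GUI target
    EXECUTION = "execution"  # wrong click or incomplete action


# Made-up failure log for illustration only; not results from the paper.
failures = [
    ErrorKind.GROUNDING,
    ErrorKind.EXECUTION,
    ErrorKind.GROUNDING,
    ErrorKind.PLANNING,
]

tally = Counter(failures)
for kind in ErrorKind:
    print(f"{kind.value}: {tally[kind]}")
```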
Future work will focus on improving efficiency (reducing steps and execution time), expanding to other BIM tools, and creating more automated evaluation methods.
This blog post is based on research by Zihan Deng, Changyu Du, Stavros Nousias, and André Borrmann, published in “BIMgent: Towards Autonomous Building Modeling via Computer-use Agents.”





