DiagrammerGPT is a new framework that uses large language models (LLMs) to generate diagrams from text with a high degree of accuracy and flexibility.
It can create diagrams for a wide range of topics and platforms, offering more precise object layouts and clearer text labels than existing text-to-image (T2I) models.
A diagram is a simplified, symbolic drawing that combines visual elements such as objects, text labels, arrows, and lines to explain information clearly and concisely. Existing T2I models, such as DALL-E 3, struggle to generate diagrams because they cannot reliably position objects or render legible text labels.
Innovative features of DiagrammerGPT
DiagrammerGPT offers finer control over a diagram's visual elements, including their position, size, shape, color, orientation, connections, and labels.
These are the innovative features that enable DiagrammerGPT to produce accurate diagrams:
- It uses a two-stage approach: it first creates a diagram plan and then renders the diagram image with a diagram generator.
- It leverages the layout guidance capabilities of advanced LLMs, which let the LLM specify the position, size, shape, color, and orientation of each visual element, as well as the connections between elements and their labels.
- It introduces a new diagram dataset, AI2D-Caption.
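To make the idea of a diagram plan concrete, here is a minimal Python sketch of how such a plan might be represented and sanity-checked. The schema (`Entity`, `Relation`, `DiagramPlan`), the `(x, y, width, height)` box convention, and the 100x100 grid are illustrative assumptions, not DiagrammerGPT's actual plan format.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical diagram-plan schema (illustrative only, not DiagrammerGPT's format).
# Boxes are (x, y, width, height) on an assumed 100x100 grid.


@dataclass
class Entity:
    name: str                          # unique identifier, e.g. "sun"
    kind: str                          # "object" or "text_label"
    box: tuple[int, int, int, int]     # (x, y, width, height)
    color: str = "none"


@dataclass
class Relation:
    source: str                        # name of the start entity
    target: str                        # name of the end entity
    style: str = "arrow"               # "arrow" or "line"


@dataclass
class DiagramPlan:
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def validate(self, grid: int = 100) -> list[str]:
        """Return problems found: boxes outside the grid, or relations to unknown entities."""
        problems = []
        names = {e.name for e in self.entities}
        for e in self.entities:
            x, y, w, h = e.box
            if x < 0 or y < 0 or w <= 0 or h <= 0 or x + w > grid or y + h > grid:
                problems.append(f"{e.name}: box {e.box} is outside the {grid}x{grid} grid")
        for r in self.relations:
            for end in (r.source, r.target):
                if end not in names:
                    problems.append(f"{r.source}->{r.target}: unknown entity '{end}'")
        return problems


plan = DiagramPlan(
    entities=[
        Entity("sun", "object", (70, 5, 20, 20), "yellow"),
        Entity("ocean", "object", (0, 75, 100, 25), "blue"),
        Entity("evaporation", "text_label", (40, 50, 25, 5)),
    ],
    relations=[Relation("sun", "ocean")],
)
print(plan.validate())  # → []
```

Keeping the plan as plain structured data is what makes it possible to inspect, check, and edit it before any pixels are generated.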
DiagrammerGPT is a two-stage text-to-diagram generation framework:
I. Diagram planning. In this stage, a GPT-4 model called the planner generates a diagram plan from the text prompt. A second GPT-4 model, called the auditor, checks the plan for errors and inconsistencies and gives feedback to the planner. The planner and the auditor work in a feedback loop, refining the plan until it aligns with the input prompt.
II. Diagram generation. In this stage, the framework renders the diagram following the plan, using two key components: (1) DiagramGLIGEN, which converts the diagram plan into an image, and (2) a text label rendering module, which draws clear, readable text labels on the diagram.
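The planner/auditor feedback loop of the planning stage can be sketched as follows. This is a minimal sketch, assuming the planner and auditor are plain callables: in DiagrammerGPT both are GPT-4 calls, which are replaced here by hypothetical rule-based stubs (`toy_planner`, `toy_auditor`). The stopping rule (stop when the auditor returns no feedback, or after `max_rounds` revisions) is also an assumption.

```python
from typing import Callable, Dict, Tuple

# A plan is simplified here to {entity name: (x, y, width, height)} on an assumed 100x100 grid.
Plan = Dict[str, Tuple[int, int, int, int]]


def refine_plan(
    prompt: str,
    planner: Callable[[str, str], Plan],
    auditor: Callable[[str, Plan], str],
    max_rounds: int = 3,
) -> Tuple[Plan, int]:
    """Alternate planner and auditor until the auditor has no feedback or max_rounds is reached."""
    plan = planner(prompt, "")            # initial draft, no feedback yet
    rounds = 0
    while rounds < max_rounds:
        feedback = auditor(prompt, plan)
        if not feedback:                  # empty feedback: the auditor found no issues
            break
        plan = planner(prompt, feedback)  # revise the plan using the feedback
        rounds += 1
    return plan, rounds


# Hypothetical stubs standing in for the GPT-4 planner and auditor.
def toy_planner(prompt: str, feedback: str) -> Plan:
    if "outside" in feedback:             # revision: move the label back inside the canvas
        return {"sun": (70, 5, 20, 20), "label": (60, 30, 30, 5)}
    return {"sun": (70, 5, 20, 20), "label": (85, 30, 30, 5)}  # first draft overflows


def toy_auditor(prompt: str, plan: Plan) -> str:
    for name, (x, y, w, h) in plan.items():
        if x + w > 100 or y + h > 100:
            return f"{name} is outside the canvas"
    return ""


plan, rounds = refine_plan("The sun heats the ocean.", toy_planner, toy_auditor)
print(rounds, plan["label"])  # → 1 (60, 30, 30, 5)
```

Capping the number of rounds keeps the loop from running forever if the auditor never approves a plan.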
DiagramGLIGEN is based on the GLIGEN architecture, which adds gated self-attention layers to a pretrained Stable Diffusion v1.4 model so that image generation can be conditioned on a layout. The original GLIGEN model was trained on natural images and uses only objects for layout grounding; DiagramGLIGEN is specialized for diagrams because it is trained on the AI2D-Caption dataset, which provides captions for many types of diagrams.
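As described in the GLIGEN paper, a gated self-attention layer updates the visual tokens v using grounding (layout) tokens h_e as v ← v + β · tanh(γ) · TS(SelfAttn([v, h_e])), where TS keeps only the visual tokens and the learnable gate γ is initialized to 0, so the pretrained model's behavior is unchanged at the start of training. Below is a minimal NumPy sketch of that update. It is a simplification, assuming single-head attention with identity query/key/value projections and no layer normalization, and it is not DiagramGLIGEN's actual code.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_self_attention(visual: np.ndarray, grounding: np.ndarray,
                         gamma: float, beta: float = 1.0) -> np.ndarray:
    """visual: (n_v, d) image tokens; grounding: (n_g, d) layout tokens; gamma: learnable gate."""
    tokens = np.concatenate([visual, grounding], axis=0)    # [v, h_e]
    d = tokens.shape[-1]
    # Single-head self-attention with identity Q/K/V projections (simplification).
    weights = softmax(tokens @ tokens.T / np.sqrt(d))
    attended = weights @ tokens
    selected = attended[: visual.shape[0]]                   # TS: keep only the visual tokens
    return visual + beta * np.tanh(gamma) * selected         # gated residual update


rng = np.random.default_rng(0)
v = rng.normal(size=(4, 8))   # 4 visual tokens of width 8
g = rng.normal(size=(2, 8))   # 2 grounding tokens (e.g. encoded boxes)
print(np.allclose(gated_self_attention(v, g, gamma=0.0), v))  # → True: a closed gate leaves v unchanged
```

The zero-initialized gate is the key design choice: new layout-conditioning layers can be trained without disturbing the pretrained diffusion model at the start of training.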
To train and evaluate the model, the team introduces the AI2D-Caption dataset. It is built on top of the AI2D dataset and provides dense annotations for each diagram (e.g., object descriptions and links between text labels and objects), so DiagrammerGPT can learn from diverse, richly described diagram examples.
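As a rough illustration of what text label-object linkages make possible, the sketch below groups a diagram's text labels under the objects they point to. The record layout (`objects`, `text_labels`, `label_links`) and the helper `labels_by_object` are hypothetical and do not reflect AI2D-Caption's actual file format.

```python
from collections import defaultdict

# Hypothetical annotation record (illustrative field names, not the real AI2D-Caption schema).
annotation = {
    "caption": "The water cycle: the sun heats the sea, and water vapor condenses into clouds.",
    "objects": {"o1": "the sun", "o2": "a cloud over the sea"},
    "text_labels": {"t1": "Sun", "t2": "Condensation", "t3": "Cloud"},
    "label_links": [("t1", "o1"), ("t2", "o2"), ("t3", "o2")],  # (label id, object id)
}


def labels_by_object(ann: dict) -> dict:
    """Map each object description to the list of text labels linked to it."""
    grouped = defaultdict(list)
    for label_id, object_id in ann["label_links"]:
        grouped[ann["objects"][object_id]].append(ann["text_labels"][label_id])
    return dict(grouped)


print(labels_by_object(annotation))
# → {'the sun': ['Sun'], 'a cloud over the sea': ['Condensation', 'Cloud']}
```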
The authors also provide a comprehensive analysis of DiagrammerGPT, covering open-domain diagram generation, vector-graphic diagram generation on different platforms, human-in-the-loop diagram plan editing, and multimodal planner/auditor LLMs (e.g., GPT-4Vision).
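Because a diagram plan is structured data, it can be exported to vector formats as well as rendered as pixels. The sketch below turns a tiny plan into SVG, one common vector format chosen here purely for illustration. The function name `plan_to_svg`, the grid-to-canvas scaling, and the center-to-center arrows are assumptions, not how DiagrammerGPT actually targets other platforms.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SVG_NS = "http://www.w3.org/2000/svg"


def plan_to_svg(entities: dict, arrows: list, canvas: int = 500, grid: int = 100) -> str:
    """entities: {label: (x, y, w, h)} on a grid; arrows: [(src, dst)] drawn center to center."""
    s = canvas / grid                                    # grid units -> SVG user units
    centers = {n: ((x + w / 2) * s, (y + h / 2) * s) for n, (x, y, w, h) in entities.items()}
    parts = [
        f'<svg xmlns="{SVG_NS}" width="{canvas}" height="{canvas}">',
        '<defs><marker id="head" markerWidth="8" markerHeight="8" refX="8" refY="4" orient="auto">'
        '<path d="M0,0 L8,4 L0,8 z"/></marker></defs>',
    ]
    for name, (x, y, w, h) in entities.items():
        cx, cy = centers[name]
        parts.append(f'<rect x="{x * s:g}" y="{y * s:g}" width="{w * s:g}" height="{h * s:g}" '
                     'fill="none" stroke="black"/>')
        parts.append(f'<text x="{cx:g}" y="{cy:g}" text-anchor="middle" '
                     f'dominant-baseline="middle">{escape(name)}</text>')
    for src, dst in arrows:
        (x1, y1), (x2, y2) = centers[src], centers[dst]
        parts.append(f'<line x1="{x1:g}" y1="{y1:g}" x2="{x2:g}" y2="{y2:g}" '
                     'stroke="black" marker-end="url(#head)"/>')
    parts.append("</svg>")
    return "\n".join(parts)


svg = plan_to_svg({"Sun": (70, 5, 20, 20), "Ocean": (0, 75, 100, 25)}, [("Sun", "Ocean")])
root = ET.fromstring(svg)                                # parses, so the SVG is well-formed XML
print(len(root.findall(f"{{{SVG_NS}}}rect")))            # → 2
```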
DiagrammerGPT was compared with three other models (Stable Diffusion v1.4, VPGen, and AutomaTikZ) in generating diagrams from text using various evaluation methods. The results of this comparison are presented in the table below.
DiagrammerGPT outperforms other models on all metrics. We can also see that the other models improve their performance when they are fine-tuned on a diagram dataset.
Below you can see the results of the human evaluation. The table compares how well DiagrammerGPT and Stable Diffusion v1.4 match images to their text prompts (image-text alignment) and capture the relationships between objects.
DiagrammerGPT was preferred over Stable Diffusion v1.4 on both criteria: image-text alignment (36% vs. 20%) and object relationships (48% vs. 30%).
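The reported preference rates can be put in perspective with a little arithmetic. In the sketch below, `summarize` is a hypothetical helper that computes the margin in percentage points, the ratio of the two preference rates, and the share of judgments not reported as a preference for either model. The article does not say how that remainder was split (for example, into ties), so it is labeled only as "other".

```python
# Preference rates (%) from the human evaluation: (DiagrammerGPT, Stable Diffusion v1.4).
RESULTS = {
    "image-text alignment": (36, 20),
    "object relationships": (48, 30),
}


def summarize(ours: int, baseline: int) -> tuple:
    """Return (margin in points, ratio of rates, share not attributed to either model)."""
    margin = ours - baseline
    ratio = ours / baseline
    remainder = 100 - ours - baseline   # not reported as a preference for either model
    return margin, ratio, remainder


for criterion, (ours, baseline) in RESULTS.items():
    margin, ratio, remainder = summarize(ours, baseline)
    print(f"{criterion}: +{margin} pts, {ratio:.1f}x, {remainder}% other")
# → image-text alignment: +16 pts, 1.8x, 44% other
# → object relationships: +18 pts, 1.6x, 22% other
```

Read this way, DiagrammerGPT's lead is larger in relative terms for image-text alignment (1.8x) and larger in absolute points for object relationships (18 vs. 16).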
DiagrammerGPT is a new framework that generates diagrams from text prompts using LLMs, outperforming existing text-to-image (T2I) models.
It consists of a planner LLM that generates a diagram plan, an auditor LLM that checks and refines the plan, and a diagram generator module that draws the diagram image.
DiagrammerGPT can create many kinds of diagrams, such as flowcharts, mind maps, and electrical circuits, across different topics and platforms.
- Research paper: “DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning” (on arXiv)
- Project page