HuggingGPT is a framework that integrates ChatGPT with AI models hosted on Hugging Face. It uses large language models (LLMs), such as ChatGPT, as controllers that coordinate these models so they can work together on complex, multi-step tasks.
HuggingGPT serves as a bridge between ChatGPT and the Hugging Face Hub, connecting the LLM to over 400 task-specific AI models.

The workflow
The workflow of HuggingGPT consists of four stages:
- Task planning: ChatGPT receives a user request, identifies the tasks that need to be performed, and parses them into a task list. It then determines the execution order and resource dependencies among the tasks so the workflow can run efficiently. To achieve this, the model fills in a task specification template with four slots: task type, task ID, task dependencies, and task arguments (see the sketch after this list).
- Model selection: ChatGPT matches each task to an appropriate model, based on the model descriptions obtained from the Hugging Face Hub. Candidate models are filtered by task type, ranked by their number of downloads on Hugging Face, and the top-K are kept for HuggingGPT to choose from.
- Task execution: the selected AI models execute their assigned tasks, following the order and dependencies determined in the task planning stage.
- Response generation: the system uses all the information gathered in the previous stages, such as the planned tasks, the selected models, and the models’ inference results, to generate a concise summary and a meaningful response to the user’s request.
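To make the first two stages concrete, here is a minimal sketch in Python. The four field names mirror the template described above; the specific tasks, file names, download counts, and the `select_models` helper are illustrative assumptions, not HuggingGPT’s actual code.

```python
# Hypothetical parsed task plan, as produced by the task planning stage.
# Field names follow the four-slot template: task type ("task"), task ID
# ("id"), task dependencies ("dep"), and task arguments ("args").
# A dependency of -1 marks a task with no prerequisites.
task_plan = [
    {"task": "object-detection", "id": 0, "dep": [-1],
     "args": {"image": "example.jpg"}},
    # "<resource>-0" is a placeholder that the execution stage later fills
    # in with the output of task 0.
    {"task": "visual-question-answering", "id": 1, "dep": [0],
     "args": {"image": "example.jpg",
              "text": "How many of the detected objects are red? <resource>-0"}},
]

def select_models(candidates, task_type, k=5):
    """Model selection sketch: filter candidate models by task type,
    rank them by Hugging Face download count, and keep the top-K for
    the LLM to choose from."""
    matching = [m for m in candidates if m["task"] == task_type]
    matching.sort(key=lambda m: m["downloads"], reverse=True)
    return matching[:k]
```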

The most important input to response generation is the inference results, which arrive in structured form. For instance, if the model is an object detection model, the inference results may include bounding boxes with detection probabilities; if it is a question-answering model, they may include answer distributions.
Resource dependencies also matter here: a downstream task often needs the output of an upstream task before it can run, and HuggingGPT handles this explicitly while executing a task.
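A minimal sketch of that mechanism, assuming the `task_plan` structure from the previous sketch: when a task depends on another, its arguments carry a `<resource>-<task id>` placeholder that is replaced with the dependent task’s inference result before the selected model is invoked. The `run_model` helper and its return value below are hypothetical stand-ins, not HuggingGPT’s real API.

```python
# Sketch of resource-dependency handling in the task execution stage.
def run_model(task, args):
    # Placeholder inference call; a real system would return structured
    # results such as bounding boxes with detection probabilities.
    return f"<output of {task['task']} for {args}>"

def execute(task_plan):
    results = {}  # task ID -> inference result
    for task in task_plan:  # assumes tasks arrive in a valid execution order
        args = dict(task["args"])
        for dep_id in task["dep"]:
            if dep_id == -1:
                continue  # no prerequisite
            # Replace "<resource>-<dep_id>" with the dependent task's output.
            placeholder = f"<resource>-{dep_id}"
            args = {key: value.replace(placeholder, str(results[dep_id]))
                    for key, value in args.items()}
        results[task["id"]] = run_model(task, args)
    return results
```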


Limitations, further research
HuggingGPT, like any other system, has some limitations:
1. The inference of the large language model is a bottleneck for the system’s efficiency. HuggingGPT has to interact with the LLM in the task planning, model selection, and response generation stages, which increases response latency and degrades the user experience.
2. The maximum number of tokens the LLM can accept limits the context length HuggingGPT can work with. To mitigate this, HuggingGPT tracks the conversation context only in the task planning stage.
3. System stability is also a concern: the LLM can produce output in an unexpected format, and exceptions can occur in the program workflow.
4. Expert models hosted on Hugging Face’s inference endpoints may fail because of network latency or service state, leading to errors in the task execution stage.
Learn more:
Research paper: “HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face” (on arXiv)