AgentScope 1.0, a developer-centric framework for AI agents

September 30, 2025

AgentScope 1.0 (project page, paper, code) is a new framework designed to simplify the development, deployment, and evaluation of LLM-based agentic applications. It is open-source, developed by Alibaba Group’s Tongyi Lab, and released under the Apache 2.0 license.

You can read the Quickstart steps and usage examples on GitHub. To install, either clone the repository or use pip to install the agentscope Python module. Alternatively, follow the tutorial in the official documentation.

After installation, create an agent by selecting a model, memory, and tools, then send it messages to receive replies and actions. Additional features such as web search or math solving are available, and AgentScope Studio provides visualization of the agent’s reasoning and behavior. For larger tasks, AgentScope supports deployment across multiple machines.

Key innovations

Building intelligent agents is not just about connecting a language model to some tools. Developers face tough engineering challenges such as handling asynchronous tasks, managing tool overload, ensuring reliable testing, and deploying agents safely at scale. AgentScope 1.0 introduces advanced solutions to each of these problems:

Managing asynchronous and parallel tool calls: When an agent needs to call multiple web APIs at the same time, such as fetching data from different sources, it can easily get stuck waiting, slowing down the entire process. AgentScope 1.0 supports asynchronous and parallel tool execution within its ReAct-based agents, which means agents can send out multiple requests simultaneously instead of waiting for each one to finish before starting the next.
Handling a rich toolset without overload: Giving an agent too many tools risks overwhelming its limited “context window.” If the model has to remember too many tool descriptions at once, it may forget important details or fail to use tools correctly. The new framework introduces a sandboxed tool management system, which keeps tool definitions and executions separate from the model’s core reasoning, thus avoiding the overloading of the agent with all tool information at once.
Rigorous testing and tracing: Agents often behave unpredictably, and without proper evaluation it is hard to know whether they are stable, reliable, or safe to use. Developers need more than just a success/failure score. AgentScope 1.0 includes an evaluation module and the AgentScope Studio visualization system that turn raw results into interactive charts and probability distributions, showing how stable an agent is over time. AgentScope Studio also allows step-by-step tracing of reasoning, tool calls, and responses, helping developers pinpoint exactly where and why an error occurred.
Safe and scalable deployment: Running agents in real-world environments raises risks, such as unsafe tool use, security concerns, or systems crashing when scaling up. The framework provides runtime sandboxing to execute tools safely, ensuring that external calls happen in controlled environments. It also supports multi-process and distributed deployment, so agents can scale across machines or clusters. In addition, the framework is split into separate message, model, memory, and tool modules. Because these modules are decoupled, the framework is easier to extend, and new models and Model Context Protocol (MCP) servers can be plugged in more easily.
Built-in agents: It includes several ready-to-use agents: a browser-use agent, deep research agent, meta-planner, etc. Developers can either use them out of the box or use them as starting points.
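The first item in the list above, parallel tool calls, can be illustrated with a minimal sketch that uses only Python's standard asyncio module. It is not AgentScope's API: call_tool, run_tools_in_parallel, and the tool names and delays are hypothetical stand-ins for slow web requests.

```python
import asyncio


async def call_tool(name: str, delay: float, log: list[str]) -> str:
    # Hypothetical tool: the sleep stands in for a slow web API request.
    log.append(f"start {name}")
    await asyncio.sleep(delay)
    log.append(f"end {name}")
    return f"{name} result"


async def run_tools_in_parallel(log: list[str]) -> list[str]:
    # gather() runs both coroutines concurrently and returns the
    # results in the order the calls were listed, not completion order.
    return await asyncio.gather(
        call_tool("search", 0.05, log),
        call_tool("weather", 0.01, log),
    )


if __name__ == "__main__":
    events: list[str] = []
    print(asyncio.run(run_tools_in_parallel(events)))
    print(events)  # both "start" entries appear before either "end" entry
```

Because both calls start before either one finishes, the total wait is roughly that of the slowest call rather than the sum of the two.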

In summary, AgentScope combines parallelism, safe tool integration, transparent evaluation, and scalable deployment into one unified framework.

The model

AgentScope 1.0 is designed to be model-agnostic, meaning it can be used with a wide range of LLMs and multimodal models, including OpenAI’s GPT series, Google’s Gemini, and Tongyi’s own Qwen models. Here are its architectural components:

The overall architecture of AgentScope (source: paper)

AgentScope 1.0 is built around four foundational components: messages, models, memory, and tools. These work together to help agents communicate, reason, remember important information, and safely use external functions like web browsers or calculators. At the center of each agent is the Model module, which acts as the agent’s “brain” and connects it to different AI models (such as OpenAI’s GPT models or multimodal APIs) through a single, unified interface. All communication, whether between agents, users, or the system, is handled through a special message format called a Msg object.
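To make the idea of a single message format concrete, here is a rough sketch of what such a record could look like. SketchMsg and its fields are assumptions made for illustration; this is not the real Msg class, whose exact fields should be checked in the official documentation.

```python
from dataclasses import dataclass

ALLOWED_ROLES = {"user", "assistant", "system"}


@dataclass
class SketchMsg:
    """Illustrative message record (hypothetical, not AgentScope's Msg)."""

    name: str     # who sent the message, e.g. an agent or user name
    role: str     # one of ALLOWED_ROLES
    content: str  # the text payload

    def __post_init__(self) -> None:
        # Reject unknown roles early so every consumer sees the same shape.
        if self.role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {self.role!r}")

    def to_chat_dict(self) -> dict[str, str]:
        # Convert to the role/content dict shape many chat APIs accept.
        return {"role": self.role, "content": f"{self.name}: {self.content}"}


if __name__ == "__main__":
    msg = SketchMsg(name="Alice", role="user", content="Plan my trip.")
    print(msg.to_chat_dict())
```

Routing every exchange through one record type like this is what lets agents, users, and the system talk to each other without per-model conversion code scattered through the application.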

AgentScope follows the ReAct loop, which means agents reason through problems and take actions step by step. It also lets agents call multiple tools at the same time and pick the right tool when needed, and it lets developers steer agents in real time and save their progress as they work.
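The reason-act-observe cycle can be sketched in a few lines of plain Python. This is a toy illustration, not AgentScope code: scripted_model is a hypothetical stand-in for an LLM, and TOOLS holds a single made-up tool.

```python
from typing import Any

# Hypothetical tool registry with one trivial tool.
TOOLS = {"add": lambda a, b: a + b}


def scripted_model(question: str, observations: list[Any]) -> dict[str, Any]:
    # Stand-in for an LLM: choose the next step from what has been observed.
    if not observations:
        return {"action": "add", "args": (2, 3)}
    return {"action": "finish", "answer": f"The result is {observations[-1]}"}


def react_loop(question: str, max_steps: int = 5) -> str:
    observations: list[Any] = []
    for _ in range(max_steps):
        step = scripted_model(question, observations)  # reason
        if step["action"] == "finish":
            return step["answer"]
        tool = TOOLS[step["action"]]                   # act
        observations.append(tool(*step["args"]))       # observe
    return "gave up: step limit reached"


if __name__ == "__main__":
    print(react_loop("What is 2 + 3?"))
```

The step limit is a common safeguard in loops of this kind, so that a model which never decides to finish cannot run forever.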

AgentScope also supports multi-agent cooperation, allowing multiple agents to work together like a team. For example, in a travel-planning task, one agent could search for flights, another could find hotels, and a third could suggest activities. They can work together through AgentScope 1.0 to create a complete trip plan.
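As a toy version of the travel-planning example, the sketch below passes a shared message history through three plain functions that play the role of agents. All names and return strings are invented for illustration, and real agents would call a model rather than return fixed text.

```python
from typing import Callable

# An "agent" here is any function that reads the shared history
# and returns one new message (a deliberate simplification).
Agent = Callable[[list[str]], str]


def flight_agent(history: list[str]) -> str:
    return "flight: depart Friday 09:00"


def hotel_agent(history: list[str]) -> str:
    # Reads the earlier flight message to align the check-in day.
    day = "Friday" if any("Friday" in m for m in history) else "unknown"
    return f"hotel: check-in {day}"


def activity_agent(history: list[str]) -> str:
    return f"activities: museum tour (planned after {len(history)} earlier messages)"


def plan_trip(agents: list[Agent]) -> list[str]:
    history: list[str] = []
    for agent in agents:
        # Each agent sees everything said so far, then adds its own message.
        history.append(agent(history))
    return history


if __name__ == "__main__":
    for line in plan_trip([flight_agent, hotel_agent, activity_agent]):
        print(line)
```

The point of the shared history is that later agents can build on earlier results, as the hotel function does with the flight day.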

The workflow of the ReAct agent in AgentScope

In AgentScope, agents are built on the ReAct paradigm, a design that integrates reasoning (“thinking”) with action (“doing”). This way, agents can actively plan, use tools, and adapt to changes in their environment.

The workflow of a ReAct-based agent in AgentScope 1.0 is structured around three key functions: observe, reply and interrupt handling, and environment interaction. First, the agent observes incoming information, whether from a user, another agent, or the system. It then reasons through the input and generates a reply, which may include actions like calling tools or updating memory. The agent can also be interrupted or redirected at any point. For example, it might be asked to stop its current task, shift focus to something more urgent, or respond to a new request mid-execution.

To support this flexibility, AgentScope includes an Interrupt-handle mechanism. This allows the agent to pause or suspend ongoing operations, update its memory if needed, and easily pivot to a new task, all without breaking the workflow or losing context.
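One way to picture interrupt handling is with task cancellation in Python's asyncio. The sketch below is an assumption about how such a mechanism could work, not a description of AgentScope's implementation; long_running_reply and the memory list are hypothetical.

```python
import asyncio


async def long_running_reply(memory: list[str]) -> str:
    try:
        memory.append("started: summarize report")
        await asyncio.sleep(10)  # stands in for slow reasoning or tool use
        memory.append("finished: summarize report")
        return "summary ready"
    except asyncio.CancelledError:
        # Record what was interrupted so the context is not lost,
        # then return normally instead of propagating the cancellation.
        memory.append("interrupted: summarize report")
        return "paused; ready for the new request"


async def demo() -> tuple[str, list[str]]:
    memory: list[str] = []
    task = asyncio.create_task(long_running_reply(memory))
    await asyncio.sleep(0.01)  # give the task time to start
    task.cancel()              # the "interrupt"
    reply = await task         # the handler's return value
    return reply, memory


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Swallowing CancelledError is usually discouraged in general asyncio code; it is used here deliberately so that the interrupted task can hand back a reply and leave a note in memory.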

Evaluation toolkit

Evaluating the performance of intelligent agents requires specialized methods and benchmarks. Unlike traditional models that only generate responses, agents must reason through problems, use external tools, interact with their environment, and maintain memory across long conversations. To address these needs, AgentScope 1.0 provides a structured evaluation framework with two built-in evaluators:

GeneralEvaluator: A simple evaluator, well-suited for debugging or smaller test cases.
RayEvaluator: A distributed evaluation system built on Ray, enabling large-scale and parallelized testing across many tasks simultaneously.

To ensure comparability, AgentScope 1.0 supports two established benchmarks: ACEBench and GAIA. These benchmarks measure important capabilities, including reasoning accuracy, tool selection, handling of multimodal inputs, and performance stability over extended task sequences.

A key strength of AgentScope lies in its AgentScope Studio integration, which provides developers with interactive visualizations of evaluation results. Instead of reporting a single success rate or accuracy value, AgentScope Studio represents performance as statistical distributions with confidence intervals, offering a more reliable picture of an agent’s stability. It can also group outcomes into categories like “consistently correct,” “consistently incorrect,” or “unstable,” helping developers pinpoint areas for targeted improvement. Additionally, AgentScope Studio allows side-by-side comparisons of reasoning steps, tool calls, and responses, enabling detailed debugging and root-cause analysis when failures occur.
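The grouping and interval ideas can be sketched with a few lines of standard-library Python. How AgentScope Studio actually computes its statistics is not described here, so the normal-approximation (Wald) interval and the function names below are assumptions chosen for illustration.

```python
import math


def categorize(outcomes: list[bool]) -> str:
    # Label one task from the pass/fail results of its repeated runs.
    if not outcomes:
        raise ValueError("need at least one run")
    if all(outcomes):
        return "consistently correct"
    if not any(outcomes):
        return "consistently incorrect"
    return "unstable"


def success_rate_interval(outcomes: list[bool], z: float = 1.96) -> tuple[float, float]:
    # Normal-approximation (Wald) interval for the success rate,
    # clipped to [0, 1]; z = 1.96 corresponds to roughly 95% coverage.
    if not outcomes:
        raise ValueError("need at least one run")
    n = len(outcomes)
    p = sum(outcomes) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    runs = {
        "task_a": [True] * 5,
        "task_b": [False] * 5,
        "task_c": [True, False, True, True, False],
    }
    for task, outcomes in runs.items():
        print(task, categorize(outcomes), success_rate_interval(outcomes))
```

With only five runs per task the interval for a mixed result is very wide, which is the practical argument for reporting distributions rather than a single accuracy figure.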

The evaluation of AgentScope 1.0 is primarily illustrated through case studies that demonstrate its ability to manage complex, multi-step tasks. These include user–assistant dialogues, cooperative multi-agent interactions, and browser-based agents. While these examples effectively showcase the framework’s capabilities in controlled environments, the current paper and documentation do not yet include large-scale, real-world deployment studies using industrial data.

Conclusion

AgentScope 1.0 is an open-source framework specifically designed to simplify the creation, deployment, and management of intelligent agent applications, particularly those involving multi-agent systems.

Its asynchronous design ensures that agents can perform multiple tasks efficiently without unnecessary delays, while sandboxing provides a secure environment for executing external tools, reducing risks in real-world deployments. Meanwhile, the AgentScope Studio visual platform offers developers deep visibility into agent reasoning and interactions, making complex multi-agent behaviors easier to understand, debug, and optimize.