As large language models (LLMs) advance rapidly, single-agent systems have shown clear limitations on complex real-world tasks. To address this, institutions including the University of Hong Kong and CAMEL-AI jointly introduced a new multi-agent framework named Workforce, together with a training method called OWL (Optimized Workforce Learning). The system recently reached 69.70% accuracy on the GAIA benchmark, setting a new record for open-source systems and surpassing commercial systems such as OpenAI Deep Research.
All code for this work has been open-sourced on GitHub, where it has received over 17,000 stars, a sign of the community's recognition.
So how does the Workforce framework break through the limitations of multi-agent systems? Its core is a "decoupled design." The framework divides the system into three key components: a domain-agnostic planner (Planner Agent) that decomposes tasks, a coordinator (Coordinator Agent) that assigns subtasks, and specialized worker nodes (Worker Nodes) that carry them out. This design makes the system more flexible and significantly reduces the cost of transferring it to a new domain: users only need to replace or add worker nodes, without modifying the core system.
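The decoupled design can be illustrated with a minimal Python sketch. This is not the Workforce codebase: the class names (`Planner`, `Coordinator`, `Subtask`), the `run_workforce` function, and the toy workers are all hypothetical, and the planner here returns a fixed decomposition where a real one would call an LLM. The point is only to show that swapping a worker leaves the planner and coordinator untouched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: a worker takes a subtask description and returns a result.
Worker = Callable[[str], str]


@dataclass
class Subtask:
    description: str
    capability: str  # the kind of worker that should handle this subtask


class Planner:
    """Domain-agnostic: decomposes a task without knowing any worker's tools."""

    def plan(self, task: str) -> List[Subtask]:
        # Fixed toy decomposition; a real planner would be an LLM call.
        return [
            Subtask(f"gather information for: {task}", "search"),
            Subtask(f"write an answer for: {task}", "write"),
        ]


class Coordinator:
    """Routes each subtask to whichever worker is registered for its capability."""

    def __init__(self) -> None:
        self._workers: Dict[str, Worker] = {}

    def register(self, capability: str, worker: Worker) -> None:
        # Registering under an existing capability replaces the old worker.
        self._workers[capability] = worker

    def dispatch(self, subtask: Subtask) -> str:
        worker = self._workers.get(subtask.capability)
        if worker is None:
            raise LookupError(f"no worker for capability {subtask.capability!r}")
        return worker(subtask.description)


def run_workforce(task: str, planner: Planner, coordinator: Coordinator) -> List[str]:
    return [coordinator.dispatch(s) for s in planner.plan(task)]


coordinator = Coordinator()
coordinator.register("search", lambda d: f"[web] {d}")
coordinator.register("write", lambda d: f"[writer] {d}")
results = run_workforce("compare two papers", Planner(), coordinator)

# Adapting to a new domain: swap one worker; planner and coordinator are unchanged.
coordinator.register("search", lambda d: f"[legal-db] {d}")
legal_results = run_workforce("compare two papers", Planner(), coordinator)
```

In this sketch, moving to the new domain is a single `register` call, which mirrors the article's claim that only worker nodes need to change.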
The OWL training method is the other highlight of this work. It uses a two-stage training strategy. In the first stage, the planner is initialized through supervised fine-tuning on expert demonstration data. In the second stage, reinforcement learning with the Direct Preference Optimization (DPO) algorithm further improves the planner's decision-making. Together, the two stages are intended to let the planner handle diverse real-world tasks.
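For readers unfamiliar with DPO, the sketch below computes the standard per-pair DPO loss in plain Python. It shows the general objective only and is not OWL's training code: the function name `dpo_loss`, the example log-probabilities, and the value `beta=0.1` are assumptions made for illustration. The inputs are the summed log-probabilities that the trained policy and a frozen reference model assign to a preferred ("chosen") and a dispreferred ("rejected") response.

```python
import math


def _softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """Standard DPO loss for one (chosen, rejected) preference pair."""
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) == softplus(-logits)
    return _softplus(-logits)


# Policy identical to the reference: loss is log(2), the starting point.
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy has shifted probability toward the chosen response: loss drops.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

Minimizing this loss pushes the policy to raise the likelihood of preferred planning decisions relative to dispreferred ones, without training a separate reward model.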
On the GAIA benchmark, the Workforce framework showed clear advantages, particularly in multi-agent reasoning, reaching 69.70% accuracy and far exceeding previous open-source systems. The OWL training method also delivered notable gains, improving the performance of the Qwen2.5-32B-Instruct model. These results suggest that multi-agent systems can handle complex tasks without being constrained by earlier design approaches, and that they are capable of self-correction and continued improvement.
The introduction of the Workforce framework not only improves the overall performance of multi-agent systems but also points the way forward for the future development of intelligent assistants.