AI在线

Tsinghua Collaborates with Mianbi Intelligence on Open-Source Release: First GUI Agent Specialized for Chinese Apps Covers AutoNavi, Bilibili, and Xiaohongshu

With the rapid development of artificial intelligence, intelligent interaction has become a new focus of the mobile internet. The THUNLP Lab at Tsinghua University and Mianbi Intelligence recently released AgentCPM-GUI, an open-source project described as the world's first open-source GUI (graphical user interface) Agent specifically optimized for Chinese apps. The project demonstrates the strength of China's homegrown AI technology and opens new possibilities for the intelligent upgrade of the Android ecosystem.

Technical Breakthrough: The World's First GUI Agent Specialized for Chinese Apps

AgentCPM-GUI is built on Mianbi Intelligence's MiniCPM-V model and has 8 billion (8B) parameters in total. It takes phone screen images as input, identifies interface elements, and automatically executes user instructions. Unlike general-purpose Agents, AgentCPM-GUI has been optimized in depth for Chinese apps, covering more than 30 mainstream Chinese applications including AutoNavi Map, Dianping, Bilibili, and Xiaohongshu.

According to AIbase, the Agent performs well at locating interface elements and executing tasks. In one demonstration, AgentCPM-GUI opens Bilibili and checks whether a specific uploader (UP主) has posted a new video, operating smoothly and precisely. This capability is attributed to its understanding of the interface logic of Chinese apps and to efficient algorithm design.
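
The screenshot-in, action-out behavior described above can be pictured as a simple loop: capture the screen, ask the model for the next action, execute it, and repeat until the model reports that the task is done. The following is a minimal sketch under stated assumptions, not AgentCPM-GUI's actual implementation: `FakeDevice`, `scripted_policy`, `run_agent`, and the action fields (`tap`, `status`) are all hypothetical names invented for illustration, and the vision-language model is replaced by a lookup table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical action format for illustration; not AgentCPM-GUI's real schema.
Action = Dict[str, object]


@dataclass
class FakeDevice:
    """Stand-in for an Android device: tracks which screen is shown."""
    screen: str = "home"
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real device would return image bytes; a screen label is enough here.
        return self.screen

    def execute(self, action: Action) -> None:
        # Record the action, then simulate the screen change it causes.
        self.log.append(action)
        if action.get("tap") == "bilibili_icon":
            self.screen = "bilibili_feed"
        elif action.get("tap") == "uploader_profile":
            self.screen = "uploader_page"


def scripted_policy(instruction: str, screen: str) -> Action:
    """Stub for the model: maps the current screen to the next action."""
    plan = {
        "home": {"tap": "bilibili_icon"},
        "bilibili_feed": {"tap": "uploader_profile"},
        "uploader_page": {"status": "finish"},
    }
    return plan[screen]


def run_agent(
    instruction: str,
    device: FakeDevice,
    policy: Callable[[str, str], Action],
    max_steps: int = 10,
) -> List[Action]:
    """Observe, decide, act; stop when the policy signals completion."""
    for _ in range(max_steps):
        action = policy(instruction, device.screenshot())
        if action.get("status") == "finish":
            break
        device.execute(action)
    return device.log


device = FakeDevice()
executed = run_agent("Check whether the uploader posted a new video",
                     device, scripted_policy)
print(executed)
```

The `max_steps` cap is a design choice of this sketch: it keeps the loop from running forever if the policy never signals completion.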

Efficiency Revolution: Average Action Length Reduced to Only 9.7 Tokens

AgentCPM-GUI also stands out in on-device inference efficiency. Through advanced model compression technology, the average action length has been reduced to 9.7 tokens, which lowers the computation needed for each step. This means that even on ordinary Android devices, AgentCPM-GUI can respond quickly and run smoothly, giving users an interaction experience close to that of native apps.
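
Action length here presumably refers to the number of tokens the model generates for one operation, so a shorter action means fewer decoding steps on the device. As a rough illustration of how the output format alone changes that count, the sketch below compares a verbose and a compact encoding of the same tap. The action fields and the regex-based `rough_token_count` are assumptions made for this example; they are not AgentCPM-GUI's actual schema or tokenizer, and the counts are not comparable to the 9.7-token figure.

```python
import json
import re


def rough_token_count(text: str) -> int:
    """Crude proxy for token count: each word, number, or punctuation
    mark counts as one token. This is NOT a real model tokenizer."""
    return len(re.findall(r"\w+|[^\w\s]", text))


# Hypothetical verbose encoding of a single tap on the screen.
verbose = json.dumps(
    {
        "action_type": "click",
        "coordinate": {"x": 512, "y": 230},
        "description": "tap the search box",
    },
    indent=2,
)

# Hypothetical compact encoding of the same tap: short key, no whitespace.
compact = json.dumps({"tap": [512, 230]}, separators=(",", ":"))

print(compact)
print("verbose:", rough_token_count(verbose))
print("compact:", rough_token_count(compact))
```

Under this proxy the compact form needs far fewer tokens than the verbose one, which is the kind of saving a terse action format buys at generation time.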

AIbase believes that this efficiency improvement not only lowers the hardware threshold for developers and users but also lays the foundation for widespread deployment of AgentCPM-GUI on more consumer electronics devices. Whether it's smartphones, tablets, or other smart terminals, AgentCPM-GUI has the potential to become the core engine of intelligent interaction.

Open Source Empowerment: Promoting Intelligent Upgrades in the Android Ecosystem

As a fully open-source project, the release of AgentCPM-GUI marks Tsinghua University and Mianbi Intelligence's firm commitment to the popularization of AI technology. The development team stated that the code and related documentation of AgentCPM-GUI have been made public, allowing developers to freely access and further develop based on it. This move will greatly reduce the development costs of intelligent interaction for Chinese apps, helping more small and medium-sized enterprises join the construction of intelligent ecosystems.

AIbase noticed that the openness of AgentCPM-GUI has received significant attention from the industry. Industry insiders pointed out that the project not only fills the gap in the field of Chinese GUI Agents but also provides valuable references for the intelligent development of the global Android ecosystem. In the future, with the participation of more developers, AgentCPM-GUI is expected to push the interactive experience of mainstream apps such as AutoNavi Map and Dianping to new heights.

Application Prospects: From Navigation to Social Interaction, Intelligence Everywhere

The emergence of AgentCPM-GUI opens up broad space for the intelligent application of Chinese apps. In navigation scenarios, users can use voice commands to let AgentCPM-GUI automatically operate AutoNavi Map to plan routes; in social scenarios, Agents can quickly browse notes on Xiaohongshu or videos on Bilibili, precisely extracting the information users need; in the life service sector, restaurant recommendations and reservations through Dianping can be achieved with one-click operations via Agents.

AIbase predicts that as AgentCPM-GUI gains popularity, the user experience of Chinese apps will see a qualitative leap. Whether by improving operational efficiency or by optimizing personalized services, this Agent will become an intelligent bridge connecting users and applications.

A Milestone Breakthrough in Domestic AI

As a professional media outlet in the AI field, AIbase believes that the release of AgentCPM-GUI is not only a major research and development breakthrough for Tsinghua University and Mianbi Intelligence but also an important step for domestic AI onto the global stage. Its fine-grained optimization for Chinese apps and efficient on-device inference showcase the distinctive advantages of Chinese AI companies in localized scenarios.
