News List
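Do Multimodal Models Have "Physical Reasoning" Ability? New Benchmark Reveals: Even the Best Performer, GPT-o4 mini, Falls Far Short of Humans!
Even the best-performing model, GPT-o4 mini, falls far short of humans in physical reasoning! Recently, researchers from the University of Hong Kong, the University of Michigan, and other institutions filled a key gap in existing evaluation systems: assessing whether multimodal models have "physical reasoning" ability. Physical reasoning, meaning whether a model facing a real or realistic physical scenario can combine visual information, physical common sense, and mathematical modeling to make judgments and predictions, is considered a key capability on the path to embodied intelligence.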
5/28/2025 11:55:28 AM
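All Thanks to Claude4! 30-Year FAANG Veteran Engineer: AI Helped Me Fix a Four-Year-Old Bug
AI is like a wild donkey: once it starts running, it does not stop. Humans took millions of years to reach the top of the food chain, while large models, in less than a decade, can already Photoshop you into a selfie with Liu Yifei. Oh!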
5/28/2025 11:49:52 AM
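Reinforcement Learning Tackles Long-Context Reasoning: Tongyi Releases QwenLong-L1-32B
Recent large reasoning models (LRMs) have shown strong reasoning ability through reinforcement learning (RL), but these gains appear mainly on short-context reasoning tasks. By contrast, how to scale LRMs with reinforcement learning so that they can effectively process and reason over long-context inputs remains a key unsolved challenge. A team from Alibaba's Tongyi Lab first formally defines the paradigm of reinforcement learning for long-context reasoning and identifies its two core challenges: suboptimal training efficiency and an unstable optimization process.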
5/28/2025 11:46:52 AM
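Open-Source Models Can Be Used to Steal Downstream Fine-Tuning Data? Tsinghua Team Reveals a New Hidden Security Risk in the Open-Source Fine-Tuning Paradigm
The authors of this work are from the CoAI group at Tsinghua University and the University of Melbourne. The first author, Zhang Zhexin, is a third-year direct-entry PhD student at Tsinghua University whose research focuses on large model safety. The main collaborator is Sun Yuhao of the University of Melbourne, and the main advisors are Associate Professor Wang Hongning and Professor Huang Minlie of Tsinghua University. Fine-tuning an open-source model on private downstream data to obtain a proprietary model that performs better on the downstream task has become a standard paradigm.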
5/28/2025 11:46:18 AM
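World's Top AIs Do Physics and Get Crushed by Humans? Poor Reasoning Leads to a Major Flop, Undergraduates Dominate
Do large models really understand physical reasoning? Just now, researchers from the University of Hong Kong, the University of Michigan, the University of Toronto, and other institutions put the world's top large models through a grueling test of 3,000 physics problems. The result: these top AIs flopped, every one of them without exception!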
5/28/2025 11:44:24 AM
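Multimodal Large Models Cannot Draw Auxiliary Lines? Latest Evaluation Scores: o3 Only 25.8%, Far Below Humans' 82.3%
How should a model's visual output ability be evaluated in the multimodal era? A research team from Tsinghua University, Tencent Hunyuan, Stanford University, Carnegie Mellon University, and other top institutions has jointly released RBench-V, a new benchmark for the visual reasoning ability of large models. Previous benchmarks have focused mainly on evaluating multimodal inputs and text-only reasoning processes.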
5/28/2025 11:43:48 AM
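GPA of Only 3.3, Two First-Author Top-Conference Papers, and Admission to a Top-20 AI PhD Program? Applicant Reveals the Key Secret
Two first-author papers at top conferences, a master's degree in progress in Asia, and graduation in spring 2026. This poster says he plans to apply for PhD programs starting in fall 2026. His profile: an undergraduate GPA of around 3.2-3.3, which is not very high, but with some research experience.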
5/28/2025 11:43:06 AM
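Fine-Tune Large Models on a Single GPU! Memory Usage Only 1/8, Performance Still Maxed Out | ICML 2025
Foundation models such as Qwen, GPT, and DeepSeek R1 have become the cornerstone of modern deep learning. However, when they are applied to specific downstream tasks, their enormous parameter counts make additional fine-tuning costly. To address this, recent research has focused on low-rank adaptation (LoRA), which lowers fine-tuning cost by keeping the base model's parameters frozen and fine-tuning only small, lightweight adapters that are newly added.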
5/28/2025 11:42:14 AM
Artificial Intelligence Helps: Nestlé Pilot Project Expected to Save 1.5 Million Meals from Food Waste
With the continuous development of artificial intelligence technology, a growing number of companies are using AI tools to reduce food waste. Recently, Nestlé, a global food giant, took part in a pilot project in the UK. The project aims to effectively "design out" food waste by monitoring and tracking it in real time. In the preliminary trial of this AI tool, edible food waste at one of Nestlé's factories was reduced by 87%.
5/28/2025 11:01:30 AM
AI在线
Mistral Launches Agents API: Building Collaborative and Memorable AI Agents for Enterprises
Mistral AI has launched a new Agents API, designed to extend language models into intelligent agent systems for enterprise applications. The framework adds tools for task execution, context tracking, and agent orchestration to foundational language models, enabling multiple AI agents not only to execute tasks independently but also to collaborate and integrate with external systems, creating complete business processes. Each agent can use connectors and MCP (Model Context Protocol) tools to run Python scripts, perform web searches, generate images (powered by Black Forest Labs FLUX1.1 [pro] Ultra), or extract documents from Mistral Cloud. As a standard protocol, MCP enables seamless connections between APIs, databases, and user data. In contrast to traditional chatbots, the Agents API supports persistent context management. Even if interactions are interrupted or rolled back, the agent retains its state, enhancing system continuity and reliability.
5/28/2025 11:01:30 AM
AI在线
Trae International Version Launches Paid Subscription Model, First Month Pro Subscription Only $3 with Claude4 Support
Trae, an AI-driven integrated development environment (IDE) launched by ByteDance, has quickly gained prominence in the global developer community since its release on January 20, 2025, thanks to its powerful AI capabilities and seamless development experience. Recently, Trae's international version officially introduced a paid subscription plan, marking its transition from being completely free to a sustainable business model. This report, compiled by AIbase from the latest online information, looks at the details of the international version's paid strategy and its potential impact on developers. First Month for $3, Enhanced with Claude4.
5/28/2025 11:01:29 AM
AI在线
WordPress Forms AI Core Team: 660 Plugins Move Toward Standardized Development
In response to the open-source community's enthusiastic experiments with AI, WordPress has chosen to get involved and consolidate efforts. On Tuesday, WordPress officially announced the establishment of a dedicated AI team aimed at coordinating and advancing AI product development within its developer community.
5/28/2025 11:01:29 AM
AI在线
Reports say OpenAI plans to launch a feature that allows logging into third-party apps with ChatGPT
According to the latest news, OpenAI is exploring how users can log in to third-party applications using their ChatGPT accounts. The company mentioned this service on a webpage released this week and is currently gauging developers' interest in it. As ChatGPT quickly becomes one of the largest consumer applications globally, with approximately 600 million monthly active users, OpenAI hopes to leverage this popularity to further expand its business in areas such as online shopping, social media, and personal devices.
5/28/2025 11:01:26 AM
AI在线
Anthropic Introduces Claude Conversational Voice Mode for Mobile Devices, Searches Google Docs, Calendars, etc.
Anthropic, a San Francisco-based artificial intelligence startup, announced a major update to its Claude AI chatbot: a brand-new voice conversation mode. The feature is now available in the mobile app on Apple's App Store (for iOS devices) and Google's Play Store (for Android devices). In addition to introducing the voice mode, Anthropic has also expanded web search capabilities to all free users.
5/28/2025 11:01:26 AM
AI在线
Spott Raises $3.2 Million to Reshape AI-Powered Recruitment Platform
Recently, the San Francisco-based startup Spott announced it has secured $3.2 million in seed funding with the goal of building an AI-native recruitment platform to help recruitment agencies streamline processes and eliminate technical fragmentation. The round was led by Base10 Partners, with participation from Y Combinator, Fortino, True Equity, and several angel investors. Spott has just completed the Y Combinator Winter Accelerator Program for 2025, and this funding will support its further development. In an interview with VentureBeat, Lander Degreve, co-founder and CEO of Spott, stated that recruitment companies have long relied on outdated software to manage daily operations.
5/28/2025 11:01:26 AM
AI在线
SAP and Alibaba Reach Strategic Cooperation to Connect to Qwen
Recently, Alibaba Group and global enterprise software giant SAP officially announced a comprehensive strategic partnership. The cooperation aims to deeply integrate SAP's enterprise-level solutions with Alibaba Cloud's cloud computing infrastructure and artificial intelligence capabilities, jointly driving digital transformation for global enterprises. Under the cooperation agreement, SAP will explore connecting its services to Alibaba's Qwen large model. This integration will enable enterprises to flexibly deploy SAP's ERP Cloud and private cloud versions on Alibaba Cloud.
5/28/2025 11:01:26 AM
AI在线
Google's Pichai Says AI Hardware Collaboration with Apple is Unparalleled, Sparking Tech Discussions
According to reports from Business Insider, OpenAI recently reached an acquisition deal with Jonathan Ive, the former iPhone designer, for a staggering $6.5 billion. This marks a highly anticipated collaboration in the tech industry. In response, Sundar Pichai, the CEO of Google, also shared his thoughts and joined the discussion. In the tech world, the rapid development of AI technology has driven major players to seek top talent in order to take the lead in this field.
5/28/2025 11:01:25 AM
AI在线
Rivr Robot Deliveries to the Doorstep Solve the Last 100-Yard Problem and Receive Jeff Bezos' Investment
While most delivery automation systems are still lingering at the curb, Rivr wants to bring robots directly to your doorstep. This Zurich-based robotics startup is partnering with U.S. logistics company Veho in Austin, Texas, to pilot-test a four-wheel robot that can "climb stairs," addressing the toughest part of last-mile delivery: the final "100 yards."
5/28/2025 11:01:25 AM
AI在线