Breaking Traditions! FUDOKI Model Makes Multi-Modal Generation and Understanding More Flexible and Efficient

In recent years, the field of artificial intelligence has undergone tremendous changes, with large language models (LLMs) in particular making significant progress on multi-modal tasks. These models show powerful potential in understanding and generating language, but most current multi-modal models still adopt autoregressive (AR) architectures, which constrain inference to a rigid token-by-token order and leave little room for flexibility. To address this limitation, a research team from The University of Hong Kong and Huawei Noah's Ark Lab has proposed a novel model called FUDOKI.

The core innovation of FUDOKI lies in its non-masked discrete flow matching architecture. Unlike traditional autoregressive models, which generate tokens strictly left to right, FUDOKI integrates information bidirectionally through a parallel denoising mechanism, significantly enhancing its performance on complex reasoning and generation tasks. The model not only bridges the gap between image generation and text understanding but also unifies the modeling of both domains.
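To make the contrast with autoregressive decoding concrete, the sketch below shows what a parallel denoising loop can look like. It is a minimal illustration, not FUDOKI's actual algorithm: the `model(tokens, t)` interface and the plain categorical re-sampling rule are assumptions made for this example, standing in for the paper's discrete flow matching update.

```python
import torch

def parallel_denoise(model, seq_len, vocab_size, steps=32):
    """Illustrative parallel denoising loop (not FUDOKI's exact update).

    Every position starts as a random token, and all positions are
    re-sampled jointly at each step, so information flows in both
    directions instead of strictly left to right.
    """
    # Start from pure noise: a random token at every position.
    tokens = torch.randint(vocab_size, (1, seq_len))
    for step in range(steps):
        t = step / steps  # time runs from 0 (noise) toward 1 (data)
        # Assumed interface: the model sees the whole sequence plus the
        # time step and returns per-position logits over the vocabulary.
        logits = model(tokens, t)  # shape: (1, seq_len, vocab_size)
        probs = torch.softmax(logits, dim=-1)
        # Update all positions in parallel; earlier choices can still
        # be revised at later steps, unlike autoregressive decoding.
        tokens = torch.distributions.Categorical(probs=probs).sample()
    return tokens
```

An autoregressive decoder fixes each token the moment it is emitted; here every step re-decides the whole sequence, which is what enables the dynamic adjustment described below.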

Figure source note: Image generated by AI, provided by Midjourney

A further advantage of the model is its mask-free design, which makes the generation process more flexible: during inference, FUDOKI can dynamically revise earlier outputs rather than committing to each token permanently, loosely resembling how humans rethink a draft. FUDOKI also performs strongly in image generation, scoring 0.76 on the GenEval benchmark, surpassing similarly sized autoregressive models and demonstrating both high generation quality and semantic accuracy.

FUDOKI's construction relies on metric-induced probability paths with kinetically optimal velocities. These techniques let the model account for the semantic similarity between tokens throughout the generation process, yielding more natural text and image outputs. In addition, FUDOKI is initialized from pre-trained autoregressive models during training, which reduces training cost and improves efficiency.
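As a rough illustration of the underlying idea (the notation below is a common discrete flow matching formulation, not necessarily the paper's exact one), such a model defines, for each token position $i$, a probability path that interpolates from a source distribution $p_0$ to the target token $x_1^i$:

$$
p_t\bigl(x^i \mid x_1^i\bigr) \;=\; \bigl(1-\kappa(t)\bigr)\,p_0(x^i) \;+\; \kappa(t)\,\delta_{x_1^i}(x^i),
\qquad \kappa(0)=0,\;\; \kappa(1)=1,
$$

where $\kappa$ is a monotone schedule. A metric-induced variant biases the path using a distance $d$ on the token space, for instance $p_0(x^i) \propto \exp\bigl(-d(x^i, x_1^i)/\tau\bigr)$ (our illustrative choice), so intermediate samples favor semantically similar tokens; the model then learns velocities that transport probability mass along this path at minimal kinetic cost.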

The introduction of FUDOKI not only provides a new perspective for multi-modal generation and understanding but also lays a more solid foundation for the development of general artificial intelligence. In the future, we look forward to FUDOKI bringing further exploration and breakthroughs, driving the continued advancement of AI technology.

Related News

Mistral Launches New Agents API: Empowering Developers to Build Intelligent AI Agents

Mistral recently released its new Agents API, a framework designed for developers that simplifies the creation of AI agents able to perform various tasks such as running Python code, generating images, and conducting retrieval-augmented generation (RAG). The API aims to give large language models (LLMs) a unified environment for interacting with multiple tools and data sources in a structured and persistent manner. The Agents API is built on top of Mistral's language models and integrates multiple built-in connectors. These connectors let agents run Python code in a controlled environment, generate images through dedicated models, access real-time web search, and draw on user-provided document libraries. One highlight is the persistent memory feature, which allows agents to maintain context across multiple interactions, supporting coherent, stateful conversations.
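A minimal sketch of what this can look like from the Python SDK, based on the shapes in Mistral's public documentation at the time of writing (treat the exact method, field, and connector names as assumptions and check the current docs before relying on them):

```python
from mistralai import Mistral

client = Mistral(api_key="YOUR_API_KEY")

# Create an agent with built-in connectors (connector names here follow
# the public docs at the time of writing; treat them as illustrative).
agent = client.beta.agents.create(
    model="mistral-medium-latest",
    name="analysis-agent",
    description="Runs Python and searches the web to answer questions.",
    tools=[{"type": "code_interpreter"}, {"type": "web_search"}],
)

# Start a conversation; the Agents API keeps state server-side, so a
# follow-up turn can reference earlier ones (persistent memory).
conversation = client.beta.conversations.start(
    agent_id=agent.id,
    inputs="Plot the first 20 Fibonacci numbers.",
)
followup = client.beta.conversations.append(
    conversation_id=conversation.conversation_id,
    inputs="Now sum the even ones.",
)
```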
5/28/2025 11:01:20 AM
AI在线

Meituan's Wang Xing Explains AI Strategy: No-Code Platform Free to Use, 1,680 Applications Launched

At a shareholders' meeting held at Hengdian Building, Meituan's Beijing headquarters, founder Wang Xing systematically explained the company's strategic layout and development plans in artificial intelligence for the first time, revealing Meituan's deep thinking and bold practice amid the AI wave. Wang Xing divided Meituan's AI development into stages: in the early stage, the company had already applied deep neural network algorithms in its food delivery routing and dispatching systems, while the current stage focuses on developing and deploying large language models and their derivative applications. Facing fierce competition in the AI field, Meituan has made large-scale investments over the past three years.
6/16/2025 9:49:06 AM
AI在线

Apple Paper Again Challenges AI Reasoning Ability; GitHub Engineer Rebuts: This Is Not the Real Picture of Reasoning Ability!

Recently, Apple published a controversial paper pointing out significant defects in the reasoning abilities of current large language models (LLMs). The claim quickly sparked heated discussion on social media, notably from Sean Goedecke, a senior software engineer at GitHub, who strongly opposed the conclusion. He argued that Apple's findings were overly simplistic and could not fully reflect the capabilities of reasoning models. Apple's paper highlighted that LLMs perform inconsistently on benchmarks such as mathematics and programming.
6/16/2025 9:49:06 AM
AI在线