AI在线

Ant Group and inclusionAI Jointly Launch Ming-Omni: The First Open-Source Multimodal GPT-4o

Recently, Inclusion AI and Ant Group jointly launched an advanced multimodal model called "Ming-Omni," marking a new breakthrough in intelligent technology. Ming-Omni is capable of processing images, text, audio, and video, providing powerful support for various applications. Its functions not only cover speech and image generation but also possess the ability to integrate and process multimodal inputs.

**Comprehensive Multimodal Processing Capability**

Ming-Omni's design uses dedicated encoders to extract tokens from each modality. These tokens are then processed by the "Ling" module, a mixture-of-experts (MoE) architecture equipped with newly proposed modality-specific routers. This lets Ming-Omni efficiently process and fuse multimodal inputs and support a wide range of tasks without additional models, task-specific fine-tuning, or structural reorganization.
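To make the idea of modality-specific routing concrete, here is a minimal, illustrative sketch of an MoE layer in which each modality gets its own router (gating matrix) while the expert networks are shared. All dimensions, names, and the use of plain NumPy stubs are assumptions for illustration, not Ming-Omni's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes -- illustrative only, not Ming-Omni's real configuration.
D, E, K = 8, 4, 2  # token dim, number of experts, experts selected per token

# One router per modality (the "modality-specific routers"); experts are shared.
routers = {m: rng.normal(size=(D, E)) for m in ("text", "image", "audio", "video")}
experts = [rng.normal(size=(D, D)) for _ in range(E)]  # stand-in expert FFNs

def moe_forward(token, modality):
    """Route one token with its modality's router, then mix the top-k experts."""
    logits = token @ routers[modality]          # per-expert routing scores
    top = np.argsort(logits)[-K:]               # indices of the top-k experts
    w = np.exp(logits[top]); w /= w.sum()       # softmax over the selected experts
    return sum(wi * (token @ experts[i]) for wi, i in zip(w, top))

out = moe_forward(rng.normal(size=D), "audio")
print(out.shape)  # (8,)
```

The design point the sketch captures: tokens from every modality flow through one shared expert pool, but each modality's router learns its own expert-selection pattern, which is what allows a single model to fuse inputs without per-task submodels.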

**Revolution in Speech and Image Generation**

One notable highlight of Ming-Omni compared to traditional multimodal models is its support for audio and image generation. By integrating advanced audio decoders, Ming-Omni can generate natural and fluent speech. Additionally, its use of the high-quality image generation model "Ming-Lite-Uni" ensures the precision of image generation. Furthermore, the model can perform context-aware dialogues, text-to-speech conversion, and diverse image editing, showcasing its potential across multiple domains.

**Smooth Voice and Text Conversion**

Ming-Omni's language-processing capabilities are equally impressive. It can understand dialects and perform voice cloning, converting input text into speech in a variety of dialects, demonstrating strong linguistic adaptability. For example, a user can speak or type sentences in different dialects, and the model can understand and respond in the corresponding dialect, making human-computer interaction more natural and flexible.

**Open Source, Promoting Research and Development**

Notably, Ming-Omni is the first known open-source model that matches GPT-4o in terms of modality support. Inclusion AI and Ant Group have committed to making all code and model weights public, aiming to inspire further research and development within the community and drive continuous progress in multimodal intelligence technology.

The release of Ming-Omni not only injects new vitality into the field of multimodal intelligence but also provides more possibilities for various applications. As technology continues to evolve, we look forward to Ming-Omni playing a greater role in future intelligent interactions.

Project: https://lucaria-academy.github.io/Ming-Omni/
