Ant Group and inclusionAI Jointly Launch Ming-Omni: The First Open-Source Multimodal GPT-4o

Recently, Inclusion AI and Ant Group jointly launched an advanced multimodal model called "Ming-Omni," marking a new breakthrough in intelligent technology. Ming-Omni can process images, text, audio, and video, providing powerful support for a wide range of applications. Its capabilities cover not only speech and image generation but also the integration and fusion of multimodal inputs.

**Comprehensive Multimodal Processing Capability**

Ming-Omni's design uses dedicated encoders to extract tokens from each modality. These tokens are then processed by the "Ling" module, a mixture-of-experts (MoE) architecture equipped with newly proposed modality-specific routers. This allows Ming-Omni to efficiently process and fuse multimodal inputs and to support a variety of tasks without additional models, task-specific fine-tuning, or structural reorganization.
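
To make the routing idea concrete, here is a minimal sketch of modality-specific routing over a shared expert pool, following the general pattern described above. The class names, dimensions, and top-k gating scheme are illustrative assumptions, not Ming-Omni's actual implementation.

```python
# Illustrative sketch only: modality-specific routers over a shared MoE layer.
# Dimensions, expert count, and top-k gating are assumed, not Ming-Omni's values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityRoutedMoE(nn.Module):
    """Shared expert pool with one router (gating network) per modality."""
    def __init__(self, d_model=512, n_experts=8, top_k=2,
                 modalities=("text", "image", "audio", "video")):
        super().__init__()
        # Expert FFNs shared across all modalities.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)])
        # Modality-specific routers: each modality learns its own dispatch.
        self.routers = nn.ModuleDict({m: nn.Linear(d_model, n_experts)
                                      for m in modalities})
        self.top_k = top_k

    def forward(self, tokens, modality):
        # tokens: (batch, seq, d_model) produced by that modality's encoder.
        logits = self.routers[modality](tokens)          # (B, S, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # route to top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(tokens)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                  # tokens sent to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(tokens[mask])
        return out

# Tokens from different encoders pass through the same layer via their own router.
moe = ModalityRoutedMoE()
audio_tokens = torch.randn(2, 16, 512)
fused = moe(audio_tokens, modality="audio")              # (2, 16, 512)
```

Because the experts are shared while only the routers differ, each modality can learn its own dispatch pattern without duplicating the model for every input type.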

**Revolution in Speech and Image Generation**

One notable highlight of Ming-Omni over traditional multimodal models is its support for audio and image generation. By integrating an advanced audio decoder, Ming-Omni can generate natural, fluent speech, while the high-quality image generation model "Ming-Lite-Uni" ensures precise image output. The model can also carry out context-aware dialogue, text-to-speech conversion, and diverse image editing, showcasing its potential across multiple domains.
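
Speech generation of this kind commonly follows a two-stage pattern: the language-model backbone emits discrete audio tokens, and an audio decoder renders them into a waveform. The toy decoder below illustrates that pattern only; its class, sizes, and token rate are hypothetical and do not reflect Ming-Omni's real audio decoder.

```python
# Toy illustration of the token-to-waveform decoding stage; all names and
# sizes are invented for this sketch, not taken from Ming-Omni.
import torch
import torch.nn as nn

class ToyAudioDecoder(nn.Module):
    """Maps a sequence of discrete audio tokens to waveform samples."""
    def __init__(self, codebook_size=1024, d_model=256, samples_per_token=320):
        super().__init__()
        self.embed = nn.Embedding(codebook_size, d_model)
        self.upsample = nn.Linear(d_model, samples_per_token)  # stand-in for a vocoder

    def forward(self, audio_tokens):
        x = self.embed(audio_tokens)       # (B, T, d_model)
        frames = self.upsample(x)          # (B, T, samples_per_token)
        return frames.flatten(1)           # (B, T * samples_per_token) waveform

decoder = ToyAudioDecoder()
tokens = torch.randint(0, 1024, (1, 50))   # e.g. emitted by the LM backbone
waveform = decoder(tokens)                  # 16,000 samples ≈ 1 s at 16 kHz
```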

**Smooth Voice and Text Conversion**

Ming-Omni's language-processing capabilities are equally impressive. It can understand dialects and perform voice cloning, converting input text into speech in a variety of dialects and demonstrating strong linguistic adaptability. For example, users can input sentences in different dialects, and the model will understand and respond in the corresponding dialect, making human-computer interaction more natural and flexible.

**Open Source, Promoting Research and Development**

Notably, Ming-Omni is the first known open-source model that matches GPT-4o in terms of modality support. Inclusion AI and Ant Group have committed to making all code and model weights public, aiming to inspire further research and development within the community and drive continuous progress in multimodal intelligence technology.

The release of Ming-Omni not only injects new vitality into the field of multimodal intelligence but also provides more possibilities for various applications. As technology continues to evolve, we look forward to Ming-Omni playing a greater role in future intelligent interactions.

Project: https://lucaria-academy.github.io/Ming-Omni/
