AI在线

MOSS-TTSD Makes a Stunning Open Source Debut: A Million Hours of Training Creates a New King in AI Podcasts

MOSS-TTSD (Text to Spoken Dialogue), developed by the Tsinghua University Speech and Language Laboratory (Tencent AI Lab) in collaboration with Shanghai Chuangzhi College, Fudan University, and Musi Intelligent, has been officially open-sourced. This marks a major breakthrough in AI speech synthesis technology for dialogue scenarios.

This spoken dialogue generation model is built on the Qwen3-1.7B-base model and was further trained on approximately 1 million hours of single-speaker speech data and 400,000 hours of dialogue speech data. Using discrete speech-sequence modeling, it achieves highly expressive spoken dialogue generation in both Chinese and English, making it particularly well suited to long-form content creation such as AI podcasts, audiobooks, and film and television dubbing.

The core innovation of MOSS-TTSD is its XY-Tokenizer, which adopts a two-stage multi-task learning approach. Using eight RVQ (residual vector quantization) codebooks, it compresses the speech signal to a bitrate of 1 kbps while preserving both semantic and acoustic information, ensuring the naturalness and fluency of the generated speech. The model supports ultra-long speech generation of up to 960 seconds, avoiding the unnatural transitions that segment stitching causes in traditional TTS pipelines. In addition, MOSS-TTSD offers zero-shot voice cloning: uploading a complete dialogue or single-speaker audio samples is enough to clone both speakers' voices. It also supports control of vocal events, such as laughter, adding further expressiveness to the speech.
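The eight-codebook RVQ scheme can be illustrated with a toy sketch. This is not MOSS-TTSD's actual XY-Tokenizer (the real codebooks are learned, and the frame rate used in the bitrate arithmetic is an assumed value, not confirmed by the article); it only shows the general idea of residual vector quantization, in which each stage quantizes the residual left over by the previous stage, and how eight codebooks of 1,024 entries each would yield 1 kbps at 12.5 frames per second:

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization: each codebook quantizes the
    residual left over by the previous stage."""
    residual = x.copy()
    indices = []
    for cb in codebooks:
        dists = np.linalg.norm(cb - residual, axis=1)  # distance to every codeword
        i = int(np.argmin(dists))
        indices.append(i)
        residual = residual - cb[i]                    # pass the residual to the next stage
    return indices, residual

rng = np.random.default_rng(0)
dim, n_codewords, n_stages = 16, 1024, 8   # toy sizes; 8 stages mirrors the 8 codebooks
codebooks = [rng.normal(size=(n_codewords, dim)) for _ in range(n_stages)]
x = rng.normal(size=dim)
indices, residual = rvq_encode(x, codebooks)

# Reconstruction is the sum of the chosen codewords across all stages.
recon = sum(cb[i] for cb, i in zip(codebooks, indices))

# Bitrate arithmetic under assumed parameters: log2(1024) = 10 bits per stage,
# 8 stages = 80 bits per frame; at an assumed 12.5 frames/s that is 1000 bps = 1 kbps.
bits_per_frame = n_stages * int(np.log2(n_codewords))
print(bits_per_frame, bits_per_frame * 12.5)  # 80 1000.0
```

In a trained tokenizer the codebooks are fit to data (here they are random), so later stages capture progressively finer detail and the residual shrinks at each step.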

Compared with other voice models on the market, MOSS-TTSD significantly outperforms the open-source model MoonCast on objective Chinese metrics, with excellent prosody and naturalness. Against ByteDance's Doubao voice model it still lags slightly in tone and rhythm, but with the advantages of being open source and free for commercial use, MOSS-TTSD nevertheless shows strong application potential. Model weights, inference code, and API interfaces are fully open-sourced via GitHub (https://github.com/OpenMOSS/MOSS-TTSD) and HuggingFace (https://huggingface.co/fnlp/MOSS-TTSD-v0.5). Official documentation and an online demo are also available, giving developers convenient access.

The release of MOSS-TTSD brings new vitality to the field of AI speech interaction, especially in scenarios such as long-form interviews, podcast production, and film and television dubbing, where its stability and expressiveness will help drive intelligent content creation. Going forward, the team plans to further optimize the model, improving the accuracy of speaker switching and emotional expression in multi-speaker scenarios.

Address: https://github.com/OpenMOSS/MOSS-TTSD
