Mistral AI Just Released a Text-to-Speech Model It Says Beats ElevenLabs — and It's Giving Away the Weights for Free
Voxtral TTS supports nine languages and adapts custom voices from under five seconds of audio, offering a low-cost option optimized for real-time and edge deployment.
- On Thursday, Paris-based Mistral released Voxtral TTS, an open-weight text-to-speech model designed for enterprise use. The system allows companies to run the model on their own servers or smartphones, avoiding reliance on third-party APIs.
- Mistral targets the voice AI market, which crossed $22 billion globally in 2026. Pierre Stock, Mistral's vice president of science, stated the company's strategy centers on efficiency and giving enterprises full control over their own AI infrastructure.
- Built on the Ministral 3B backbone, the model supports nine languages and achieves a time-to-first-audio of 90ms. It runs six times faster than real-time speech and requires roughly three gigabytes of RAM when quantized for inference.
- Internal human evaluations show Voxtral TTS achieved a 69.9 percent preference rate in voice customization tasks against ElevenLabs. Mistral intends to displace competitors by positioning itself as a cost-effective alternative to proprietary, API-first platforms.
- This release advances Mistral's goal of building a complete, enterprise-owned AI stack. Stock emphasized that audio represents a critical future interface, and the company plans to develop an end-to-end platform handling multimodal streams of audio, text, and image.
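The speed figures quoted in the summary can be turned into rough latency estimates. The sketch below is illustrative arithmetic only. It assumes the six-times-real-time rate holds constant for a whole clip and treats the 90 ms time-to-first-audio as a separate startup delay. The function and constant names are hypothetical and are not part of any Mistral API.

```python
# Illustrative arithmetic only, using the figures quoted in the summary.
# Names below are hypothetical and not part of any Mistral API.
TIME_TO_FIRST_AUDIO_S = 0.090  # 90 ms until the first audio is available
REAL_TIME_FACTOR = 6.0         # audio is generated six times faster than playback


def synthesis_time_s(audio_seconds: float) -> float:
    """Estimated compute time to generate `audio_seconds` of speech,
    assuming the generation rate stays constant (an assumption)."""
    return audio_seconds / REAL_TIME_FACTOR


def can_stream_without_gaps() -> bool:
    """Streaming playback does not stall as long as generation
    keeps pace with or outpaces playback."""
    return REAL_TIME_FACTOR >= 1.0


clip_seconds = 30.0
print(f"{clip_seconds:.0f} s of speech takes about "
      f"{synthesis_time_s(clip_seconds):.1f} s to generate")
print(f"Playback can start after roughly {TIME_TO_FIRST_AUDIO_S * 1000:.0f} ms")
```

Under these assumptions, a listener waits only for the startup delay, because the rest of the clip is generated faster than it is played back.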
11 Articles
Mistral AI just released a text-to-speech model it says beats ElevenLabs — and it's giving away the weights for free
The enterprise voice AI market is in the middle of a land grab. ElevenLabs and IBM announced a collaboration just this week to bring premium voice capabilities into IBM's watsonx Orchestrate platform. Google Cloud has been expanding its Chirp 3 HD voices. OpenAI continues to iterate on its own speech synthesis. And the market underpinning all of this activity is enormous — voice AI crossed $22 billion globally in 2026, with the voice AI agents s…
With Voxtral TTS, Mistral AI has introduced its first text-to-speech model. The model is open-weight and, at only 4B parameters, comparatively lean. It is designed not simply to read text aloud but to interpret it precisely, taking into account various factors such as...
Mistral AI has launched Voxtral TTS, its first speech synthesis model, with the ambition of making generated voices more natural and expressive. While the demonstrations are convincing, the output remains uneven in practice. French company Mistral AI unveiled Voxtral TTS on 26 March 2026, its very first…
Mistral's first open-weight TTS model Voxtral clones voices from three seconds of audio across nine languages
French AI startup Mistral has released Voxtral TTS, its first text-to-speech model, which supports nine languages and can clone voices from just three seconds of audio. Source: The Decoder.
Mistral AI has unveiled Voxtral TTS, an open-source AI model for speech generation. The release is a direct challenge from the French company to the Polish startup ElevenLabs, whose valuation has exceeded $11 billion.