Speech Is the Future of Human–Computer Interaction
Voice was our most natural user interface long before keyboards and screens, and as AI improves, speech is quickly becoming a preferred way to interact with machines. French AI company Mistral is stepping up with Voxtral, a family of open-source audio models built to compete with OpenAI's Whisper and proprietary services such as ElevenLabs.
Introducing Voxtral: Open, Affordable, and Enterprise-Ready Speech AI
Mistral’s new Voxtral models offer production-grade speech understanding without the cost and control restrictions of closed systems. With API pricing starting at $0.001 per minute, Voxtral puts high-quality transcription and audio understanding within reach of far more teams.
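To make the per-minute price concrete, here is a small cost estimator. It is a sketch that assumes a flat $0.001 per audio minute and bills fractional minutes exactly; actual billing granularity, volume tiers, and any minimums are not covered, and the `transcription_cost` helper is hypothetical.

```python
from decimal import Decimal

# Hypothetical flat rate taken from the quoted starting price (USD per audio minute).
RATE_PER_MINUTE = Decimal("0.001")


def transcription_cost(total_seconds: int, rate: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Estimate the USD cost of transcribing `total_seconds` of audio.

    Assumes fractional minutes are billed exactly (no rounding up per call).
    """
    if total_seconds < 0:
        raise ValueError("duration must be non-negative")
    minutes = Decimal(total_seconds) / Decimal(60)
    return minutes * rate


# 1,000 hours of recorded calls = 60,000 minutes.
print(transcription_cost(1_000 * 3600))  # prints 60.000
```

`Decimal` is used instead of `float` so that tiny per-minute rates do not pick up binary rounding error when summed over large volumes.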
Until now, businesses had to choose between affordable open-source systems with higher error rates and limited semantic understanding, or accurate but expensive proprietary APIs. Voxtral aims to close this gap by combining robust transcription, semantic understanding, and multilingual support at a fraction of the cost.
Meet the Voxtral Model Family
| Model | Parameters | Deployment | Key Features |
|---|---|---|---|
| Voxtral Small | 24 billion | Cloud / production scale | Advanced Q&A, summarization, multilingual, enterprise-ready |
| Voxtral Mini | 3 billion | Local / edge / IoT | Lightweight, fast, optimized for audio transcription |
| Voxtral Mini Transcribe | 3 billion (API only) | Cloud API | Ultra-low cost, Whisper-beating transcription performance |
Advanced Capabilities That Set Voxtral Apart
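- Long-form audio: A 32k-token context window handles up to 30 minutes of audio for transcription or 40 minutes for understanding, without breaking context.
- Built-in Q&A: Ask questions about the audio directly, without chaining a separate speech-recognition model to a language model.
- Summarization: Generate summaries of audio content in multiple languages.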
- Multilingual support: Includes English, Spanish, Hindi, German, Dutch, French, and more.
- Function calling from voice: Trigger API calls or execute functions directly from spoken commands.
- Text processing: Retains the strong text capabilities of the Mistral Small 3.1 language model it is built on.
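Function calling from voice generally means the model turns a spoken request into a structured call that your application then executes. The sketch below shows only the application side. It assumes the model returns a tool call shaped like `{"name": ..., "arguments": "<JSON string>"}` (a common convention, not confirmed here for Voxtral), and the `set_thermostat` and `create_ticket` functions are hypothetical examples.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical local functions that a spoken command might trigger.
def set_thermostat(celsius: float) -> str:
    return f"Thermostat set to {celsius:.1f} degrees C"


def create_ticket(title: str, priority: str = "normal") -> str:
    return f"Ticket '{title}' created with {priority} priority"


# Registry mapping tool names (as declared to the model) to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {
    "set_thermostat": set_thermostat,
    "create_ticket": create_ticket,
}


def dispatch(tool_call: Dict[str, Any]) -> Any:
    """Execute one tool call of the assumed form {"name": str, "arguments": str | dict}."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    raw_args = tool_call.get("arguments") or "{}"
    # Arguments usually arrive as a JSON string; accept an already-parsed dict too.
    args = json.loads(raw_args) if isinstance(raw_args, str) else dict(raw_args)
    return TOOLS[name](**args)


# What a model might return after hearing
# "Open a high-priority ticket about the login outage."
call = {"name": "create_ticket",
        "arguments": '{"title": "Login outage", "priority": "high"}'}
print(dispatch(call))  # prints Ticket 'Login outage' created with high priority
```

Keeping an explicit registry means a spoken command can only trigger functions you have deliberately exposed; an unknown tool name raises an error instead of being executed.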
Benchmark Results: Voxtral Outperforms Whisper and Other Leading Models
In benchmarks published by Mistral, Voxtral Small outperforms leading models on both English and multilingual tasks. It surpasses:
- Whisper large-v3
- GPT-4o mini Transcribe
- ElevenLabs Scribe
- Gemini 2.5 Flash
Mistral also reports state-of-the-art results on widely used datasets such as LibriSpeech, CHiME-4, Mozilla Common Voice, and FLEURS.
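Transcription benchmarks such as LibriSpeech, CHiME-4, Common Voice, and FLEURS are conventionally scored by word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a model's transcript into the reference, divided by the number of reference words. The minimal implementation below uses a word-level edit distance. Published leaderboards also apply benchmark-specific text normalization first, which this sketch reduces to lowercasing and whitespace splitting, so its numbers will not exactly match reported scores.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of an extra word
                          prev[j - 1] + cost)  # substitution, or match if cost == 0
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution ("the" -> "a") in six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Lower is better: a WER of 0 means the transcript matches the reference word for word, and the example above scores 1/6, roughly 0.167.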
Easy to Deploy for Any Use Case
- Try for Free: Test Voxtral on Hugging Face or through Mistral’s chatbot Le Chat.
- Cloud Integration: Use its API in any app with pay-as-you-go pricing.
- Local Deployment: Run privately or offline for secure, regulated environments.
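For cloud integration, a transcription call is typically a single multipart HTTP upload. The standard-library sketch below only builds the request, so it can be inspected without network access. The endpoint URL, the `voxtral-mini-latest` model identifier, the Bearer-token header, and the `file`/`model` form-field names are all assumptions; check them against Mistral's current API reference before use.

```python
import urllib.request
import uuid

# ASSUMED values: verify against Mistral's API documentation.
API_URL = "https://api.mistral.ai/v1/audio/transcriptions"
MODEL = "voxtral-mini-latest"


def build_transcription_request(audio: bytes, filename: str, api_key: str,
                                model: str = MODEL) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one audio file."""
    boundary = uuid.uuid4().hex
    # Text field carrying the (assumed) model identifier.
    model_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File field carrying the raw audio bytes.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + b"\r\n" + closing
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it requires network access and a valid key, for example `with urllib.request.urlopen(req) as resp: print(resp.read().decode())`. A local deployment would swap `API_URL` for whatever address your self-hosted server exposes.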
Enterprise Features for Advanced Users
Mistral offers custom solutions for large-scale or highly regulated deployments, including:
- Private, on-premise production deployments
- Domain-specific model fine-tuning (legal, medical, customer support, etc.)
- Upcoming features like emotion detection, speaker identification, and advanced diarization
- Dedicated integration support for enterprise-grade pipelines