Mistral Unveils Voxtral: Its First Open-Source AI Audio Model

Vijay Kumari
Jul 16

Speech Is the Future of Human–Computer Interaction

Voice was our most natural interface long before keyboards and screens. As AI evolves, speech is becoming an increasingly common way to interact with machines. French AI company Mistral has now released Voxtral, an open-source audio model built to compete with established speech systems such as OpenAI's Whisper and ElevenLabs Scribe.

Introducing Voxtral: Open, Affordable, and Enterprise-Ready Speech AI

Mistral’s new Voxtral model offers production-grade voice intelligence without the cost and control restrictions of closed systems. With pricing starting at $0.001 per minute, Voxtral puts high-quality transcription and speech understanding within reach of far more teams.

Until now, businesses had to choose between affordable but less accurate open solutions and precise but expensive proprietary APIs. Voxtral bridges this gap by offering robust transcription, semantic comprehension, and multilingual support at a fraction of the cost.
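
To put the per-minute price in perspective, here is a minimal sketch of a cost estimate. It assumes the $0.001-per-minute starting price quoted above applies uniformly; actual rates may vary by model and plan, and the function name and workload figure are purely illustrative.

```python
from decimal import Decimal

# Starting price quoted in the article; real rates may differ by model and plan.
PRICE_PER_MINUTE_USD = Decimal("0.001")


def estimate_transcription_cost(audio_minutes, price_per_minute=PRICE_PER_MINUTE_USD):
    """Return the estimated cost in USD for transcribing `audio_minutes` of audio."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    # Convert via str so float inputs do not carry binary rounding noise.
    return Decimal(str(audio_minutes)) * price_per_minute


if __name__ == "__main__":
    # Hypothetical workload: 10,000 hours of call recordings per month.
    monthly_minutes = 10_000 * 60
    print(f"${estimate_transcription_cost(monthly_minutes):,.2f}")
```

Under that assumption, 10,000 hours of audio (600,000 minutes) would come to about $600.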

Meet the Voxtral Model Family

Model	Parameters	Deployment	Key Features
Voxtral Small	24 Billion	Cloud/Production-scale	Advanced Q&A, summarization, multilingual, enterprise-ready
Voxtral Mini	3 Billion	Local/Edge/IoT	Lightweight, fast, optimized for audio transcription
Voxtral Mini Transcribe	3 Billion (API only)	Cloud API	Ultra-low cost, Whisper-beating transcription performance

Advanced Capabilities That Set Voxtral Apart

30-40 minute comprehension: Handles roughly 30 to 40 minutes of audio in a single pass without losing context.
Built-in Q&A: Ask questions about the audio directly, without chaining a separate transcription model and language model.
Summarization: Generate summaries of audio content in multiple languages.
Multilingual support: Includes English, Spanish, Hindi, German, Dutch, French, and more.
Function calling from voice: Trigger API calls or execute functions directly from spoken commands.
Text processing: Retains the strong text-understanding capabilities of the Mistral Small 3.1 language model.
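
As a rough illustration of the function-calling item above, the sketch below shows how an application might route a model-produced tool call to local code. The payload shape (`{"name": ..., "arguments": {...}}`), the tool names, and the helper functions are all hypothetical; the article does not describe Voxtral's actual output format.

```python
import json


# Hypothetical local functions that a spoken command might trigger.
def set_thermostat(temperature_c):
    return f"Thermostat set to {temperature_c} C"


def create_reminder(text, time):
    return f"Reminder '{text}' scheduled for {time}"


# Registry mapping tool names to callables; only registered names can run.
TOOLS = {
    "set_thermostat": set_thermostat,
    "create_reminder": create_reminder,
}


def dispatch_tool_call(tool_call_json):
    """Run the function named in a tool-call payload.

    Assumes a payload of the form {"name": str, "arguments": {...}}.
    This shape is an illustration, not a documented Voxtral format.
    """
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    # Pretend the model heard "set the thermostat to 21 degrees".
    payload = '{"name": "set_thermostat", "arguments": {"temperature_c": 21}}'
    print(dispatch_tool_call(payload))
```

Restricting execution to an explicit registry means a misheard or unexpected command cannot invoke arbitrary code.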

Benchmark Results: Voxtral Outperforms Whisper and Beyond

In benchmarks reported by Mistral, Voxtral Small outperforms leading models across English and multilingual tasks. It surpasses:

Whisper large-v3
GPT-4o-mini Transcribe
ElevenLabs Scribe
Gemini 2.5 Flash

It also posts state-of-the-art scores on key datasets such as LibriSpeech, CHiME-4, Mozilla Common Voice, and FLEURS.
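
The article does not say which metric these comparisons use, but transcription benchmarks on datasets like these are conventionally scored by word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the model's output into the reference transcript, divided by the reference length. The following is a minimal sketch of that calculation; it lowercases and splits on whitespace only, whereas real evaluations usually apply more thorough text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Compute WER as word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Lower is better: a WER of 0 means the transcript matches the reference word for word.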

Easy to Deploy for Any Use Case

Try for Free: Test Voxtral on Hugging Face or through Mistral’s chatbot Le Chat.
Cloud Integration: Use its API in any app with pay-as-you-go pricing.
Local Deployment: Run privately or offline for secure, regulated environments.

Enterprise Features for Advanced Users

Mistral offers custom solutions for large-scale or regulatory-sensitive deployments, including:

Private, on-premise production deployments
Domain-specific model fine-tuning (legal, medical, customer support, etc.)
Upcoming features like emotion detection, speaker identification, and advanced diarization
Dedicated integration support for enterprise-grade pipelines