Voxtral Small

Production

Voxtral Small is our flagship 24-billion-parameter model designed for production-scale voice intelligence applications. With superior accuracy and comprehensive multilingual support, it delivers enterprise-grade performance for demanding use cases.

Model Specifications

Parameters: 24B
Context Window: 32k tokens
Model Size: ~48 GB
License: Apache 2.0

Performance

English WER: SOTA
Multilingual: 8+ languages
Audio Understanding: competitive
API Cost: $0.001/min

Deployment Requirements

GPU Memory: multi-GPU
Recommended GPUs: A100/H100
Cloud Ready: yes
Local Deployment: advanced

Performance Benchmarks

Voxtral models achieve state-of-the-art results across multiple benchmarks, outperforming leading open speech recognition models on accuracy and multilingual transcription while remaining competitive on audio understanding.

English Word Error Rate (WER)

Voxtral Small: SOTA
Voxtral Mini: excellent
Whisper Large v3: good
GPT-4o Mini: good

Lower WER indicates better accuracy. Voxtral Small achieves state-of-the-art results, while Voxtral Mini provides excellent performance for local deployment.

Multilingual Performance (FLEURS Dataset)

English Voxtral: 95% | Whisper: 85%
Spanish Voxtral: 92% | Whisper: 78%
French Voxtral: 91% | Whisper: 76%
Portuguese Voxtral: 89% | Whisper: 74%
Hindi Voxtral: 87% | Whisper: 72%
German Voxtral: 90% | Whisper: 75%
Dutch Voxtral: 88% | Whisper: 73%
Italian Voxtral: 89% | Whisper: 74%

Voxtral outperforms Whisper on all measured languages, demonstrating superior multilingual capabilities across diverse linguistic families.

Audio Understanding (40-example AU Benchmark)

Voxtral Small: competitive, matching GPT-4o Mini and Gemini 2.5 Flash performance
Voxtral Mini: good, with strong performance for local deployment scenarios
Built-in Q&A: native, answering questions over audio directly without additional LLM chaining

Audio understanding capabilities enable direct Q&A over audio content, automatic summarization, and semantic analysis without external dependencies.
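As a sketch of how built-in Q&A over audio might be invoked, the snippet below builds a chat-style request that pairs an audio clip with a text question in a single user turn. The endpoint URL, the model alias `voxtral-small-latest`, and the `input_audio` content-chunk shape are assumptions modeled on Mistral's chat API conventions; verify them against the official API reference before use.

```python
import base64

# Assumed values; confirm against the official Mistral API reference.
API_URL = "https://api.mistral.ai/v1/chat/completions"
MODEL = "voxtral-small-latest"

def build_audio_qa_payload(audio_bytes: bytes, question: str) -> dict:
    """Pair an audio clip (sent as base64) with a text question in one user turn."""
    audio_b64 = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Assumed content-chunk shape for audio input.
                    {"type": "input_audio", "input_audio": audio_b64},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

# Example: ask for a summary of a (placeholder) clip.
payload = build_audio_qa_payload(
    b"\x00\x01fake-audio", "Summarize this call in two sentences."
)
```

Because the model answers directly, no separate transcription-then-LLM pipeline is needed for this kind of query.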

Speed & Efficiency Comparison

Voxtral API: $0.001/min, 50% cheaper than the Whisper API
Voxtral Mini Local: real-time, with no API latency
Context Window: 32k tokens, roughly 30 min of audio for transcription or 40 min for understanding
Memory Efficiency: optimized for edge deployment

Voxtral offers superior cost efficiency and speed compared to closed-source alternatives, with the flexibility of local deployment for privacy-sensitive applications.
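The 32k-token window mapping to roughly 30 minutes of transcribed audio gives a quick back-of-envelope budget. The snippet below assumes token cost scales linearly with audio length; the tokens-per-minute rate is an estimate derived from those two figures, not a documented constant.

```python
CONTEXT_TOKENS = 32_000
TRANSCRIPTION_MINUTES = 30  # ~30 min of audio fits the window for transcription

# Rough rate, assuming linear scaling: ~1,067 tokens per minute of audio.
tokens_per_minute = CONTEXT_TOKENS / TRANSCRIPTION_MINUTES

def max_audio_minutes(token_budget: int, reserved_output_tokens: int = 2_000) -> float:
    """Estimate how many minutes of audio fit once output tokens are reserved."""
    return (token_budget - reserved_output_tokens) / tokens_per_minute

print(round(max_audio_minutes(CONTEXT_TOKENS), 1))  # ~28.1 min with 2k reserved
```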

Choose Your Model

Select the Voxtral model that best matches your requirements, deployment environment, and performance needs.

Choose Voxtral Small When:

  • Building production-scale voice applications
  • Requiring maximum accuracy and performance
  • Processing high volumes of audio content
  • Using cloud infrastructure with multiple GPUs
  • Needing enterprise-grade reliability
  • Working with complex multilingual content

Deployment Guide

Get started with Voxtral deployment using our comprehensive guides and code examples.

Cloud Deployment

Deploy Voxtral Small on cloud platforms with multi-GPU support for production workloads.

  • Mistral AI API integration
  • Hugging Face Inference Endpoints
  • Custom cloud infrastructure
  • Load balancing and scaling
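A minimal sketch of calling a hosted transcription endpoint over REST. The URL and model alias below are assumptions based on Mistral's published API layout; the request is only sent when `MISTRAL_API_KEY` is set, so the builder can be inspected offline.

```python
import os

# Assumed endpoint and model alias; confirm against the API docs.
API_URL = "https://api.mistral.ai/v1/audio/transcriptions"

def build_transcription_request(api_key: str, model: str = "voxtral-mini-latest"):
    """Return the headers and form fields for a multipart transcription request."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = {"model": model}
    return headers, data

headers, data = build_transcription_request(
    os.environ.get("MISTRAL_API_KEY", "dummy-key")
)

if os.environ.get("MISTRAL_API_KEY"):
    import requests  # third-party; only needed when actually sending
    with open("meeting.wav", "rb") as f:  # hypothetical local audio file
        resp = requests.post(API_URL, headers=headers, data=data, files={"file": f})
        print(resp.json().get("text", ""))
```

For production workloads, the same request shape can sit behind a load balancer fronting multiple inference endpoints.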

Local Deployment

Run Voxtral Mini locally on your machine for development, testing, and privacy-sensitive applications.

  • vLLM server setup
  • Docker containerization
  • GPU optimization
  • Performance tuning
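For local serving, a common path is vLLM's OpenAI-compatible server. The launch command and Hugging Face repo id below are assumptions (check the Voxtral model card for the exact flags); the client side uses only the standard library and falls back gracefully when no server is reachable.

```python
# Assumed launch command (run in a shell; flags may differ by vLLM version):
#   vllm serve mistralai/Voxtral-Mini-3B-2507 --tokenizer_mode mistral
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # vLLM's default port
MODEL_ID = "mistralai/Voxtral-Mini-3B-2507"  # assumed Hugging Face repo id

def build_local_chat_body(prompt: str) -> bytes:
    """Encode an OpenAI-style chat request for the local vLLM server."""
    return json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")

body = build_local_chat_body("Transcribe-quality check: say hello.")

try:
    req = urllib.request.Request(
        VLLM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
except OSError:
    print("vLLM server not running; payload built but not sent.")
```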

API Integration

Integrate Voxtral into your applications using our comprehensive API documentation and examples.

  • Python SDK examples
  • REST API documentation
  • WebSocket streaming
  • Error handling
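API integrations should expect transient failures such as rate limits and timeouts. Below is a minimal, generic retry helper with exponential backoff and jitter; it is not part of any official SDK, just a pattern to wrap whatever client call you use.

```python
import random
import time

def with_retries(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Call fn(), retrying transient errors with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            time.sleep(delay)

# Example with a flaky stand-in for an API call: fails twice, then succeeds.
calls = {"n": 0}

def flaky_transcribe():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "transcript text"

print(with_retries(flaky_transcribe, base_delay=0.01))  # prints: transcript text
```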

Ready to Get Started with Voxtral?

Join thousands of developers building the future of speech AI with Voxtral's open-source models.