Voxtral Small

Production

Voxtral Small is our flagship 24-billion-parameter model designed for production-scale voice intelligence applications. With superior accuracy and comprehensive multilingual support, it delivers enterprise-grade performance for demanding use cases.

Model Specifications

Parameters: 24B
Context Window: 32k tokens
Model Size: ~48 GB
License: Apache 2.0

Performance

English WER: SOTA
Multilingual: 8+ languages
Audio Understanding: competitive
API Cost: $0.001/min

Deployment Requirements

GPU Memory: multi-GPU
Recommended GPUs: A100/H100
Cloud Ready: yes
Local Deployment: advanced

Performance Benchmarks

Voxtral models achieve state-of-the-art results across multiple benchmarks, outperforming leading open speech recognition models on accuracy and multilingual transcription while remaining competitive on audio understanding.

English Word Error Rate (WER)

Voxtral Small: SOTA
Voxtral Mini: excellent
Whisper Large v3: good
GPT-4o Mini: good

Lower WER indicates better accuracy. Voxtral Small achieves state-of-the-art results, while Voxtral Mini provides excellent performance for local deployment.

Multilingual Performance (FLEURS Dataset)

English Voxtral: 95% | Whisper: 85%
Spanish Voxtral: 92% | Whisper: 78%
French Voxtral: 91% | Whisper: 76%
Portuguese Voxtral: 89% | Whisper: 74%
Hindi Voxtral: 87% | Whisper: 72%
German Voxtral: 90% | Whisper: 75%
Dutch Voxtral: 88% | Whisper: 73%
Italian Voxtral: 89% | Whisper: 74%

Voxtral outperforms Whisper on all measured languages, demonstrating superior multilingual capabilities across diverse linguistic families.

Audio Understanding (40-example AU Benchmark)

Voxtral Small: competitive, matching GPT-4o Mini and Gemini 2.5 Flash performance
Voxtral Mini: good, with strong performance for local deployment scenarios
Built-in Q&A: native, answering questions over audio directly without additional LLM chaining

Audio understanding capabilities enable direct Q&A over audio content, automatic summarization, and semantic analysis without external dependencies.
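As a sketch of how built-in Q&A over audio might be invoked, the snippet below builds a chat-style request that pairs an audio clip with a text question in a single user turn. The endpoint URL, the model alias `voxtral-small-latest`, and the `input_audio` content-chunk shape are assumptions modeled on Mistral's chat API conventions; verify them against the official API reference before use.

```python
import base64

# Assumed values; confirm against the official Mistral API reference.
API_URL = "https://api.mistral.ai/v1/chat/completions"
MODEL = "voxtral-small-latest"

def build_audio_qa_payload(audio_bytes: bytes, question: str) -> dict:
    """Pair an audio clip (sent as base64) with a text question in one user turn."""
    audio_b64 = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Assumed content-chunk shape for audio input.
                    {"type": "input_audio", "input_audio": audio_b64},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

# Example: ask for a summary of a (placeholder) clip.
payload = build_audio_qa_payload(
    b"\x00\x01fake-audio", "Summarize this call in two sentences."
)
```

Because the model answers directly, no separate transcription-then-LLM pipeline is needed for this kind of query.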

Speed & Efficiency Comparison

Voxtral API: $0.001/min, 50% cheaper than the Whisper API
Voxtral Mini Local: real-time, with no API latency
Context Window: 32k tokens, roughly 30 min of audio for transcription or 40 min for understanding
Memory Efficiency: optimized for edge deployment

Voxtral offers superior cost efficiency and speed compared to closed-source alternatives, with the flexibility of local deployment for privacy-sensitive applications.
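The 32k-token window mapping to roughly 30 minutes of transcribed audio gives a quick back-of-envelope budget. The snippet below assumes token cost scales linearly with audio length; the tokens-per-minute rate is an estimate derived from those two figures, not a documented constant.

```python
CONTEXT_TOKENS = 32_000
TRANSCRIPTION_MINUTES = 30  # ~30 min of audio fits the window for transcription

# Rough rate, assuming linear scaling: ~1,067 tokens per minute of audio.
tokens_per_minute = CONTEXT_TOKENS / TRANSCRIPTION_MINUTES

def max_audio_minutes(token_budget: int, reserved_output_tokens: int = 2_000) -> float:
    """Estimate how many minutes of audio fit once output tokens are reserved."""
    return (token_budget - reserved_output_tokens) / tokens_per_minute

print(round(max_audio_minutes(CONTEXT_TOKENS), 1))  # ~28.1 min with 2k reserved
```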

Choose Your Model

Select the Voxtral model that best matches your requirements, deployment environment, and performance needs.

Choose Voxtral Small When:

  • Building production-scale voice applications
  • Requiring maximum accuracy and performance
  • Processing high volumes of audio content
  • Using cloud infrastructure with multiple GPUs
  • Needing enterprise-grade reliability
  • Working with complex multilingual content

Deployment Guide

Get started with Voxtral deployment using our comprehensive guides and code examples.

Cloud Deployment

Deploy Voxtral Small on cloud platforms with multi-GPU support for production workloads.

  • Mistral AI API integration
  • Hugging Face Inference Endpoints
  • Custom cloud infrastructure
  • Load balancing and scaling
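A minimal sketch of calling a hosted transcription endpoint over REST. The URL and model alias below are assumptions based on Mistral's published API layout; the request is only sent when `MISTRAL_API_KEY` is set, so the builder can be inspected offline.

```python
import os

# Assumed endpoint and model alias; confirm against the API docs.
API_URL = "https://api.mistral.ai/v1/audio/transcriptions"

def build_transcription_request(api_key: str, model: str = "voxtral-mini-latest"):
    """Return the headers and form fields for a multipart transcription request."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = {"model": model}
    return headers, data

headers, data = build_transcription_request(
    os.environ.get("MISTRAL_API_KEY", "dummy-key")
)

if os.environ.get("MISTRAL_API_KEY"):
    import requests  # third-party; only needed when actually sending
    with open("meeting.wav", "rb") as f:  # hypothetical local audio file
        resp = requests.post(API_URL, headers=headers, data=data, files={"file": f})
        print(resp.json().get("text", ""))
```

For production workloads, the same request shape can sit behind a load balancer fronting multiple inference endpoints.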

Local Deployment

Run Voxtral Mini locally on your machine for development, testing, and privacy-sensitive applications.

  • vLLM server setup
  • Docker containerization
  • GPU optimization
  • Performance tuning
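For local serving, a common path is vLLM's OpenAI-compatible server. The launch command and Hugging Face repo id below are assumptions (check the Voxtral model card for the exact flags); the client side uses only the standard library and falls back gracefully when no server is reachable.

```python
# Assumed launch command (run in a shell; flags may differ by vLLM version):
#   vllm serve mistralai/Voxtral-Mini-3B-2507 --tokenizer_mode mistral
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # vLLM's default port
MODEL_ID = "mistralai/Voxtral-Mini-3B-2507"  # assumed Hugging Face repo id

def build_local_chat_body(prompt: str) -> bytes:
    """Encode an OpenAI-style chat request for the local vLLM server."""
    return json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")

body = build_local_chat_body("Transcribe-quality check: say hello.")

try:
    req = urllib.request.Request(
        VLLM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
except OSError:
    print("vLLM server not running; payload built but not sent.")
```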

API Integration

Integrate Voxtral into your applications using our comprehensive API documentation and examples.

  • Python SDK examples
  • REST API documentation
  • WebSocket streaming
  • Error handling
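API integrations should expect transient failures such as rate limits and timeouts. Below is a minimal, generic retry helper with exponential backoff and jitter; it is not part of any official SDK, just a pattern to wrap whatever client call you use.

```python
import random
import time

def with_retries(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Call fn(), retrying transient errors with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            time.sleep(delay)

# Example with a flaky stand-in for an API call: fails twice, then succeeds.
calls = {"n": 0}

def flaky_transcribe():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "transcript text"

print(with_retries(flaky_transcribe, base_delay=0.01))  # prints: transcript text
```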

Ready to Get Started with Voxtral?

Join thousands of developers building the future of speech AI with Voxtral's open-source models.