Overview
ElevenLabs offers several TTS models optimized for different use cases. This guide will help you choose the right model and configure it properly for your needs.
Available TTS Models
- Flash v2.5: Ultra-low latency for real-time applications
- Turbo v2.5: Balanced quality and speed
- Flash v2: English-only, low latency
- Turbo v2: English-only, balanced performance
Flash v2.5
Best for: Real-time applications requiring ultra-low latency
Use cases:
- Agents Platform
- Interactive applications
- Games requiring immediate response
- Large-scale processing
Key benefits:
- ~75ms latency
- 32 languages supported
- 40,000 character limit
- 50% lower price per character
Understanding Text Normalization
Text normalization converts text into a format that sounds more natural when spoken aloud. Without it, certain elements can be mispronounced or sound unnatural.
Phone Number Normalization
Input: 123-456-7890
Without normalization: “one hundred twenty-three dash four hundred fifty-six dash seven thousand eight hundred ninety”
With normalization: “one two three, four five six, seven eight nine zero”
Date Normalization
Input: 2024-01-01
Without normalization: “two thousand twenty-four dash zero one dash zero one”
With normalization: “January first, two thousand twenty-four”
Time Normalization
Input: 14:30
Without normalization: “fourteen colon thirty”
With normalization: “two thirty PM”
Normalization behavior by model:
- Multilingual v2: Better at automatic normalization
- Flash v2.5/Turbo v2.5: Numbers aren’t normalized by default for speed
- Enterprise users: Can enable the apply_text_normalization parameter for v2.5 models
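When a v2.5 model is used without built-in normalization, one option is to normalize the text yourself before sending it. The sketch below is a minimal illustration of that idea, not ElevenLabs' own normalizer: the function names (`normalize_for_tts` and its helpers) are hypothetical, it only handles US-style `ddd-ddd-dddd` phone numbers and 24-hour `HH:MM` times, and dates are left out for brevity.

```python
import re

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty"]

# Assumed input shapes: 123-456-7890 and 24-hour HH:MM.
PHONE_RE = re.compile(r"\b(\d{3})-(\d{3})-(\d{4})\b")
TIME_RE = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")


def spell_digits(group: str) -> str:
    """Read a digit string one digit at a time: '123' -> 'one two three'."""
    return " ".join(DIGITS[int(ch)] for ch in group)


def number_words(n: int) -> str:
    """Spell an integer in the range 0-59."""
    if n < 10:
        return DIGITS[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{DIGITS[ones]}"


def _phone_to_words(match: re.Match) -> str:
    # Each dash-separated group becomes a comma-separated run of digits.
    return ", ".join(spell_digits(g) for g in match.groups())


def _time_to_words(match: re.Match) -> str:
    hour, minute = int(match.group(1)), int(match.group(2))
    suffix = "AM" if hour < 12 else "PM"
    hour12 = hour % 12 or 12  # 0 -> 12, 13 -> 1, and so on
    if minute == 0:
        minute_words = "o'clock"
    elif minute < 10:
        minute_words = "oh " + DIGITS[minute]
    else:
        minute_words = number_words(minute)
    return f"{number_words(hour12)} {minute_words} {suffix}"


def normalize_for_tts(text: str) -> str:
    """Rewrite phone numbers and times into a speakable form."""
    text = PHONE_RE.sub(_phone_to_words, text)
    return TIME_RE.sub(_time_to_words, text)


print(normalize_for_tts("Call 123-456-7890 at 14:30"))
```

For the two inputs shown above, this produces the same spoken forms as the "With normalization" examples.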
Turbo v2.5
Best for: Balanced quality and speed applications
Use cases: When you need higher quality than Flash but can accept slightly higher latency (~250-300ms)
Key benefits:
- High-quality voice generation
- 32 languages supported
- 40,000 character limit
- 50% lower price per character
Flash v2
Best for: English-only real-time applications
Use cases:
- Agents Platform (English only)
- Interactive English applications
Key benefits:
- ~75ms latency
- 30,000 character limit
Limitation: English only
Turbo v2
Best for: English-only balanced quality and speed
Use cases: High-quality English voice generation with low latency
Key benefits:
- ~250-300ms latency
- 30,000 character limit
Limitation: English only
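The trade-offs between the four models can be captured in a small lookup table. The sketch below is one possible way to encode them, using only the latency, language, and character-limit figures quoted in this guide. The `TTSModel` class and `choose_model` function are hypothetical helpers, the 300 ms figure is the upper end of the quoted ~250-300ms range, and counting characters with `len()` is an assumption about how the limit is measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TTSModel:
    name: str
    latency_ms: int      # approximate latency quoted in this guide
    multilingual: bool   # True = 32 languages, False = English only
    char_limit: int


# Listed with v2.5 models first so they win ties against v2.
MODELS = [
    TTSModel("Flash v2.5", 75, True, 40_000),
    TTSModel("Turbo v2.5", 300, True, 40_000),
    TTSModel("Flash v2", 75, False, 30_000),
    TTSModel("Turbo v2", 300, False, 30_000),
]


def choose_model(text: str, english_only: bool,
                 max_latency_ms: int) -> Optional[TTSModel]:
    """Pick the highest-quality model that fits the text and latency budget.

    Turbo is treated as higher quality than Flash, so among the models that
    fit, the one with the larger latency is preferred. max() returns the
    first maximal item, which keeps v2.5 ahead of v2 on ties.
    """
    candidates = [
        m for m in MODELS
        if len(text) <= m.char_limit
        and (m.multilingual or english_only)
        and m.latency_ms <= max_latency_ms
    ]
    return max(candidates, key=lambda m: m.latency_ms, default=None)


picked = choose_model("Hello there", english_only=True, max_latency_ms=100)
print(picked.name if picked else "no model fits")
```

A tight latency budget selects Flash v2.5, a relaxed one selects Turbo v2.5, and `None` is returned when nothing fits.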
Voice Settings
Speaker Boost
Speaker Boost enhances similarity to the original speaker, making the generated voice sound more like the source voice used for cloning.
Trade-offs:
- ✅ Benefits: Increased similarity to the original speaker
- ❌ Costs: Higher computational load, increased latency
- 📊 Impact: Generally produces subtle differences
Enable when:
- Maximum similarity to the original voice is required
- Latency is not a critical concern
- Working with voices where subtle improvements are noticeable
- High-quality, non-real-time applications
Disable when:
- Real-time applications (like Agents Platform)
- Using Flash models for low-latency needs
- The computational cost outweighs the subtle benefits
Stability
Controls the consistency and emotional variability of the voice generation.
Similarity
Determines how closely the AI adheres to the original voice.
Style Exaggeration
Amplifies the speaking style of the original voice.
Seed
The seed value selects which of the possible variations ElevenLabs generates for a single utterance. It is an integer ranging from 0 to 4294967295.
According to ElevenLabs documentation: “If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed.”
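The upper bound 4294967295 equals 2^32 - 1, so any unsigned 32-bit integer is a valid seed. The sketch below shows one way to check a seed before use and to derive a repeatable seed from a label of your own. Both `validate_seed` and `seed_from_label` are hypothetical helpers, not part of any ElevenLabs SDK, and a repeatable seed still only gives best-effort determinism, as the quote above notes.

```python
import hashlib

MAX_SEED = 2**32 - 1  # 4294967295, the upper bound quoted above


def validate_seed(seed: int) -> int:
    """Return the seed unchanged if it is an integer in [0, MAX_SEED]."""
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")
    return seed


def seed_from_label(label: str) -> int:
    """Derive a stable seed from an arbitrary label (hypothetical helper).

    The first 4 bytes of a SHA-256 digest form an unsigned 32-bit integer,
    which always falls inside the valid seed range.
    """
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


seed = validate_seed(seed_from_label("greeting-v1"))
print(seed)
```

Reusing the same label, text, and voice settings yields the same seed on every run, which makes it easier to request the same rendition again.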
Recommended Settings
Model Selection
- ✅ Use v2.5 models over v2 for multilingual support and higher character limits
- ⚡ Choose Flash for maximum speed (with slightly reduced quality)
- 🎯 Choose Turbo for better quality with acceptable latency
- 📝 Text normalization is not needed for v2 models; v2.5 models may need this option enabled
Voice Settings Configuration
Speaker Boost
Recommendation: Disabled by default
Enable only if similarity isn’t sufficient and you can accept the latency increase.
Applies to all models
Stability
Recommendation: Start around 50
This provides a balanced middle ground for consistency.
Applies to all models
Similarity
Recommendation: Start around 75
Suitable for most use cases to balance quality and performance.
Applies to all models
Style Exaggeration
Recommendation: Keep at 0 most of the time
Only increase if you need amplified speaking style.
Applies to all models
Seed
Recommendation: Use with caution
Be aware that, with so many variations possible, some may sound unnatural.
Applies to all models
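The recommended starting values can be kept together in one small settings object. The sketch below rests on two assumptions: that the values 50, 75, and 0 are on a 0-100 scale, and that the request uses `voice_settings` keys (`stability`, `similarity_boost`, `style`, `use_speaker_boost`) with 0.0-1.0 floats, as in ElevenLabs' public REST API. Check the field names and scale against the API or platform you actually call. `VoiceSettings` is a hypothetical class, not an SDK type.

```python
from dataclasses import dataclass


@dataclass
class VoiceSettings:
    # Defaults mirror the recommendations above (assumed 0-100 scale).
    speaker_boost: bool = False
    stability: int = 50
    similarity: int = 75
    style_exaggeration: int = 0

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0-100 range early.
        for name in ("stability", "similarity", "style_exaggeration"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")

    def to_api_payload(self) -> dict:
        """Convert to 0.0-1.0 floats under the assumed API key names."""
        return {
            "stability": self.stability / 100,
            "similarity_boost": self.similarity / 100,
            "style": self.style_exaggeration / 100,
            "use_speaker_boost": self.speaker_boost,
        }


print(VoiceSettings().to_api_payload())
```

Starting from the defaults and overriding one field at a time, for example `VoiceSettings(speaker_boost=True)`, makes it easier to hear what each setting changes.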