ElevenLabs TTS Configuration Guide

Overview

ElevenLabs offers several TTS models optimized for different use cases. This guide will help you choose the right model and configure it properly for your needs.

Available TTS Models

Flash v2.5

Ultra-low latency for real-time applications

Turbo v2.5

Balanced quality and speed

Flash v2

English-only, low latency

Turbo v2

English-only, balanced performance

Flash v2.5

Best for: Real-time applications requiring ultra-low latency

Use cases:

Agents Platform
Interactive applications
Games requiring immediate response
Large-scale processing

Key benefits:

~75ms latency
32 languages supported
40,000 character limit
50% lower price per character

Numbers aren’t normalized by default. Consider using the apply_text_normalization parameter (Enterprise only) or have your LLM normalize text beforehand.

Understanding Text Normalization

Text normalization converts text into a format that sounds more natural when spoken aloud. Without it, certain elements can be mispronounced or sound unnatural.

Phone Number Normalization

Input: 123-456-7890
Without normalization: “one hundred twenty-three dash four hundred fifty-six dash seven thousand eight hundred ninety”
With normalization: “one two three, four five six, seven eight nine zero”

Date Normalization

Input: 2024-01-01
Without normalization: “two thousand twenty-four dash zero one dash zero one”
With normalization: “January first, two thousand twenty-four”

Time Normalization

Input: 14:30
Without normalization: “fourteen colon thirty”
With normalization: “two thirty PM”

Model-specific normalization handling:

Multilingual v2: Better at automatic normalization
Flash v2.5/Turbo v2.5: Numbers aren’t normalized by default for speed
Enterprise users: Can enable apply_text_normalization parameter for v2.5 models

For best results with Flash models, preprocess your text using LLM prompts or regular expressions to normalize these elements before sending to TTS.
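The regular-expression approach can be sketched in Python as follows. This is a minimal illustration rather than a complete normalizer: it assumes US-style 3-3-4 phone numbers, ISO dates between 2000 and 2099, and 24-hour times, and it does not check that a date is a real calendar day. The function name normalize_for_tts and its helpers are hypothetical and are not part of any ElevenLabs SDK.

```python
import re

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
UNIT_ORDINALS = ["first", "second", "third", "fourth", "fifth", "sixth",
                 "seventh", "eighth", "ninth"]
# Ordinals for days 1..31, indexed by day - 1.
ORDINALS = (UNIT_ORDINALS
            + ["tenth", "eleventh", "twelfth", "thirteenth", "fourteenth",
               "fifteenth", "sixteenth", "seventeenth", "eighteenth",
               "nineteenth", "twentieth"]
            + ["twenty-" + o for o in UNIT_ORDINALS]
            + ["thirtieth", "thirty-first"])


def number_words(n: int) -> str:
    """Spell out an integer from 1 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def spell_digits(digits: str) -> str:
    """Read a digit string one digit at a time."""
    return " ".join("zero" if d == "0" else ONES[int(d)] for d in digits)


def _phone(match: re.Match) -> str:
    # Each dash-separated group is read digit by digit, with a pause between groups.
    return ", ".join(spell_digits(group) for group in match.groups())


def _date(match: re.Match) -> str:
    year, month, day = (int(group) for group in match.groups())
    rest = year - 2000
    year_words = "two thousand" + (" " + number_words(rest) if rest else "")
    return f"{MONTHS[month - 1]} {ORDINALS[day - 1]}, {year_words}"


def _time(match: re.Match) -> str:
    hour, minute = int(match.group(1)), int(match.group(2))
    suffix = "PM" if hour >= 12 else "AM"
    hour_words = ONES[hour % 12 or 12]
    if minute == 0:
        return f"{hour_words} o'clock {suffix}"
    if minute < 10:
        return f"{hour_words} oh {ONES[minute]} {suffix}"
    return f"{hour_words} {number_words(minute)} {suffix}"


def normalize_for_tts(text: str) -> str:
    """Rewrite dates, phone numbers, and times into speakable words."""
    text = re.sub(r"\b(20\d{2})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b", _date, text)
    text = re.sub(r"\b(\d{3})-(\d{3})-(\d{4})\b", _phone, text)
    text = re.sub(r"\b([01]?\d|2[0-3]):([0-5]\d)\b", _time, text)
    return text


print(normalize_for_tts("Call 123-456-7890 on 2024-01-01 at 14:30."))
```

Dates are handled before phone numbers so that the two dash-separated patterns cannot interfere with each other. Real input will contain formats this sketch does not cover (currencies, ordinals, international numbers), which is where an LLM-based normalization step may be the more practical option.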

Turbo v2.5

Best for: Applications that need a balance of quality and speed

Use cases: When you need higher quality than Flash and can accept slightly higher latency (~250-300ms)

Key benefits:

High quality voice generation
32 languages supported
40,000 character limit
50% lower price per character

Ideal when: You want Flash v2.5 use cases but prioritize quality over maximum speed

Flash v2

Best for: English-only real-time applications Use cases:

Agents Platform (English only)
Interactive English applications

Key benefits:

~75ms latency
30,000 character limit

Limitation: English only

Turbo v2

Best for: English-only applications that need a balance of quality and speed

Use cases: High-quality English voice generation with low latency

Key benefits:

~250-300ms latency
30,000 character limit

Limitation: English only
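Because each model has a character limit (40,000 for the v2.5 models, 30,000 for the v2 models), longer text has to be split before it is sent. The sketch below shows one way to do that in Python, assuming the limit applies per request. The model identifiers in CHAR_LIMITS are assumptions about how the models are named in the API, and split_for_model is a hypothetical helper, not an ElevenLabs function.

```python
import re

# Character limits from this guide; the model ID strings are assumed.
CHAR_LIMITS = {
    "eleven_flash_v2_5": 40_000,
    "eleven_turbo_v2_5": 40_000,
    "eleven_flash_v2": 30_000,
    "eleven_turbo_v2": 30_000,
}


def split_for_model(text: str, limit: int) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break at sentence boundaries where possible. A single sentence
    longer than the limit is cut at the limit as a last resort.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = split_for_model("One. Two. Three.", limit=10)
print(chunks)
```

In practice the call would look like split_for_model(text, CHAR_LIMITS[model_id]). Splitting at sentence boundaries keeps each request self-contained, although prosody may still differ slightly across chunk boundaries.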

Voice Settings

Speaker Boost

Speaker Boost enhances similarity to the original speaker, making the generated voice sound more like the source voice used for cloning.

Trade-offs:

✅ Benefits: Increased similarity to the original speaker
❌ Costs: Higher computational load, increased latency
📊 Impact: Generally produces subtle differences

When to use:

Maximum similarity to the original voice is required
Latency is not a critical concern
Working with voices where subtle improvements are noticeable
High-quality, non-real-time applications

When to avoid:

Real-time applications (like Agents Platform)
Using Flash models for low-latency needs
Computational cost outweighs the subtle benefits

Stability

Controls the consistency and emotional variability of the voice generation.

Similarity

Determines how closely the AI adheres to the original voice.

Style Exaggeration

Amplifies the speaking style of the original voice.

Seed

The seed value selects which of the possible variations ElevenLabs generates for a single utterance. It is an integer from 0 to 4294967295.

According to ElevenLabs documentation: “If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed.”

With many possible variations for the same utterance, there’s always a chance that some variations may sound unnatural.
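A small Python sketch of working with seeds: it validates the range given above and produces a repeatable list of candidate seeds so that several variations of one utterance can be auditioned and a natural-sounding one kept. The payload key "seed" is an assumption about the request body, and the helper names are hypothetical.

```python
import random

MAX_SEED = 4_294_967_295  # upper bound from this guide (2**32 - 1)


def validate_seed(seed: int) -> int:
    """Return the seed unchanged if it is an integer in the allowed range."""
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")
    return seed


def candidate_seeds(count: int, rng_seed: int = 0) -> list[int]:
    """Generate a repeatable list of seeds to audition."""
    rng = random.Random(rng_seed)
    return [rng.randint(0, MAX_SEED) for _ in range(count)]


def with_seed(payload: dict, seed: int) -> dict:
    """Return a copy of a request payload with a validated seed attached."""
    return {**payload, "seed": validate_seed(seed)}


request = with_seed({"text": "Hello there."}, 42)
print(request)
```

Since determinism is best effort, a seed that sounded right once is a good starting point rather than a guarantee; storing the generated audio is the safer way to keep a take.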

Recommended Settings

Model Selection

✅ Use v2.5 models over v2 for multilingual support and higher character limits
⚡ Choose Flash for maximum speed (with slightly reduced quality)
🎯 Choose Turbo for better quality with acceptable latency
📝 Text normalization is not needed for v2 models; v2.5 models may need the apply_text_normalization parameter enabled (Enterprise only) or text normalized beforehand
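These recommendations can be expressed as a small decision helper. The Python sketch below is illustrative only: the model ID strings are assumptions about how the four models are named in the API, and choose_model and ModelChoice are hypothetical names.

```python
from typing import NamedTuple


class ModelChoice(NamedTuple):
    model_id: str             # assumed API identifier for the model
    pre_normalize_text: bool  # whether to normalize numbers before sending


def choose_model(prioritize_speed: bool, use_english_only_v2: bool = False) -> ModelChoice:
    """Pick a model following the recommendations in this guide.

    Flash is chosen for maximum speed and Turbo for better quality.
    v2.5 is the default; v2 is used only when explicitly requested.
    """
    family = "flash" if prioritize_speed else "turbo"
    if use_english_only_v2:
        # The guide notes that v2 models do not need text normalization.
        return ModelChoice(f"eleven_{family}_v2", pre_normalize_text=False)
    # v2.5 models do not normalize numbers by default.
    return ModelChoice(f"eleven_{family}_v2_5", pre_normalize_text=True)


print(choose_model(prioritize_speed=True))
```

Returning the normalization flag together with the model ID keeps the preprocessing decision next to the model decision, so a caller cannot switch to a v2.5 model without also seeing that its text should be normalized.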

Voice Settings Configuration

Speaker Boost

Recommendation: Disabled by default
Enable only if similarity isn’t sufficient and you can accept the latency increase.
Applies to all models

Stability

Recommendation: Start around 50
This provides a balanced middle ground for consistency.
Applies to all models

Similarity

Recommendation: Start around 75
Suitable for most use cases to balance quality and performance.
Applies to all models

Style Exaggeration

Recommendation: Keep at 0 most of the time
Only increase if you need amplified speaking style.
Applies to all models

Seed

Recommendation: Use with caution
Be aware that with many variations possible, some may sound unnatural.
Applies to all models
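The recommended starting values can be collected into one settings object. The Python sketch below assumes that the values in this guide are on a 0-100 scale and that the API expects them as fractions between 0 and 1 under the keys stability, similarity_boost, style, and use_speaker_boost. Both the scale conversion and the key names are assumptions to check against the official API reference, and build_voice_settings is a hypothetical helper.

```python
def build_voice_settings(
    stability_pct: float = 50,    # recommended starting point
    similarity_pct: float = 75,   # recommended starting point
    style_pct: float = 0,         # keep at 0 most of the time
    speaker_boost: bool = False,  # disabled by default
) -> dict:
    """Convert 0-100 values into a voice settings dictionary."""

    def to_fraction(name: str, pct: float) -> float:
        if not 0 <= pct <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {pct}")
        return pct / 100

    return {
        "stability": to_fraction("stability", stability_pct),
        "similarity_boost": to_fraction("similarity", similarity_pct),
        "style": to_fraction("style", style_pct),
        "use_speaker_boost": speaker_boost,
    }


print(build_voice_settings())
```

Calling the function with no arguments yields the recommended defaults, and any single value can be overridden, for example build_voice_settings(stability_pct=35) for a more variable delivery.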

Refer to ElevenLabs’ official documentation for the latest updates and best practices. Refer to the ElevenLabs Voices page for details on selecting voices.
