If layers.tts is not specified, Tavus defaults to the cartesia engine.
If you use the default engine, you do not need to specify any parameters within the tts layer.

Configuring the TTS Layer
Define the TTS layer under the layers.tts object. Below are the parameters available:
1. tts_engine
Specifies the supported third-party TTS engine.
- Options: cartesia, elevenlabs.
2. api_key
Authenticates requests to your selected third-party TTS provider. You can obtain an API key from the provider you selected, Cartesia or ElevenLabs.
3. external_voice_id
Specifies which voice to use with the selected TTS engine. To find supported voice IDs, refer to the provider’s documentation (Cartesia or ElevenLabs).
You can use any publicly accessible custom voice from ElevenLabs or Cartesia without the provider’s API key. If the custom voice is private, you still need to use the provider’s API key.
4. tts_model_name
Model name used by the TTS engine. Refer to the provider’s documentation (Cartesia or ElevenLabs) for supported model names.
5. tts_emotion_control
If set to true, enables emotion control in speech. Defaults to true.
6. voice_settings
Optional object for controlling speed, volume, and similar effects. Which approach you use depends on your TTS engine and model:
| Engine | Model | Approach |
|---|---|---|
| ElevenLabs | All models | voice_settings in persona config |
| Cartesia | sonic-2 | voice_settings in persona config |
| Cartesia | sonic-3 | Either voice_settings (global, set once per conversation) or prompt the LLM in system_prompt to output Cartesia SSML tags for dynamic control. Not both. |
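The table above can be read as a simple lookup from engine and model to the permitted control approaches. The sketch below is only an illustration: `control_approaches` and `check_sonic3_choice` are hypothetical helper names, not part of any Tavus SDK, and the exact-match comparison on model names is an assumption.

```python
def control_approaches(engine: str, model: str) -> set[str]:
    """Return the control approaches the table lists for an engine/model pair.

    Hypothetical helper; model names are matched exactly, which is an assumption.
    """
    engine = engine.lower()
    model = model.lower()
    if engine == "elevenlabs":
        # All ElevenLabs models use voice_settings in the persona config.
        return {"voice_settings"}
    if engine == "cartesia" and model == "sonic-2":
        return {"voice_settings"}
    if engine == "cartesia" and model == "sonic-3":
        # Either approach is available, but only one may be used at a time.
        return {"voice_settings", "ssml"}
    raise ValueError(f"no entry in the table for {engine}/{model}")


def check_sonic3_choice(uses_voice_settings: bool, uses_ssml: bool) -> None:
    """Reject the one combination the table forbids for sonic-3."""
    if uses_voice_settings and uses_ssml:
        raise ValueError("sonic-3: use voice_settings or SSML tags, not both")


print(sorted(control_approaches("cartesia", "sonic-3")))
```

A check like this could run before a persona is created, so that a sonic-3 configuration combining both approaches is caught early.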
ElevenLabs: Use the voice_settings object:
| Parameter | ElevenLabs |
|---|---|
| speed | Range 0.7 to 1.2 (0.7 = slowest, 1.2 = fastest) |
| stability | Range 0.0 to 1.0 (0.0 = variable, 1.0 = stable) |
| similarity_boost | Range 0.0 to 1.0 (0.0 = creative, 1.0 = original) |
| style | Range 0.0 to 1.0 (0.0 = neutral, 1.0 = exaggerated) |
| use_speaker_boost | Boolean (enhances speaker similarity) |
See ElevenLabs Voice Settings for details.
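Since each ElevenLabs parameter has a documented range, it may help to check a voice_settings object locally before sending it. The following is a minimal sketch under that assumption: `validate_elevenlabs_voice_settings` is a hypothetical helper, and treating unlisted keys as errors is a design choice of this example rather than documented behavior.

```python
from typing import Any

# Numeric ranges copied from the ElevenLabs table above.
ELEVENLABS_RANGES = {
    "speed": (0.7, 1.2),
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
}


def validate_elevenlabs_voice_settings(settings: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    for key, value in settings.items():
        if key == "use_speaker_boost":
            if not isinstance(value, bool):
                problems.append("use_speaker_boost must be a boolean")
        elif key in ELEVENLABS_RANGES:
            low, high = ELEVENLABS_RANGES[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{key} must be a number")
            elif not low <= value <= high:
                problems.append(f"{key}={value} is outside {low} to {high}")
        else:
            # Assumption of this sketch: keys not in the table are flagged.
            problems.append(f"unknown parameter: {key}")
    return problems


print(validate_elevenlabs_voice_settings({"speed": 1.5, "stability": 0.4}))
```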
Cartesia sonic-2: Use the voice_settings object (e.g. speed, emotion). SSML tags are not used for sonic-2.
Cartesia sonic-3: You can use either of these, but not both:
- voice_settings: We accept speed/volume params for sonic-3. They apply globally, set once per conversation. Use this when you want a single default speed and volume for the entire conversation. Using voice_settings prevents dynamic SSML control.
- SSML in LLM output: Omit voice_settings for speed/volume and instead add instructions to your system_prompt so the LLM outputs Cartesia SSML tags in its responses. This gives you dynamic, per-phrase control. See Cartesia volume, speed, and emotion.
If you are not using voice_settings for sonic-3, add instructions to your system_prompt so the LLM outputs Cartesia SSML tags.
Example Configuration
Below is an example persona with a fully configured TTS layer:

Refer to the Create Persona API for a complete list of supported fields.
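As a rough illustration of how the parameters described in this section fit together, the sketch below assembles a layers.tts object in Python. The field names inside the tts layer come from this page; the helper `build_tts_layer`, the top-level shape of the persona dictionary, and every value in capitals are placeholders or assumptions, so check them against the Create Persona API before use.

```python
import json
from typing import Any, Optional


def build_tts_layer(
    tts_engine: str = "cartesia",
    api_key: Optional[str] = None,
    external_voice_id: Optional[str] = None,
    tts_model_name: Optional[str] = None,
    tts_emotion_control: Optional[bool] = None,
    voice_settings: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a layers.tts object, leaving out any field that was not set."""
    fields = {
        "tts_engine": tts_engine,
        "api_key": api_key,
        "external_voice_id": external_voice_id,
        "tts_model_name": tts_model_name,
        "tts_emotion_control": tts_emotion_control,
        "voice_settings": voice_settings,
    }
    return {name: value for name, value in fields.items() if value is not None}


# Assumed persona shape: only system_prompt and layers.tts are named on this page.
persona = {
    "system_prompt": "You are a friendly assistant.",
    "layers": {
        "tts": build_tts_layer(
            tts_engine="elevenlabs",
            api_key="YOUR_ELEVENLABS_API_KEY",    # placeholder
            external_voice_id="YOUR_VOICE_ID",    # placeholder
            tts_model_name="YOUR_MODEL_NAME",     # placeholder
            voice_settings={"speed": 1.0, "stability": 0.5, "use_speaker_boost": True},
        )
    },
}

print(json.dumps(persona, indent=2))
```

Omitting unset fields keeps the payload minimal, which matches the note that the default engine needs no parameters in the tts layer.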

