Text-to-Speech (TTS)

The TTS layer in Tavus enables your persona to generate natural-sounding voice responses. You can configure the TTS layer to use a third-party TTS engine provider. If layers.tts is not specified, Tavus defaults to the cartesia engine.

If you use the default engine, you do not need to specify any parameters within the tts layer.

Configuring the TTS Layer

Define the TTS layer under the layers.tts object. Below are the parameters available:

1. `tts_engine`

Specifies the supported third-party TTS engine.

Options: cartesia, elevenlabs.

"tts": {
  "tts_engine": "cartesia"
}

2. `api_key`

Authenticates requests to your selected third-party TTS provider. You can obtain an API key from the provider you selected (Cartesia or ElevenLabs).

Only required when using private voices.

"tts": {
  "api_key": "your-api-key"
}

3. `external_voice_id`

Specifies which voice to use with the selected TTS engine. To find supported voice IDs, refer to the provider’s documentation.

You can use any publicly accessible custom voice from ElevenLabs or Cartesia without the provider’s API key. If the custom voice is private, you still need to use the provider’s API key.

"tts": {
  "external_voice_id": "external-voice-id"
}
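The public-versus-private voice rule above can be sketched as a small helper. This is only an illustration in Python: the function name `build_tts_layer` and the `voice_is_private` flag are hypothetical and not part of the Tavus API; only the field names `tts_engine`, `api_key`, and `external_voice_id` come from this page.

```python
from typing import Optional


def build_tts_layer(
    engine: str,
    external_voice_id: str,
    voice_is_private: bool,
    api_key: Optional[str] = None,
) -> dict:
    """Build a `tts` layer dict, adding `api_key` only when it is needed.

    Hypothetical helper: `voice_is_private` is something the caller must
    know about the voice; it is not a field that is sent to Tavus.
    """
    if engine not in ("cartesia", "elevenlabs"):
        raise ValueError(f"unsupported tts_engine: {engine!r}")
    if voice_is_private and not api_key:
        raise ValueError("a private voice requires the provider's API key")

    layer = {"tts_engine": engine, "external_voice_id": external_voice_id}
    if voice_is_private:
        # Private voices need the provider's key; public voices do not.
        layer["api_key"] = api_key
    return layer
```

Calling `build_tts_layer("cartesia", "external-voice-id", voice_is_private=False)` yields a layer without an `api_key` entry, which matches the case where a publicly accessible voice is used.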

4. `tts_model_name`

Specifies the model name used by the TTS engine. Refer to the provider’s documentation for the available model names.

"tts": {
  "tts_model_name": "sonic"
}

5. `tts_emotion_control`

If set to true, enables emotion control in speech.

Only available for the cartesia engine.

"tts": {
  "tts_emotion_control": true
}

6. `voice_settings`

Optional object containing additional settings specific to the selected TTS engine. These settings vary per engine:

| Parameter | Cartesia (Sonic-1 only) | ElevenLabs |
| --- | --- | --- |
| `speed` | Range `-1.0` to `1.0` (negative = slower, positive = faster) | Range `0.0` to `1.0` (`0.0` = slowest, `1.0` = fastest) |
| `emotion` | Array of `"emotion:level"` tags (e.g., `"positivity:high"`) | Not available |
| `stability` | Not available | Range `0.0` to `1.0` (`0.0` = variable, `1.0` = stable) |
| `similarity_boost` | Not available | Range `0.0` to `1.0` (`0.0` = creative, `1.0` = original) |
| `style` | Not available | Range `0.0` to `1.0` (`0.0` = neutral, `1.0` = exaggerated) |
| `use_speaker_boost` | Not available | Boolean (enhances speaker similarity) |

For more information on each voice setting, see:
• Cartesia Speed and Emotion Controls
• ElevenLabs Voice Settings

"tts": {
  "voice_settings": {
    "speed": 0.5,
    "emotion": ["positivity:high", "curiosity"]
  }
}
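Because the available settings differ per engine, it can help to check a `voice_settings` object locally before sending it. The following is a minimal sketch in Python that encodes only the numeric ranges and types listed in the table above. The function name `check_voice_settings` and the lookup tables are hypothetical, and the check may be stricter than the service itself (for example, it does not accept string values for `speed`).

```python
# Numeric ranges per engine, copied from the table above.
NUMERIC_RANGES = {
    "cartesia": {"speed": (-1.0, 1.0)},
    "elevenlabs": {
        "speed": (0.0, 1.0),
        "stability": (0.0, 1.0),
        "similarity_boost": (0.0, 1.0),
        "style": (0.0, 1.0),
    },
}

# Non-numeric settings per engine and the Python type expected for each.
OTHER_KEYS = {
    "cartesia": {"emotion": list},
    "elevenlabs": {"use_speaker_boost": bool},
}


def check_voice_settings(engine: str, settings: dict) -> list[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    ranges = NUMERIC_RANGES[engine]
    others = OTHER_KEYS[engine]
    for key, value in settings.items():
        if key in ranges:
            low, high = ranges[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{key} must be a number")
            elif not low <= value <= high:
                problems.append(f"{key} must be between {low} and {high}")
        elif key in others:
            if not isinstance(value, others[key]):
                problems.append(f"{key} must be of type {others[key].__name__}")
        else:
            problems.append(f"{key} is not available for {engine}")
    return problems
```

For instance, passing `{"emotion": ["positivity:high"]}` with the `elevenlabs` engine reports that `emotion` is not available for that engine, while the same setting with `cartesia` passes.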

Example Configuration

Below is an example persona with a fully configured TTS layer:

{
  "persona_name": "AI Presenter",
  "system_prompt": "You are a friendly and informative video host.",
  "pipeline_mode": "full",
  "context": "You're delivering updates in a conversational tone.",
  "default_replica_id": "r665388ec672",
  "layers": {
    "tts": {
      "tts_engine": "cartesia",
      "api_key": "your-api-key",
      "external_voice_id": "external-voice-id",
      "voice_settings": {
        "speed": "normal",
        "emotion": ["positivity:high", "curiosity"]
      },
      "tts_emotion_control": true,
      "tts_model_name": "sonic"
    }
  }
}

Refer to the Create Persona API for a complete list of supported fields.
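As a rough illustration of how such a persona might be submitted, the sketch below builds (but does not send) an HTTP request using Python's standard library. The endpoint URL `https://tavusapi.com/v2/personas` and the `x-api-key` header name are assumptions that do not appear on this page; confirm both in the Create Persona API reference. The helper name `make_create_persona_request` is hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; verify it against the Create Persona API reference.
CREATE_PERSONA_URL = "https://tavusapi.com/v2/personas"


def make_create_persona_request(persona: dict, tavus_api_key: str) -> urllib.request.Request:
    """Build a POST request carrying the persona as a JSON body."""
    body = json.dumps(persona).encode("utf-8")
    return urllib.request.Request(
        CREATE_PERSONA_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header name. This is the Tavus key, which is separate
            # from the TTS provider key inside layers.tts.api_key.
            "x-api-key": tavus_api_key,
        },
    )


persona = {
    "persona_name": "AI Presenter",
    "system_prompt": "You are a friendly and informative video host.",
    "pipeline_mode": "full",
    "layers": {
        "tts": {
            "tts_engine": "cartesia",
            "external_voice_id": "external-voice-id",
            "tts_model_name": "sonic",
        }
    },
}

request = make_create_persona_request(persona, "your-tavus-api-key")
```

The request can then be sent with `urllib.request.urlopen(request)`. Building the request separately from sending it makes the payload easy to inspect before any network call happens.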
