# Text-to-Speech (TTS)

> Discover how to integrate custom voices from third-party TTS engines for multilingual or localized speech output.

The **TTS Layer** in Tavus enables your persona to generate natural-sounding voice responses.

You can configure the TTS layer using a third-party TTS engine provider. If `layers.tts` is not specified, Tavus defaults to the `cartesia` engine. If you use the default engine, you do not need to specify any parameters within the `tts` layer.

Set **`layers.tts`** when you [Create Persona](/api-reference/personas/create-persona) or update a persona. For how a persona fits together, see [Persona overview](/sections/conversational-video-interface/persona/overview). For languages and locale-oriented setup, see [Language support](/sections/conversational-video-interface/language-support).

## Configuring the TTS Layer

Define the TTS layer under the `layers.tts` object. The snippets below show only the **`tts`** object for readability; in a full persona payload it is nested under **`layers`** (see [Example configuration](#example-configuration)). Below are the parameters available:

### 1. `tts_engine`

Specifies the supported third-party TTS engine.

* **Options**: `cartesia`, `elevenlabs`, `azure`.

```json
"tts": {
  "tts_engine": "cartesia"
}
```

### 2. `api_key`

Authenticates requests to your selected third-party TTS provider. Only required when using private voices.

You can obtain an API key from one of the following:

* Cartesia
* ElevenLabs: if you use pronunciation dictionaries, the key must have the `pronunciation_dictionaries_write` scope (or full account access). See ElevenLabs API key scopes.

```json
"tts": {
  "api_key": "your-api-key"
}
```

### 3. `external_voice_id`

Specifies which voice to use with the selected TTS engine.
To find supported voice IDs, refer to the provider's documentation:

* Cartesia
* ElevenLabs
* Azure (e.g. `en-US-JennyNeural`)

For Azure, if you create a conversation in a specific language and the persona isn't responding, verify that the selected voice ID supports that language.

You can use any publicly accessible custom voice from ElevenLabs or Cartesia without the provider's API key. If the custom voice is private, you must supply the provider's API key.

```json
"tts": {
  "external_voice_id": "external-voice-id"
}
```

### 4. `tts_model_name`

Model name used by the TTS engine. Refer to:

* Cartesia
* ElevenLabs

`tts_model_name` is not supported when `tts_engine` is `azure`. Azure does not use a model name, so omit this field for Azure personas.

```json
"tts": {
  "tts_model_name": "sonic-3"
}
```

### 5. `tts_emotion_control`

If set to `true`, enables emotion control in speech. **Defaults to `true`.**

```json
"tts": {
  "tts_emotion_control": true
}
```

### 6. `voice_settings`

Optional object for controlling speed, volume, and similar effects. **Which approach you use depends on your TTS engine and model:**

| Engine     | Model      | Approach |
| ---------- | ---------- | -------- |
| ElevenLabs | All models | `voice_settings` in persona config |
| Cartesia   | sonic-2    | `voice_settings` in persona config |
| Cartesia   | sonic-3    | **Either** `voice_settings` (global, set once per conversation) **or** prompt the LLM in `system_prompt` to output [Cartesia SSML tags](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags) for dynamic control. Not both. |

**Cartesia sonic-3:** If you use `voice_settings` for speed/volume, those settings apply globally for the whole conversation and you cannot use SSML tags for dynamic, per-phrase control. If you want dynamic control, omit `voice_settings` and have the LLM output SSML tags instead. See [Cartesia volume, speed, and emotion](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion).

**ElevenLabs (all models):** Set parameters in the `voice_settings` object:

| Parameter           | Accepted values (ElevenLabs)                                |
| ------------------- | ----------------------------------------------------------- |
| `speed`             | Range `0.7` to `1.2` (`0.7` = slowest, `1.2` = fastest)     |
| `stability`         | Range `0.0` to `1.0` (`0.0` = variable, `1.0` = stable)     |
| `similarity_boost`  | Range `0.0` to `1.0` (`0.0` = creative, `1.0` = original)   |
| `style`             | Range `0.0` to `1.0` (`0.0` = neutral, `1.0` = exaggerated) |
| `use_speaker_boost` | Boolean (enhances speaker similarity)                       |

See ElevenLabs Voice Settings for details.

**Cartesia sonic-2:** Use the `voice_settings` object (e.g. `speed`, `emotion`). SSML tags are not used for sonic-2.

**Cartesia sonic-3:** You can use **either** of these, but not both:

* **`voice_settings`**: Speed and volume parameters are accepted for sonic-3. They apply **globally**, set once per conversation. Use this when you want a single default speed and volume for the entire conversation. Using `voice_settings` prevents dynamic SSML control.
* **SSML in LLM output**: Omit `voice_settings` for speed/volume and instead add instructions to your `system_prompt` so the LLM outputs [Cartesia SSML tags](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags) in its responses. This gives you dynamic, per-phrase control. See [Cartesia volume, speed, and emotion](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion).

Emotion control is separate; see [Emotion Control with Phoenix-4](/sections/conversational-video-interface/quickstart/emotional-expression).
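The ElevenLabs ranges listed above are easy to get wrong by a small margin, so it can help to check a `voice_settings` object locally before sending it. The following is a minimal sketch only: `check_elevenlabs_voice_settings` is a hypothetical helper written for this page, not part of any Tavus or ElevenLabs SDK, and it assumes the ranges in the table are inclusive at both ends.

```python
# Ranges copied from the ElevenLabs parameter table on this page.
# Assumption: both ends of each range are inclusive.
ELEVENLABS_RANGES = {
    "speed": (0.7, 1.2),
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
}


def check_elevenlabs_voice_settings(settings: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    for name, value in settings.items():
        if name == "use_speaker_boost":
            if not isinstance(value, bool):
                problems.append(f"{name} must be a boolean")
        elif name in ELEVENLABS_RANGES:
            low, high = ELEVENLABS_RANGES[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name} must be a number")
            elif not low <= value <= high:
                problems.append(f"{name}={value} is outside {low} to {high}")
        else:
            problems.append(f"{name} is not a listed ElevenLabs parameter")
    return problems


if __name__ == "__main__":
    print(check_elevenlabs_voice_settings({"speed": 0.9}))
    print(check_elevenlabs_voice_settings({"speed": 1.5, "style": 0.3}))
```

A check like this only mirrors the documented ranges; the provider remains the source of truth for what is actually accepted.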
**Example: system prompt for Cartesia sonic-3 (dynamic speed and volume)**

If you are **not** using `voice_settings` for sonic-3, add instructions like this to your `system_prompt` so the LLM outputs Cartesia SSML tags:

```
When you want to emphasize a word or phrase, use Cartesia SSML tags for speed and volume:
- To slow down: phrase
- To speed up: phrase
- To speak louder: phrase
- To speak more quietly: phrase
You can combine tags, e.g. important point. Only use these tags when it improves clarity or emphasis; keep most of your response in plain text.
```

**Example: `voice_settings` (ElevenLabs, Cartesia sonic-2, or Cartesia sonic-3 global)**

```json
"tts": {
  "voice_settings": {
    "speed": 0.9
  }
}
```

For sonic-3, this sets global speed once per conversation; for sonic-2 and ElevenLabs, it applies as configured.

## Example Configuration

Below is an example persona with a fully configured TTS layer:

```json Cartesia
{
  "persona_name": "AI Presenter",
  "system_prompt": "You are a friendly and informative video host.",
  "pipeline_mode": "full",
  "context": "You're delivering updates in a conversational tone.",
  "default_replica_id": "r90bbd427f71",
  "layers": {
    "tts": {
      "tts_engine": "cartesia",
      "api_key": "your-api-key",
      "external_voice_id": "external-voice-id",
      "tts_emotion_control": true,
      "tts_model_name": "sonic-3"
    }
  }
}
```

```json ElevenLabs
{
  "persona_name": "Narrator",
  "system_prompt": "You narrate long stories with clarity and consistency.",
  "pipeline_mode": "full",
  "context": "You're reading a fictional audiobook.",
  "default_replica_id": "r90bbd427f71",
  "layers": {
    "tts": {
      "tts_engine": "elevenlabs",
      "api_key": "your-api-key",
      "external_voice_id": "elevenlabs-voice-id",
      "voice_settings": {
        "speed": 0.9
      },
      "tts_emotion_control": true,
      "tts_model_name": "eleven_turbo_v2_5"
    }
  }
}
```

```json Azure
{
  "persona_name": "Azure Persona",
  "system_prompt": "You are a friendly host.",
  "pipeline_mode": "full",
  "default_replica_id": "r90bbd427f71",
  "layers": {
    "tts": {
      "tts_engine": "azure",
      "external_voice_id": "en-US-JennyNeural"
    }
  }
}
```

Refer to [Create Persona](/api-reference/personas/create-persona) for a complete list of supported fields.