Supported Languages

Tavus supports over 30 languages for spoken interaction, powered by two integrated text-to-speech (TTS) engines: Cartesia and ElevenLabs. If the selected language is not supported by the default TTS engine (Cartesia), your CVI automatically switches to ElevenLabs to start the conversation. Supported languages include:
  • English (all variants)
  • French (France, Canada)
  • German
  • Spanish (Spain, Mexico)
  • Portuguese (Brazil, Portugal)
  • Chinese
  • Japanese
  • Hindi
  • Italian
  • Korean
  • Dutch
  • Polish
  • Russian
  • Swedish
  • Turkish
  • Indonesian
  • Filipino
  • Bulgarian
  • Romanian
  • Arabic (Saudi Arabia, UAE)
  • Czech
  • Greek
  • Finnish
  • Croatian
  • Malay
  • Slovak
  • Danish
  • Tamil
  • Ukrainian
  • Hungarian
  • Norwegian
  • Vietnamese
For the full list of languages supported by each TTS engine, refer to the Cartesia and ElevenLabs documentation.
By default, Tavus uses the Cartesia TTS engine.
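The fallback itself happens on the Tavus side, so nothing has to be implemented for it. Purely as a conceptual sketch of the rule described above (try the default engine first, then fall back), the selection could look like the following. The function name and the idea of passing each engine's language list in as an argument are assumptions of this sketch, not part of the Tavus API; the lists themselves would have to come from each engine's own documentation.

```python
from typing import Iterable, Optional

DEFAULT_ENGINE = "cartesia"     # default TTS engine, per the text above
FALLBACK_ENGINE = "elevenlabs"  # used when the default lacks the language


def pick_tts_engine(
    language: str,
    cartesia_languages: Iterable[str],
    elevenlabs_languages: Iterable[str],
) -> Optional[str]:
    """Hypothetical helper: return the engine that would handle `language`.

    Returns None when neither engine lists the language.
    """
    name = language.strip().lower()
    # Normalize both lists so the comparison is case-insensitive.
    default_set = {entry.strip().lower() for entry in cartesia_languages}
    fallback_set = {entry.strip().lower() for entry in elevenlabs_languages}

    if name in default_set:
        return DEFAULT_ENGINE
    if name in fallback_set:
        return FALLBACK_ENGINE
    return None
```

A helper like this is mainly useful for checking a language choice before creating a conversation, for example to warn a user early that a language is not available on either engine.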

Setting the Conversation Language

To specify a language, set the language parameter in the Create Conversation request. Use the full language name, not a language code.
cURL
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '{
  "persona_id": "pdced222244b",
  "replica_id": "rfe12d8b9597",
  "properties": {
    "language": "spanish"
  }
}'
Language names must exactly match those supported by the selected TTS engine.
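The same request can be sketched in Python using only the standard library. The endpoint, headers, and body fields are taken from the cURL example above. The helper names, the code-detection heuristic (rejecting strings shaped like "es" or "es-MX"), and the assumption that the response body is JSON are illustrative choices of this sketch rather than documented behavior.

```python
import json
import re
import urllib.request

API_URL = "https://tavusapi.com/v2/conversations"  # endpoint from the cURL example

# Heuristic (assumption of this sketch): two letters, optionally followed by a
# two-letter region, looks like a language code rather than a full name.
_CODE_PATTERN = re.compile(r"^[a-z]{2}([-_][a-z]{2})?$", re.IGNORECASE)


def build_payload(persona_id: str, replica_id: str, language: str) -> dict:
    """Build the Create Conversation body, rejecting obvious language codes."""
    name = language.strip()
    if not name:
        raise ValueError("language must not be empty")
    if _CODE_PATTERN.match(name):
        raise ValueError(
            f"{name!r} looks like a language code; use the full name, e.g. 'spanish'"
        )
    return {
        "persona_id": persona_id,
        "replica_id": replica_id,
        "properties": {"language": name},
    }


def create_conversation(api_key: str, payload: dict) -> dict:
    """POST the payload and decode the response (assumed to be JSON)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling build_payload("pdced222244b", "rfe12d8b9597", "spanish") produces the same body as the cURL example. The language name is passed through without changing its case, since it has to match the engine's supported name exactly.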

Smart Language Detection

To automatically detect the participant’s spoken language throughout the conversation, set language to multilingual when creating the conversation:
cURL
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '{
  "persona_id": "pdced222244b",
  "replica_id": "rfe12d8b9597",
  "properties": {
    "language": "multilingual"
  }
}'
This enables automatic speech recognition (ASR) to switch languages on the fly, dynamically adjusting the pipeline so that the conversation is transcribed and answered in the detected language.
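An application that sometimes knows the participant's language in advance and sometimes does not could centralize the choice between a fixed language and smart detection in one place. The sketch below is one way to do that; the function name and its default-to-multilingual behavior are assumptions of this example, and only the "multilingual" value and the properties.language field come from the requests shown above.

```python
from typing import Optional

MULTILINGUAL = "multilingual"  # value used in the cURL example above


def conversation_properties(fixed_language: Optional[str] = None) -> dict:
    """Hypothetical helper: build the `properties` object for a conversation.

    With no fixed language, request smart language detection; otherwise
    pass the full language name through unchanged.
    """
    if fixed_language is None:
        return {"language": MULTILINGUAL}
    name = fixed_language.strip()
    if not name:
        raise ValueError("fixed_language must not be empty")
    return {"language": name}
```

The returned dictionary would be used as the properties field of the Create Conversation body, alongside persona_id and replica_id.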