STT model selection is in beta until April 6th.
The STT layer transcribes participant speech in real time using automatic speech recognition (ASR). You can select a model optimized for your use case and language requirements.

STT models

Select an STT model using the stt_engine parameter in the layers.stt object. The following models are available:
| Model | Description |
| --- | --- |
| tavus-auto | Automatically selects the best STT model for the conversation’s language. Recommended for most use cases. |
| tavus-parakeet | Highest throughput, lowest latency for English and European languages. |
| tavus-soniox | Purpose-built for Indian languages with broad multilingual coverage. |
| tavus-whisper | Broad multilingual coverage across all supported languages. |
| tavus-deepgram-medical | Domain-specific English STT optimized for clinical and healthcare vocabulary. English only. |
| tavus-advanced | Deprecated. Still active but not recommended for new integrations. |
Use tavus-auto unless you have a specific language or domain requirement. It automatically routes to the best model for each conversation.

Choosing the right model

A language is listed for a model only if both STT and TTS coverage are available.
| Category | Recommended model | Supported languages |
| --- | --- | --- |
| General purpose | tavus-auto | All 43 languages |
| Indic languages | tavus-soniox | Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Punjabi, Tamil, Telugu, plus broad support for all other languages |
| English + European | tavus-parakeet | Bulgarian, Croatian, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Ukrainian |
| Broad multilingual | tavus-whisper or tavus-soniox | All 43 languages |
| Medical (English) | tavus-deepgram-medical | English |
Smart Language Detection requires one of tavus-auto, tavus-soniox, or tavus-whisper. It is not available with tavus-parakeet or tavus-deepgram-medical.
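
The selection guidance above can be sketched as a small helper. This is an illustrative sketch only, not part of any Tavus SDK: the function name `recommend_stt_engine` and its parameters are hypothetical, and the language sets are copied from the table in this section. It falls back to tavus-auto whenever no specific requirement applies.

```python
# Hypothetical helper: encodes the model-selection guidance from this page.
# Language names are taken from the table above; nothing here calls the Tavus API.

INDIC_LANGUAGES = {
    "bengali", "gujarati", "hindi", "kannada", "malayalam",
    "marathi", "punjabi", "tamil", "telugu",
}

PARAKEET_LANGUAGES = {
    "bulgarian", "croatian", "czech", "danish", "dutch", "english",
    "finnish", "french", "german", "greek", "hungarian", "italian",
    "polish", "portuguese", "romanian", "russian", "slovak", "spanish",
    "swedish", "ukrainian",
}


def recommend_stt_engine(language=None, *, medical=False, low_latency=False,
                         smart_language_detection=False):
    """Return an stt_engine value following the guidance in this section."""
    lang = (language or "").strip().lower()

    if medical:
        # tavus-deepgram-medical is English only and is not one of the
        # engines listed as compatible with Smart Language Detection.
        if lang and lang != "english":
            raise ValueError("tavus-deepgram-medical supports English only")
        if smart_language_detection:
            raise ValueError("Smart Language Detection needs tavus-auto, "
                             "tavus-soniox, or tavus-whisper")
        return "tavus-deepgram-medical"

    if lang in INDIC_LANGUAGES:
        # tavus-soniox is also compatible with Smart Language Detection.
        return "tavus-soniox"

    if low_latency and lang in PARAKEET_LANGUAGES and not smart_language_detection:
        return "tavus-parakeet"

    # Default recommended for most use cases.
    return "tavus-auto"
```

For example, `recommend_stt_engine("Hindi")` picks tavus-soniox, while `recommend_stt_engine("German", low_latency=True)` picks tavus-parakeet and any call without a specific requirement returns tavus-auto.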

Configuring the STT layer

Define the STT layer under the layers.stt object.

stt_engine

Set the STT model for transcription:
"stt": {
  "stt_engine": "tavus-auto"
}

hotwords

Use the hotwords field to prioritize names or terms that are difficult to transcribe. The value is a single string set alongside stt_engine inside the stt object.
"hotwords": "Roey is the name of the person you're speaking with."
The above helps the model transcribe “Roey” correctly instead of “Rowie.”
Use hotwords for proper nouns, brand names, or domain-specific language that standard STT engines might struggle with.
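
If you assemble the STT layer programmatically, a small builder can catch a mistyped engine name before the request is sent. The sketch below is an assumption-laden illustration: `build_stt_layer` is a hypothetical helper, and the set of engine names is copied from the model list on this page rather than fetched from the API.

```python
import json
import warnings

# Engine names copied from the model list on this page.
KNOWN_ENGINES = {
    "tavus-auto", "tavus-parakeet", "tavus-soniox",
    "tavus-whisper", "tavus-deepgram-medical", "tavus-advanced",
}
DEPRECATED_ENGINES = {"tavus-advanced"}


def build_stt_layer(stt_engine="tavus-auto", hotwords=None):
    """Return a dict suitable for the layers.stt object (hypothetical helper)."""
    if stt_engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown stt_engine: {stt_engine!r}")
    if stt_engine in DEPRECATED_ENGINES:
        warnings.warn(f"{stt_engine} is deprecated; prefer tavus-auto")

    layer = {"stt_engine": stt_engine}
    if hotwords:
        # hotwords is a plain string, as in the example above.
        layer["hotwords"] = hotwords
    return layer


# Print the resulting fragment as JSON.
stt = build_stt_layer(hotwords="Roey is the name of the person you're speaking with.")
print(json.dumps({"stt": stt}, indent=2))
```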

Example configuration

Below is an example persona with a configured STT layer using the recommended tavus-auto engine:
{
  "persona_name": "Customer Service Agent",
  "system_prompt": "You assist users by listening carefully and providing helpful answers.",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "stt": {
      "stt_engine": "tavus-auto",
      "hotwords": "Roey is the name of the person you're speaking with."
    }
  }
}
Refer to the Create Persona API for a complete list of supported fields.
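
As a rough sketch of how the example persona might be submitted, the snippet below builds (but does not send) an HTTP request using only the Python standard library. The endpoint URL and the `x-api-key` header name are assumptions that this page does not state; confirm both against the Create Persona API reference before use. The function name `build_create_persona_request` is hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint and auth header are NOT given on this page.
# Verify them in the Create Persona API reference.
PERSONAS_URL = "https://tavusapi.com/v2/personas"

persona = {
    "persona_name": "Customer Service Agent",
    "system_prompt": "You assist users by listening carefully and providing helpful answers.",
    "pipeline_mode": "full",
    "default_replica_id": "rf4e9d9790f0",
    "layers": {
        "stt": {
            "stt_engine": "tavus-auto",
            "hotwords": "Roey is the name of the person you're speaking with.",
        }
    },
}


def build_create_persona_request(api_key, payload):
    """Build a POST request carrying the persona as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        PERSONAS_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


request = build_create_persona_request("YOUR_API_KEY", persona)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the JSON body and confirm that the stt object is nested under layers before any network call is made.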