The STT Layer in Tavus empowers your persona to transcribe and comprehend spoken input in real time. The STT layer supports smart_turn_detection, powered by Sparrow-0, for dynamic and responsive conversation flow with intelligent turn-taking.
Legacy Feature: smart_turn_detection in the STT layer is a legacy approach to turn-taking. For new implementations, use the Conversational Flow layer with Sparrow-1 for improved performance, faster response times, and more natural conversations.

Configuring the STT Layer

Define the STT layer under the layers.stt object. Below are the parameters available:

1. participant_pause_sensitivity

Controls how long the participant can pause before the replica responds. This setting helps you fine-tune the pacing of the conversation.
Recommended: Use the Conversational Flow layer with turn_taking_patience for more advanced control over turn-taking behavior with Sparrow-1.
  • Options:
    • high: The replica replies quickly after short pauses. Good for fast and casual conversations.
    • medium (default): Balanced timing. Allows natural pauses without feeling rushed or delayed.
    • low: The replica waits a bit longer before replying. Useful for slower or more thoughtful discussions.
    • verylow: The replica allows even longer pauses before responding.
    • superlow: The replica has the longest response delay, making it suitable for conversations where participants often pause.
"participant_pause_sensitivity": "medium"

2. participant_interrupt_sensitivity

Controls how easily the participant can interrupt the replica while it is talking. This setting helps adjust how the replica handles overlap in conversation.
Recommended: Use the Conversational Flow layer with replica_interruptibility for more advanced control over interruption handling with Sparrow-1.
  • Options:
    • high: The replica stops speaking immediately when the participant starts talking. Ideal for quick and back-and-forth exchanges.
    • medium (default): Balanced behavior. Allows short interruptions without breaking the flow.
    • low: The participant needs to speak more clearly or for a bit longer to interrupt.
    • verylow: The replica usually keeps talking unless the interruption is strong.
    • superlow: The replica rarely stops mid-sentence. It will usually finish speaking before responding.
"participant_interrupt_sensitivity": "medium"

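Both sensitivity parameters above accept the same five levels, so a misspelling such as "very_low" is easy to miss. As a minimal sketch, the helper below checks the values before they are placed under layers.stt. The names sensitivity_settings and ALLOWED_LEVELS are hypothetical and not part of any Tavus SDK; only the field names and level strings come from this page.

```python
# Levels documented on this page, from most to least sensitive.
ALLOWED_LEVELS = ("high", "medium", "low", "verylow", "superlow")


def sensitivity_settings(pause="medium", interrupt="medium"):
    """Return the two sensitivity fields for layers.stt, or raise ValueError.

    Hypothetical helper for illustration; "medium" mirrors the documented default.
    """
    for name, value in (("pause", pause), ("interrupt", interrupt)):
        if value not in ALLOWED_LEVELS:
            raise ValueError(
                f"{name} sensitivity must be one of {ALLOWED_LEVELS}, got {value!r}"
            )
    return {
        "participant_pause_sensitivity": pause,
        "participant_interrupt_sensitivity": interrupt,
    }
```

Calling sensitivity_settings(interrupt="low") keeps the default pause sensitivity and makes the replica harder to interrupt, while an unknown level fails early instead of being sent in a request.
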
3. hotwords

Use this to prioritize certain names or terms that are difficult to transcribe.
This field is only available for the tavus-advanced engine.
"hotwords": "Roey is the name of the person you're speaking with."
The hotwords value above helps the model transcribe “Roey” correctly instead of “Rowie.”
Use hotwords for proper nouns, brand names, or domain-specific language that standard STT engines might struggle with.

4. smart_turn_detection (Legacy)

Legacy Feature: This is a legacy approach to turn-taking. For new implementations, use the Conversational Flow layer with turn_detection_model: "sparrow-1" for improved performance.
Enables turn-taking with the Sparrow-0 model, which dynamically adjusts the response timeout based on what the user says: it sets a longer timeout when the user is likely not done speaking, and a shorter timeout when the user is likely done.
"smart_turn_detection": true

How Turn-taking Works

Legacy Diagram: The following diagram illustrates how the legacy Sparrow-0 model works. For new implementations, use the Conversational Flow layer with Sparrow-1 for improved performance and more natural turn-taking.
  • smart_turn_detection is only supported by the tavus-advanced engine.
  • Disabling smart_turn_detection turns off Sparrow-0 and uses a fixed response delay based on participant_pause_sensitivity.
  • For new implementations, configure turn-taking via the Conversational Flow layer instead.
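The timeout behavior described above can be pictured with a toy sketch. Sparrow-0 is a model whose internals are not documented on this page, so the likely_done flag merely stands in for its prediction. The function name response_timeout and every timeout value are hypothetical placeholders supplied by the caller, not Tavus values.

```python
def response_timeout(smart_turn_detection, likely_done,
                     fixed_delay, short_timeout, long_timeout):
    """Pick how long to wait after the participant stops speaking.

    Toy illustration only: `likely_done` stands in for Sparrow-0's prediction,
    and all timeouts (in seconds) are caller-supplied placeholders.
    """
    if not smart_turn_detection:
        # Sparrow-0 off: a fixed delay, which per this page is based on
        # participant_pause_sensitivity.
        return fixed_delay
    # Sparrow-0 on: shorter wait when the user seems finished, longer otherwise.
    return short_timeout if likely_done else long_timeout
```

With arbitrary placeholder values, response_timeout(True, False, 0.8, 0.3, 1.5) selects the long timeout because the user is probably mid-thought, whereas passing False as the first argument always returns the fixed delay.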

Example Configuration

Below is an example persona with a fully configured STT layer:
{
  "persona_name": "Customer Service Agent",
  "system_prompt": "You assist users by listening carefully and providing helpful answers.",
  "pipeline_mode": "full",
  "context": "You're handling voice-based customer support inquiries.",
  "default_replica_id": "rfe12d8b9597",
  "layers": {
    "stt": {
      "participant_pause_sensitivity": "medium",
      "participant_interrupt_sensitivity": "low",
      "hotwords": "support",
      "smart_turn_detection": true
    }
  }
}
Refer to the Create Persona API for a complete list of supported fields.
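Because hotwords and smart_turn_detection are both described above as supported only by the tavus-advanced engine, it can help to list which of them a configuration enables before submitting it. The sketch below does that for the STT layer of the example configuration. The function name advanced_only_fields is hypothetical, and how the engine itself is selected is not covered on this page, so the check only reports the fields.

```python
import json

# Fields this page describes as supported only by the tavus-advanced engine.
ADVANCED_ONLY = ("hotwords", "smart_turn_detection")


def advanced_only_fields(persona):
    """Return the tavus-advanced-only STT fields that the persona enables.

    Hypothetical helper for illustration; not part of any Tavus SDK.
    """
    stt = persona.get("layers", {}).get("stt", {})
    # A field counts as enabled when it is present with a truthy value, so
    # "smart_turn_detection": false or an empty hotwords string is ignored.
    return [name for name in ADVANCED_ONLY if stt.get(name)]


# The STT layer from the example configuration above.
persona = json.loads("""
{
  "persona_name": "Customer Service Agent",
  "layers": {
    "stt": {
      "participant_pause_sensitivity": "medium",
      "participant_interrupt_sensitivity": "low",
      "hotwords": "support",
      "smart_turn_detection": true
    }
  }
}
""")

print(advanced_only_fields(persona))  # ['hotwords', 'smart_turn_detection']
```

An empty result means the configuration uses neither field in a way that this page ties to tavus-advanced.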