Create Persona

To get started, you’ll need to create a Persona that specifies your STT engine and VAD sensitivity. Here’s an example Persona:

{
    "system_prompt": "You are a storyteller. You like telling stories to people of all ages. Reply in brief utterances, and ask prompting questions to the user as you tell your stories to keep them engaged.",
    "context": "Here are some of your favorite stories: Little Red Riding Hood, The Ugly Duckling and The Three Little Pigs",
    "persona_name": "Mert the Storyteller",
    "layers": {
        "llm": {
            "model": "custom_model_here",
            "api_key": "example-api-key",
            "base_url": "open-ai-compatible-llm-http-endpoint",
            "tools": [<your-tools-here>],
            "speculative_inference": true,
        },
        "tts": {
            "api_key": "example-api-key",
            "tts_engine": "playht",
            "playht_user_id": "your-playht-user-id",
            "external_voice_id": "professional-voice-clone-id",
            "voice_settings": {}, // can also leave the "voice_settings" attr out if you want to use default settings
            "tts_emotion_control": false
        },
        "vqa": {
            "enabled": false // can also leave the "vqa" attr out if you want vqa enabled
        },
        "stt": {
            "participant_pause_sensitivity": "medium",
            "participant_interrupt_sensitivity": "medium",
            "stt_engine": "tavus-advanced",
            "hotwords": "This is a hotword example",
        }
    }
}

Once the Persona is created, the response includes its persona_id (in this example, p234324a).
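
If you prefer to create the Persona from code, the request looks something like the sketch below. This is a minimal Python example using the requests library; the https://tavusapi.com/v2/personas endpoint, the x-api-key header, and the persona_id response field are assumptions based on the public Tavus API, so adjust them to match your API reference.

import requests

# Minimal sketch: create the Persona above via the HTTP API.
# The endpoint URL, auth header, and response field name are assumptions;
# check your API reference before relying on them.
TAVUS_API_KEY = "example-api-key"  # placeholder, use your real key

persona_payload = {
    "persona_name": "Mert the Storyteller",
    "system_prompt": "You are a storyteller. ...",
    "context": "Here are some of your favorite stories: ...",
    "layers": {
        "stt": {
            "stt_engine": "tavus-advanced",
            "participant_pause_sensitivity": "medium",
            "participant_interrupt_sensitivity": "medium",
            "hotwords": "This is a hotword example"
        }
        # add the "llm", "tts", and "vqa" layers from the example above as needed
    }
}

response = requests.post(
    "https://tavusapi.com/v2/personas",       # assumed endpoint
    headers={"x-api-key": TAVUS_API_KEY},     # assumed auth header
    json=persona_payload,
)
response.raise_for_status()
print(response.json().get("persona_id"))      # e.g. p234324a (field name may differ)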

STT Engine

The stt_engine parameter controls which transcription engine is used. The default is tavus-advanced, but you can switch to tavus-turbo for a small latency improvement. However, tavus-advanced provides much higher transcription accuracy and supports non-English languages, so we highly recommend it for almost all use cases.
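
For example, switching engines only changes the stt_engine value; the rest of the stt layer stays the same. Sketched here as Python dicts for illustration, using the fields from the Persona above:

# Default: best accuracy and non-English language support (recommended)
stt_layer = {
    "stt_engine": "tavus-advanced",
    "participant_pause_sensitivity": "medium",
    "participant_interrupt_sensitivity": "medium",
}

# Slightly lower latency, at the cost of accuracy and language support
stt_layer_turbo = {**stt_layer, "stt_engine": "tavus-turbo"}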

Speech Sensitivity

These parameters control the sensitivity of the Voice Activity Detection (VAD) engine. Both default to medium, but you can adjust them to low or high depending on your needs. Use the guidelines below to choose the right sensitivity for your use case; a combined example follows the two lists.

Participant Pause Sensitivity

Controls how long of a pause the user can take before the replica responds. You can think of this as the replica’s “pause” tolerance.

  • low: A low sensitivity means you can take longer pauses before the replica responds. Use this for slower, more thoughtful conversations.
  • medium: The default behavior. A nice balance between responsiveness and thoughtful pauses.
  • high: A high sensitivity means the replica responds very quickly once the user stops speaking. Use this for fast, chatty conversations, where small pauses from the participant will trigger a response.

Participant Interrupt Sensitivity

Controls how long the user needs to speak before the replica is interrupted. You can think of this as the replica’s “interrupt” tolerance.

  • low: A low sensitivity means you can talk longer before the replica will stop talking and listen. Use this for slower, more thoughtful conversations.
  • medium: The default behavior. A nice balance between responsiveness and tolerance of short affirmations.
  • high: A high sensitivity means the replica will stop talking very quickly when the user speaks. Use this for fast, chatty conversations, where small responses from the participant will trigger a new response from the replica.
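
Putting the two parameters together, here are two illustrative stt layer configurations, again sketched as Python dicts; the only values that change between them are the documented low/medium/high sensitivity strings:

# Fast, chatty conversation: the replica jumps in on short pauses
# and yields quickly when the participant starts talking.
fast_chat_stt = {
    "stt_engine": "tavus-advanced",
    "participant_pause_sensitivity": "high",
    "participant_interrupt_sensitivity": "high",
}

# Slow, thoughtful conversation: the replica tolerates long pauses
# and keeps talking through brief affirmations from the participant.
thoughtful_stt = {
    "stt_engine": "tavus-advanced",
    "participant_pause_sensitivity": "low",
    "participant_interrupt_sensitivity": "low",
}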