The Speech to Speech pipeline mode lets you bypass the ASR, LLM, and TTS layers by using an external speech to speech model. You can use one of the Tavus speech to speech model integrations or bring your own.

To get started with Speech to Speech, configure the pipeline mode and the speech to speech layer when you create the persona:

  • Set pipeline_mode to speech-to-speech.
  • Configure the speech to speech layer by supplying your provider and API key, along with the websocket URL and session settings. The default provider is openai, and an api_key is required when you specify a websocket_url.
  • A system prompt and conversational context are not allowed in this mode. Instead, configure instructions as part of the session_settings in the speech to speech layer.
  • The instructions may be updated in real time by sending a realtime_api event to the conversation through our Interactions protocol.

POST /v2/personas

{
    "persona_name": "Speech to Speech Persona",
    "pipeline_mode": "speech-to-speech",
    "layers": {
        "sts": {
            "provider": "openai",
            "api_key": "your-api-key",
            "websocket_url": "wss://your-websocket-url",
            "session_settings": {
                "turn_detection": {"type": "server_vad"},
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "voice": "sage",
                "modalities": ["audio"],
                "temperature": 0.8,
                "instructions": "You are an AI assistant in a voice call. Respond to the user's audio input."
            }
        }
    }
}
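
The same request can be assembled in code. The following is a minimal Python sketch, not an official client. The base URL https://tavusapi.com and the x-api-key header are assumptions that this section does not state, and build_sts_persona and prepare_request are hypothetical helper names. The sketch also treats api_key as mandatory whenever a websocket_url is given, which is one reading of the rule in the list above.

```python
import json
import urllib.request

# Assumption: the base URL and auth header name are not given in this section.
BASE_URL = "https://tavusapi.com"


def build_sts_persona(name, instructions, api_key=None, websocket_url=None,
                      provider="openai", voice="sage"):
    """Build a Create Persona payload for speech-to-speech mode (hypothetical helper)."""
    # One reading of the documented rule: a custom websocket_url needs an api_key.
    if websocket_url is not None and not api_key:
        raise ValueError("api_key is required when websocket_url is set")
    sts = {
        "provider": provider,
        "session_settings": {
            "turn_detection": {"type": "server_vad"},
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "voice": voice,
            "modalities": ["audio"],
            "temperature": 0.8,
            # Instructions take the place of a system prompt in this mode.
            "instructions": instructions,
        },
    }
    if api_key:
        sts["api_key"] = api_key
    if websocket_url:
        sts["websocket_url"] = websocket_url
    return {
        "persona_name": name,
        "pipeline_mode": "speech-to-speech",
        "layers": {"sts": sts},
    }


def prepare_request(path, payload, tavus_api_key):
    """Prepare, but do not send, a JSON POST request (hypothetical helper)."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": tavus_api_key},
        method="POST",
    )


payload = build_sts_persona(
    "Speech to Speech Persona",
    "You are an AI assistant in a voice call. Respond to the user's audio input.",
    api_key="your-api-key",
    websocket_url="wss://your-websocket-url",
)
request = prepare_request("/v2/personas", payload, "your-tavus-api-key")
# To send it: urllib.request.urlopen(request), then parse the JSON body.
```

Keeping payload construction separate from sending makes it easy to inspect or log the JSON before any network call is made.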

This call to Create Persona returns a response containing a persona_id. For example, the following response has a persona_id of p24293d6.

{
  "persona_id": "p24293d6"
}

Using the above persona_id, we can create a conversation with the Create Conversation endpoint. The request includes the replica_id of the replica that we want to use for this conversation and the persona_id that we created above. Personas can be reused across conversations. To learn more about creating conversations, see the Create Conversation documentation.

POST /v2/conversations
{
  "replica_id": "re8e740a42",
  "persona_id": "p24293d6",
  "conversation_name": "Music Chat with DJ Kot"
}

Response:

{
  "conversation_id": "c12345",
  "conversation_name": "Music Chat with DJ Kot",
  "status": "active",
  "conversation_url": "https://tavus.daily.co/c12345",
  "replica_id": "re8e740a42",
  "persona_id": "p24293d6",
  "created_at": "2024-08-13T12:34:56Z"
}

In the response, you will receive a conversation_id. Using this conversation_id, you can join the conversation and connect to your speech to speech model.
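
Building the Create Conversation request body and reading the response can be sketched in Python as follows. This is an illustrative sketch only: build_conversation and parse_conversation are hypothetical helper names, and the response body is the sample shown above rather than live output.

```python
import json


def build_conversation(replica_id, persona_id, name=None):
    """Build the Create Conversation request body (hypothetical helper)."""
    payload = {"replica_id": replica_id, "persona_id": persona_id}
    if name:
        payload["conversation_name"] = name
    return payload


def parse_conversation(body):
    """Return (conversation_id, conversation_url) from a response body."""
    data = json.loads(body)
    if "conversation_id" not in data:
        raise KeyError("response does not contain a conversation_id")
    return data["conversation_id"], data.get("conversation_url")


request_body = build_conversation("re8e740a42", "p24293d6", "Music Chat with DJ Kot")

# Sample response copied from the example above, not a live API result.
sample_response = """
{
  "conversation_id": "c12345",
  "conversation_name": "Music Chat with DJ Kot",
  "status": "active",
  "conversation_url": "https://tavus.daily.co/c12345",
  "replica_id": "re8e740a42",
  "persona_id": "p24293d6",
  "created_at": "2024-08-13T12:34:56Z"
}
"""
conversation_id, conversation_url = parse_conversation(sample_response)
```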