Create Persona

To get started, you’ll need to create a Persona that specifies your custom LLM. Here’s an example Persona:

{
    "system_prompt": "You are a storyteller. You like telling stories to people of all ages. Reply in brief utterances, and ask prompting questions to the user as you tell your stories to keep them engaged.",
    "context": "Here are some of your favorite stories: Little Red Riding Hood, The Ugly Duckling and The Three Little Pigs",
    "persona_name": "Mert the Storyteller",
    "layers": {
        "llm": {
            "model": "custom_model_here",
            "api_key": "example-api-key",
            "base_url": "open-ai-compatible-llm-http-endpoint",
            "tools": [<your-tools-here>],
            "speculative_inference": true,
        },
        "tts": {
            "api_key": "example-api-key",
            "tts_engine": "playht",
            "playht_user_id": "your-playht-user-id",
            "external_voice_id": "professional-voice-clone-id",
            "voice_settings": {} // can also leave the "voice_settings" attr out if you want to use default settings
            "tts_emotion_control": false
        },
        "vqa": {
            "enabled": false // can also leave the "vqa" attr out if you want vqa enabled
        },
        "stt": {
            "participant_pause_sensitivity": "medium",
            "participant_interrupt_sensitivity": "medium",
            "stt_engine": "tavus-advanced"
        }
    }
}

<persona created>, id: p234324a
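
If you create the persona programmatically, the request looks like the sketch below. The endpoint URL and x-api-key header are assumptions based on the Tavus API reference pattern; verify them against the current reference before use.

# Sketch of creating the persona above via the API (endpoint and header names assumed;
# verify against the Tavus API reference).
import json
import requests

with open("storyteller_persona.json") as f:  # the persona JSON shown above
    persona = json.load(f)

resp = requests.post(
    "https://tavusapi.com/v2/personas",
    headers={"x-api-key": "<your-tavus-api-key>"},
    json=persona,
)
resp.raise_for_status()
print(resp.json())  # the response includes the new persona_id, e.g. p234324a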

Launch a Conversation

With this persona, if we were to launch a conversation:

{
    "replica_id": "r123456789",
    "conversation_name": "My Conversation",
    "callback_url": "https://webhook.site/",
    "persona_id": "p234324a",
    "conversational_context": "You are talking to Maya, who is from Dallas, Texas. She likes a good mystery book, and her favorite author is Agatha Christie."
}
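
As with the persona, launching the conversation is a single POST. Again, the endpoint and header below are assumptions to verify against the Tavus API reference.

# Sketch of launching the conversation above (endpoint and header assumed; verify
# against the Tavus API reference).
import requests

conversation = {
    "replica_id": "r123456789",
    "conversation_name": "My Conversation",
    "callback_url": "https://webhook.site/",
    "persona_id": "p234324a",
    "conversational_context": (
        "You are talking to Maya, who is from Dallas, Texas. She likes a good mystery "
        "book, and her favorite author is Agatha Christie."
    ),
}

resp = requests.post(
    "https://tavusapi.com/v2/conversations",
    headers={"x-api-key": "<your-tavus-api-key>"},
    json=conversation,
)
print(resp.json())  # the response includes the conversation_id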

As the user speaks during a conversation, their utterances will arrive at the endpoint you provided, with the /chat/completions suffix appended to your base_url.

If you set up a test webhook and point base_url at that webhook’s URL, you can examine an incoming chat completion request. You will notice that the conversation_id is provided as a request header, and that your API key is included so you can authenticate requests arriving at your server.

We make the chat completion request to the URL you provide with these settings:

completion = self.client.chat.completions.create(
    model=custom_model_here,           # the "model" value from your persona's llm layer
    messages=context,                  # the full conversation context, resent on every request
    extra_headers=self.extra_headers,  # includes the conversation_id request header
    stream=True,                       # responses are consumed as a stream
    tools=tools                        # the "tools" value from your persona's llm layer, if any
)

This means your OpenAI-compatible LLM should be configured to stream, i.e. send back chat completion chunks over SSE (Server-Sent Events). Here is the OpenAI documentation on chat completions as a quick reference for what to return in the response.
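
To make this concrete, here is a minimal sketch of an OpenAI-compatible /chat/completions endpoint that streams chunks over SSE. FastAPI, the canned reply, and the conversation_id header name are illustrative assumptions; a real implementation would forward the request to your model and stream its deltas instead.

# Minimal sketch of an OpenAI-compatible streaming endpoint (illustrative; uses FastAPI).
import json
import time

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()  # messages, model, tools, stream, etc.
    conversation_id = request.headers.get("conversation_id")  # header name is illustrative, e.g. for logging

    def sse_stream():
        # Stream chat.completion.chunk objects as Server-Sent Events.
        for token in ["Once ", "upon ", "a ", "time..."]:
            chunk = {
                "id": "chatcmpl-demo",
                "object": "chat.completion.chunk",
                "created": int(time.time()),
                "model": body.get("model", "custom_model_here"),
                "choices": [{"index": 0, "delta": {"content": token}, "finish_reason": None}],
            }
            yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(sse_stream(), media_type="text/event-stream")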

Speculative Inference

The speculative_inference parameter activates speculative inference, a technique that can significantly reduce response times in speech-to-text and natural language processing applications. This can be configured in the Persona.

Overview of Speculative Inference

Speculative inference is an advanced processing technique that allows AI systems to begin generating responses before all input data is available. In the context of speech recognition and natural language processing, it lets the LLM layer start working from partial transcriptions while the user is still speaking.

Behavior

When speculative_inference is set to true:

The replica will not start speaking until it is confident the user is done speaking. In the meantime, progressive transcriptions are sent to the LLM layer, each one including and accumulating the prior transcriptions, until the replica starts speaking.
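
As a hypothetical illustration of what this looks like on your side, the final user message of successive chat completion requests might accumulate like this (the exact segmentation depends on the STT layer):

# Hypothetical sequence of progressive transcriptions sent while the user is still speaking.
# Each new request's final user message includes everything transcribed so far.
progressive_user_messages = [
    {"role": "user", "content": "Tell me"},
    {"role": "user", "content": "Tell me a story"},
    {"role": "user", "content": "Tell me a story about the three little pigs"},
]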

Benefits

  • Significantly faster response times
  • Improved user experience due to reduced latency
  • More natural, conversational interaction

Create a Persona with Speculative Inference

{
    "system_prompt": "You are a storyteller. You like telling stories to people of all ages. Reply in brief utterances, and ask prompting questions to the user as you tell your stories to keep them engaged.",
    "context": "Here are some of your favorite stories: Little Red Riding Hood, The Ugly Duckling and The Three Little Pigs",
    "persona_name": "Mert the Storyteller",
    "layers": {
        "llm": {
            "model": "custom_model_here",
            "api_key": "example-api-key",
            "base_url": "open-ai-compatible-llm-http-endpoint",
            "speculative_inference": true,
        }
    }
}

<persona created>, id: p234324a

Tools / Function Calling

You can pass tools (function calls) to your LLM to enable it to perform tasks beyond text generation. This is useful if you want to integrate external APIs or services into your LLM. Please note that tools are only available for custom LLMs and require an intermediate layer on your end to handle the tool calls, as sketched after the example below. Currently, we do not run the tools for you.

Here’s a full example of a persona that includes a tool to get the current weather for a given location:

{
    "system_prompt": "You are a helpful assistant.",
    "context": "Help users get the weather for a given location.",
    "persona_name": "Weather Assistant",
    "layers": {
        "llm": {
            "model": "custom_model_here",
            "api_key": "example-api-key",
            "base_url": "open-ai-compatible-llm-http-endpoint",
            "tools": [
                {
                    "type": "function",
                    "function": {
                        "name": "get_current_weather",
                        "description": "Get the current weather in a given location",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "location": {
                                    "type": "string",
                                    "description": "The city and state, e.g. San Francisco, CA",
                                },
                                "unit": {
                                    "type": "string",
                                    "enum": ["celsius", "fahrenheit"],
                                },
                            },
                            "required": ["location"],
                        },
                    },
                }
            ],
        },
        "tts": {
            "api_key": "example-api-key",
            "tts_engine": "elevenlabs",
            "external_voice_id": "professional-voice-clone-id",
            "voice_settings": {} // can also leave the "voice_settings" attr out if you want to use default settings
            "tts_emotion_control": false
        },
        "vqa": {
            "enabled": false // can also leave the "vqa" attr out if you want vqa enabled
        }
    }
}
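
Because the tool calls are executed on your side, your intermediate layer needs to recognize tool_calls in your model’s output, run the function, and feed the result back before producing the final text. The sketch below shows one common pattern using the OpenAI Python client against your own OpenAI-compatible model; the get_current_weather implementation, the base URL, and the two-step completion flow are illustrative assumptions, not Tavus requirements.

# Illustrative intermediate layer: detect a tool call, execute it yourself, and let the
# model turn the tool result into a final reply. Assumes the OpenAI Python client pointed
# at your own OpenAI-compatible model.
import json

from openai import OpenAI

client = OpenAI(api_key="example-api-key", base_url="https://your-llm.example.com/v1")

def get_current_weather(location: str, unit: str = "celsius") -> str:
    # Hypothetical implementation; call your real weather API here.
    return json.dumps({"location": location, "temperature": 21, "unit": unit})

def run_with_tools(messages: list, tools: list) -> str:
    first = client.chat.completions.create(model="custom_model_here", messages=messages, tools=tools)
    message = first.choices[0].message
    if not message.tool_calls:
        return message.content
    messages.append(message)  # keep the assistant's tool-call turn in the context
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_current_weather(**args)  # run the requested function yourself
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    # A second completion lets the model phrase the tool result as a spoken reply.
    second = client.chat.completions.create(model="custom_model_here", messages=messages)
    return second.choices[0].message.content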

LLM Abstractions

We have abstracted the system so that the LLM instructions are built from three distinct “sub-instructions” concatenated together. Let’s use storytelling as an example persona.

If my goal is to create a storyteller, I can do so with the combination of system_prompt (Persona), context (Persona) and conversational_context (Conversation).

  • Now, system_prompt can be something along the lines of: “You are a storyteller. You like telling stories to people of all ages.” This defines what a storyteller is.
  • context is for what that storyteller focuses on: “Here are some of your favorite stories to tell: Little Red Riding Hood, The Ugly Duckling and The Three Little Pigs.” This defines what a storyteller has.
  • conversational_context is for all the details that revolve around that specific interaction between the user & replica. Something like: “You are talking to {user_name} (you may pass that in dynamically per conversation request). They are {x} years old. They like listening to {genre} stories.” This defines who the storyteller is talking to.

This allows you to create as many conversations as you want with the storyteller persona without sharing conversation-specific context between them, while also letting you maintain default system prompts on your end and create personas with varying contexts (crime-novel storyteller, horror storyteller, children’s storyteller, etc.).

These fields populate the initial system_prompt of the chat completion requests we send your way, and since we send the entire context on each request, anything in the system_prompt persists. You may also parse the incoming request body yourself and choose what to send your LLM, building your own abstraction in place of the one we currently offer.
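
To illustrate, the first messages entry of an incoming chat completion request for the storyteller example would look roughly like the following; the exact way we concatenate the three pieces is an assumption here, and the user turn is hypothetical.

# Rough, hypothetical shape of the incoming context for the storyteller example
# (system_prompt and context come from the persona, conversational_context from the conversation).
incoming_messages = [
    {
        "role": "system",
        "content": (
            "You are a storyteller. You like telling stories to people of all ages. "  # system_prompt (abridged)
            "Here are some of your favorite stories: Little Red Riding Hood, "         # context
            "The Ugly Duckling and The Three Little Pigs "
            "You are talking to Maya, who is from Dallas, Texas. She likes a good "    # conversational_context
            "mystery book, and her favorite author is Agatha Christie."
        ),
    },
    {"role": "user", "content": "Hi! Can you tell me a story?"},
]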