Skip to main content
This is an event broadcasted by Tavus. An utterance contains the content of what was spoken and an indication of who spoke it (i.e. the user or replica). Each utterance event includes all of the words spoken by the user or replica measured from when the person started speaking to when they finished speaking. This could include multiple sentences or phrases. User utterances (role: user) are sent when the user finishes speaking and contain the transcribed text. Replica utterances (role: replica) are sent immediately when the replica begins speaking and contain the full LLM response text — including words the replica may not have actually spoken if it was interrupted. This makes them useful for quickly displaying the replica’s intended response.
If the replica is interrupted mid-sentence, the conversation.utterance event (role=replica) will still contain the full intended response. To track only the words the replica actually spoke, use streaming utterance events, which progressively report spoken text and indicate interruptions.
Utterance events can be used to keep track of what the user or the replica has said. To track how long an utterance lasts, please refer to duration in “User Started/Stopped Speaking” and “Replica Started/Stopped Speaking” events. When the speaker is the user and the persona uses Raven-1, properties may include user_audio_analysis (tone/delivery) and/or user_visual_analysis (appearance and demeanor). These fields are only present when there is relevant analysis for that utterance.
message_type
string

Message type indicates what product this event will be used for. In this case, the message_type will be conversation

Example:

"conversation"

event_type
string

This is the type of event that is being sent back. This field will be present on all events and can be used to distinguish between different event types.

Example:

"conversation.utterance"

conversation_id
string

The unique identifier for the conversation.

Example:

"c123456"

inference_id
string

This is a unique identifier for a given utterance. In this case, it will be the utterance the replica is speaking.

Example:

"83294d9f-8306-491b-a284-791f56c8383f"

properties
object

This object contains the speech property (the contents of the utterance). When the speaker is the user and the persona uses Raven-1, it may also include user_audio_analysis and/or user_visual_analysis when relevant analysis is available.