Configuring the Perception Layer
To configure the Perception Layer, define the following parameters within the layers.perception object:
1. perception_model
Specifies the perception model to use.
- Options:
  - raven-1 (default and recommended): Real-time emotional understanding from user audio, more natural and human-like interactions, plus all visual capabilities from raven-0.
  - raven-0 (legacy): settings are documented separately.
  - off: Disables the perception layer.
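For orientation, a minimal fragment of the configuration might look like the sketch below; only perception_model is set, and the surrounding structure is assumed from the layers.perception object path named above.

```json
{
  "layers": {
    "perception": {
      "perception_model": "raven-1"
    }
  }
}
```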
Screen Share Feature: When using Raven, screen share is enabled by default without additional configuration.
Audio Perception
Raven-1 (the default) analyzes user tone and emotion in real time. This context is automatically sent to the LLM alongside utterances, enabling more natural, empathetic responses. For example, if the user sounds hesitant or frustrated, that signal accompanies the transcribed utterance so the replica can adjust its tone.

Perception Analysis Queries
Raven supports three kinds of queries that differ by when they run and how they affect the call:

- perception_analysis_queries: Evaluated only at the end of the call. They do not change live behavior; they only shape the summary you get in the Perception Analysis event sent to your conversation callback.
- visual_awareness_queries and audio_awareness_queries: Evaluated throughout the call. Their answers are passed to the LLM as context, so the replica can react in real time. You receive this ongoing analysis in each user turn via the Utterance event as user_visual_analysis and user_audio_analysis.
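As a rough illustration of where these fields surface, an Utterance event might carry the ongoing analysis roughly as sketched below. Only the user_visual_analysis and user_audio_analysis field names come from this page; the event name, the speech field, and the overall payload shape are simplified assumptions, so treat this as a reading aid rather than the exact schema.

```json
{
  "event_type": "utterance",
  "properties": {
    "speech": "I still can't find the export button.",
    "user_visual_analysis": "The user is leaning toward the screen and sharing a dashboard.",
    "user_audio_analysis": "The user sounds mildly frustrated and is speaking quickly."
  }
}
```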
Visual Perception Configuration
2. visual_awareness_queries
An array of custom queries that Raven continuously monitors in the visual stream.
visual_awareness_queries examples
Queries that Raven evaluates continuously during the call (on the order of every second). The answers are fed into the rolling visual context for the LLM, so the replica can respond to what it “sees.” This same context also supports the end-of-call summary. You can read the ongoing visual analysis for each user utterance in the Utterance event as user_visual_analysis.

When to use: When you want the replica to pay attention to something visual in real time (e.g. expression, clothing, objects on screen).

Example:
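A minimal sketch of the field inside layers.perception; the query wording is illustrative, not prescribed.

```json
"visual_awareness_queries": [
  "Is the user wearing a bright-colored outfit?",
  "Is more than one person visible on screen?",
  "Is the user looking away from the screen?"
]
```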
3. perception_analysis_queries
An array of custom queries that Raven processes at the end of the call to generate a visual analysis summary for the user.
perception_analysis_queries examples
Queries that are answered once, at the end of the call, by looking at what was observed over the whole conversation. They do not affect the call itself, only the content of the end-of-call summary. (Currently the summary is visual only; naming is kept general for future support.)

When to use: When you want the post-call report to answer specific questions (e.g. “Did the user ever have two people on screen?”, “How often was the user looking at the screen?”).

Example: see the configuration sketch below. The answers are delivered in a Perception Analysis event; an illustrative payload follows the sketch.
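A sketch of the field itself, with illustrative query wording:

```json
"perception_analysis_queries": [
  "Did the user ever have two people on screen?",
  "How often was the user looking at the screen?"
]
```

And a hypothetical shape for the Perception Analysis event; apart from the idea of a summary answering the configured queries, the field names below are assumptions for illustration, so consult the conversation callback documentation for the exact payload.

```json
{
  "event_type": "perception_analysis",
  "conversation_id": "<conversation_id>",
  "properties": {
    "analysis": "Only one person was visible during the call. The user was looking at the screen for most of the conversation."
  }
}
```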
You do not need to set visual_awareness_queries in order to use perception_analysis_queries.

4. visual_tool_prompt
Tell Raven when and how to trigger tools based on what it sees.
5. visual_tools
Defines callable functions that Raven can trigger upon detecting specific visual conditions. Each tool must include a type and a function object detailing its schema.
Please see Tool Calling for more details.
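Because the prompt and the tool definitions work together, a combined sketch follows. The tool name, prompt wording, and parameter schema are illustrative assumptions; the type plus function structure follows the schema requirement described above.

```json
"visual_tool_prompt": "If you see the user hold up an ID document, call the detect_id_document tool.",
"visual_tools": [
  {
    "type": "function",
    "function": {
      "name": "detect_id_document",
      "description": "Record that the user has presented an ID document on camera.",
      "parameters": {
        "type": "object",
        "properties": {
          "document_type": {
            "type": "string",
            "description": "The kind of document shown, e.g. passport or driver's license."
          }
        },
        "required": ["document_type"]
      }
    }
  }
]
```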
Audio Perception Configuration (Raven-1)
The following fields are available when using raven-1 and enable custom audio-based perception capabilities.
6. audio_awareness_queries
An array of custom queries that Raven-1 continuously monitors in the audio stream. Use these to track specific audio patterns or user states.
audio_awareness_queries examples
Queries that Raven-1 evaluates continuously during the call on the audio stream. The answers are passed to the LLM as context so the replica can respond to tone and delivery. You can read the ongoing audio analysis for each user utterance in the Utterance event as user_audio_analysis. (There is no separate end-of-call summary for audio.)

When to use: When you want the replica to react to how the user sounds (e.g. frustrated, confused, in a hurry).

Example:
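A minimal sketch, with illustrative query wording:

```json
"audio_awareness_queries": [
  "Does the user sound frustrated or impatient?",
  "Does the user sound confused by the instructions?",
  "Does the user sound like they are in a hurry?"
]
```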
7. audio_tool_prompt
Tell Raven-1 when and how to trigger tools based on what it hears (beyond the automatic emotion analysis).
8. audio_tools
Defines callable functions that Raven-1 can trigger based on audio analysis. Each tool must include a type and a function object detailing its schema.
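As with the visual tools, a combined sketch of the prompt and tool definition follows; the tool name, prompt wording, and parameters are illustrative assumptions.

```json
"audio_tool_prompt": "If the user sounds confused about the instructions, call the log_confusion tool.",
"audio_tools": [
  {
    "type": "function",
    "function": {
      "name": "log_confusion",
      "description": "Record that the user sounded confused about the current step.",
      "parameters": {
        "type": "object",
        "properties": {
          "topic": {
            "type": "string",
            "description": "What the user appeared to be confused about."
          }
        },
        "required": ["topic"]
      }
    }
  }
]
```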
Example Configurations
Visual Perception Example
This example demonstrates a persona that monitors for visual cues (bright outfits) and triggers a tool when detected.
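A sketch of the layers.perception portion of such a persona, assembled from the fields documented above; the query wording, prompt text, and tool schema are illustrative.

```json
{
  "layers": {
    "perception": {
      "perception_model": "raven-1",
      "visual_awareness_queries": [
        "Is the user wearing a bright-colored outfit?"
      ],
      "visual_tool_prompt": "If you notice the user wearing a bright-colored outfit, call the flag_bright_outfit tool.",
      "visual_tools": [
        {
          "type": "function",
          "function": {
            "name": "flag_bright_outfit",
            "description": "Record that the user is wearing a bright-colored outfit.",
            "parameters": {
              "type": "object",
              "properties": {
                "outfit_color": {
                  "type": "string",
                  "description": "Dominant color of the outfit."
                }
              },
              "required": ["outfit_color"]
            }
          }
        }
      ]
    }
  }
}
```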
Audio Perception Example
This example demonstrates a persona that monitors user tone and escalates to a human agent when sustained frustration is detected.
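A corresponding sketch for the audio case, again with illustrative wording and schema; the notion of sustained frustration is expressed here through the prompt text, since the triggering behavior is driven by your prompt.

```json
{
  "layers": {
    "perception": {
      "perception_model": "raven-1",
      "audio_awareness_queries": [
        "Does the user sound frustrated?",
        "Is the user's tone tense or raised?"
      ],
      "audio_tool_prompt": "If the user has sounded frustrated across several consecutive turns, call the escalate_to_human tool.",
      "audio_tools": [
        {
          "type": "function",
          "function": {
            "name": "escalate_to_human",
            "description": "Escalate the conversation to a human support agent.",
            "parameters": {
              "type": "object",
              "properties": {
                "reason": {
                  "type": "string",
                  "description": "Why the escalation was triggered."
                }
              },
              "required": ["reason"]
            }
          }
        }
      ]
    }
  }
}
```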
Please see the Create a Persona endpoint for more details.

