Configuring the Perception Layer
To configure the Perception Layer, define the following parameters within the layers.perception object:
1. perception_model
Specifies the perception model to use.
- Options:
  - raven-1 (default and recommended): Real-time emotional understanding from user audio, more natural and human-like interactions, plus advanced visual perception.
  - off: Disables the perception layer.
Screen Share Feature: When using Raven, screen share is enabled by default without additional configuration.
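As a minimal sketch of the two documented options, the fragment below builds the layers.perception object in Python and serializes it to JSON. The helper name build_perception_layer is hypothetical; only the perception_model field and its two values come from the text above.

```python
import json

# The two values documented above for perception_model.
ALLOWED_MODELS = {"raven-1", "off"}


def build_perception_layer(model: str = "raven-1") -> dict:
    """Return a `layers` fragment with only perception_model set (hypothetical helper)."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported perception_model: {model!r}")
    return {"layers": {"perception": {"perception_model": model}}}


if __name__ == "__main__":
    # Default: Raven-1 enabled.
    print(json.dumps(build_perception_layer(), indent=2))
    # Perception layer disabled.
    print(json.dumps(build_perception_layer("off"), indent=2))
```

Rejecting unknown values locally is a convenience of this sketch, not documented API behavior.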
Audio Perception
Raven-1 (the default) analyzes user tone and emotion in real time. This context is automatically sent to the LLM alongside utterances, enabling more natural, empathetic responses.
Audio analysis output is limited to 32 tokens per utterance.
Perception Analysis Queries
Raven supports three kinds of queries that differ by when they run and how they affect the call:
- perception_analysis_queries - Evaluated only at the end of the call. They do not change live behavior; they only shape the summary you get in the Perception Analysis event sent to your conversation callback.
- visual_awareness_queries and audio_awareness_queries - Evaluated throughout the call. Their answers are passed to the LLM as context, so the PAL can react in real time. You receive this ongoing analysis in each user turn via the Utterance event as user_visual_analysis and user_audio_analysis.
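The per-turn fields can be read with a small helper like the one sketched here. The envelope of the Utterance event is not shown on this page, so the function takes whichever mapping holds the two keys. Treating the values as strings is an assumption, and read_turn_analysis is a hypothetical name.

```python
from typing import Mapping, Optional, Tuple


def read_turn_analysis(
    utterance_fields: Mapping[str, object],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (visual, audio) analysis for one user turn, or None where absent.

    Assumes both values are plain strings when present.
    """
    visual = utterance_fields.get("user_visual_analysis")
    audio = utterance_fields.get("user_audio_analysis")
    return (
        visual if isinstance(visual, str) else None,
        audio if isinstance(audio, str) else None,
    )


if __name__ == "__main__":
    # Illustrative values only, not real Raven output.
    sample = {
        "user_visual_analysis": "The user is looking at the screen.",
        "user_audio_analysis": "The user sounds calm.",
    }
    print(read_turn_analysis(sample))
    print(read_turn_analysis({}))
```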
Visual Perception Configuration
2. visual_awareness_queries
An array of custom queries that Raven continuously monitors in the visual stream.
visual_awareness_queries examples
Queries that Raven evaluates continuously during the call (on the order of every second). The answers are fed into the rolling visual context for the LLM, so the PAL can respond to what it “sees.” This same context also supports the end-of-call summary. You can read the ongoing visual analysis for each user utterance in the Utterance event as user_visual_analysis.
When to use: when you want the PAL to pay attention to something visual in real time (e.g. expression, clothing, objects on screen).
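A configuration with visual_awareness_queries might look like the sketch below, built as a Python dict and printed as JSON. The query strings are illustrative placeholders, not values from the product documentation.

```python
import json

# Illustrative query strings; replace them with questions that fit your use case.
visual_awareness_layer = {
    "perception_model": "raven-1",
    "visual_awareness_queries": [
        "Is the user smiling?",
        "Is the user wearing a bright outfit?",
        "Is there a document visible on screen?",
    ],
}

# Nest the fragment under layers.perception, as described at the top of this page.
visual_awareness_fragment = {"layers": {"perception": visual_awareness_layer}}

if __name__ == "__main__":
    print(json.dumps(visual_awareness_fragment, indent=2))
```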
3. perception_analysis_queries
An array of custom queries that Raven processes at the end of the call to generate a visual analysis summary for the user.
perception_analysis_queries examples
Queries that are answered once, at the end of the call, by looking at what was observed over the whole conversation. They do not affect the call itself, only the content of the end-of-call summary. (Currently the summary is visual only; naming is kept general for future support.)
When to use: when you want the post-call report to answer specific questions (e.g. “Did the user ever have two people on screen?”, “How often was the user looking at the screen?”).
The answers are delivered in a Perception Analysis event.
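The two sample questions above could be configured roughly as follows. This is a sketch, not the original example from this page: the dict is serialized to JSON for use under layers.perception, and no payload shape for the Perception Analysis event is assumed.

```python
import json

# End-of-call questions taken from the "When to use" text above.
analysis_layer = {
    "perception_model": "raven-1",
    "perception_analysis_queries": [
        "Did the user ever have two people on screen?",
        "How often was the user looking at the screen?",
    ],
}

analysis_fragment = {"layers": {"perception": analysis_layer}}

if __name__ == "__main__":
    print(json.dumps(analysis_fragment, indent=2))
```

The fragment sets only perception_analysis_queries, with no awareness queries alongside it.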
You do not need to set visual_awareness_queries in order to use perception_analysis_queries.
4. visual_tool_prompt
Tell Raven when and how to trigger tools based on what it sees.
5. visual_tools
Legacy inline perception tools. For new integrations, create vision tools at /v2/tools with origin: "vision" and attach them to the PAL - see Tool Calling for Perception.
The field below defines OpenAI-style function objects directly on the PAL. Tavus still merges them at conversation start alongside any registry tools you attach, but inline tools cannot use registry-only settings (delivery, API auth, etc.).
Legacy field names (perception_tools, perception_tool_prompt) still work - see Migration from Legacy Perception to Raven-1. For the full legacy inline reference, see Legacy inline tool calling.
Audio Perception Configuration (Raven-1)
The following fields are available when using raven-1 and enable custom audio-based perception capabilities.
6. audio_awareness_queries
An array of custom queries that Raven-1 continuously monitors in the audio stream. Use these to track specific audio patterns or user states.
Audio analysis output is limited to 32 tokens per query response.
audio_awareness_queries examples
Queries that Raven-1 evaluates continuously during the call on the audio stream. The answers are passed to the LLM as context so the PAL can respond to tone and delivery. You can read the ongoing audio analysis for each user utterance in the Utterance event as user_audio_analysis. (There is no separate end-of-call summary for audio.)
When to use: when you want the PAL to react to how the user sounds (e.g. frustrated, confused, in a hurry).
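As a rough sketch, audio_awareness_queries can be attached to a perception fragment like this. The helper with_audio_queries is hypothetical. Its check for raven-1 is a client-side convenience based on the note that these fields are available when using raven-1, and the query strings are illustrative.

```python
import json
from typing import List


def with_audio_queries(perception: dict, queries: List[str]) -> dict:
    """Return a copy of a perception fragment with audio_awareness_queries attached."""
    if perception.get("perception_model") != "raven-1":
        raise ValueError("audio_awareness_queries requires perception_model 'raven-1'")
    # Drop blank entries so only real questions are sent.
    cleaned = [q.strip() for q in queries if q.strip()]
    return {**perception, "audio_awareness_queries": cleaned}


if __name__ == "__main__":
    layer = with_audio_queries(
        {"perception_model": "raven-1"},
        [
            "Does the user sound frustrated?",
            "Does the user sound confused?",
            "Does the user sound like they are in a hurry?",
        ],
    )
    print(json.dumps({"layers": {"perception": layer}}, indent=2))
```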
7. audio_tool_prompt
Tell Raven-1 when and how to trigger tools based on what it hears (beyond the automatic emotion analysis).
8. audio_tools
Legacy inline perception tools. For new integrations, create audio tools at /v2/tools with origin: "audio" and attach them to the PAL - see Tool Calling for Perception.
Requires perception_model: "raven-1". Legacy inline details: Legacy inline tool calling.
Example Configurations
The examples below use legacy inline visual_tools / audio_tools for illustration. New PALs should define tools in the tools registry instead.
Visual Perception Example
This example demonstrates a PAL that monitors for visual cues (bright outfits) and triggers a tool when detected.
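One way such a configuration could look is sketched below as a Python dict that serializes to JSON. The tool name notify_if_bright_outfit_shown, its outfit_color parameter, the prompt wording, and the query text are all hypothetical. The tool entry follows the common OpenAI-style function object layout (type, function.name, function.description, function.parameters as JSON Schema).

```python
import json

visual_example = {
    "layers": {
        "perception": {
            "perception_model": "raven-1",
            "visual_awareness_queries": ["Is the user wearing a bright outfit?"],
            # Tells Raven when to call the tool defined below.
            "visual_tool_prompt": (
                "You have a tool named notify_if_bright_outfit_shown. "
                "Call it when the user is clearly wearing a bright outfit."
            ),
            # Legacy inline tool definition (hypothetical tool and parameter).
            "visual_tools": [
                {
                    "type": "function",
                    "function": {
                        "name": "notify_if_bright_outfit_shown",
                        "description": "Record that a bright outfit was detected.",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "outfit_color": {
                                    "type": "string",
                                    "description": "Dominant color of the outfit.",
                                }
                            },
                            "required": ["outfit_color"],
                        },
                    },
                }
            ],
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(visual_example, indent=2))
```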
Audio Perception Example
This example demonstrates a PAL that monitors user tone and escalates to a human agent when sustained frustration is detected.
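A configuration along these lines is sketched below. The tool name escalate_to_human, its reason parameter, the helper function_tool, and the prompt and query wording are hypothetical. Only the field names audio_awareness_queries, audio_tool_prompt, audio_tools, and the raven-1 requirement come from this page.

```python
import json


def function_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Build an OpenAI-style function object (hypothetical convenience helper)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


audio_example = {
    "layers": {
        "perception": {
            # audio_tools requires Raven-1.
            "perception_model": "raven-1",
            "audio_awareness_queries": ["Does the user sound frustrated?"],
            "audio_tool_prompt": (
                "You have a tool named escalate_to_human. Call it when the user "
                "sounds frustrated across several consecutive turns."
            ),
            "audio_tools": [
                function_tool(
                    "escalate_to_human",
                    "Hand the conversation over to a human agent.",
                    {"reason": {"type": "string", "description": "Why escalation is needed."}},
                    ["reason"],
                )
            ],
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(audio_example, indent=2))
```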
Please see the Create a PAL endpoint for more details.

