origin.
This page documents the tools registry (/v2/tools with origin: "vision" or "audio"). If your PAL still embeds tools under layers.perception.visual_tools or layers.perception.audio_tools, see Legacy inline tool calling.

Perception tool calling is only available with Raven (perception_model: "raven-1" on the PAL’s perception layer).

How Perception Tools Work
Perception runs as a parallel step alongside the conversational LLM. Raven analyses the audio and video streams continuously and fires a tool the moment it detects something matching one of the tool descriptions you defined. There are two flavors, picked via the tool’s origin:
- Vision tools (origin: "vision") - triggered by what Raven sees in the video stream (e.g. an ID card, a bright outfit, a hat).
- Audio tools (origin: "audio") - triggered by what Raven hears in the audio stream (e.g. sarcasm, sustained frustration).

Defining a Perception Tool
The name, description, parameters, and delivery fields work the same way they do for LLM tools - see Tool Calling for LLM for the full reference.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | ✅ | Unique identifier, scoped to your account. Must match ^[a-zA-Z_][a-zA-Z0-9_]{0,63}$. |
| description | string | ✅ | What Raven should look or listen for. Be specific - this is what triggers the tool. |
| parameters | object | ❌ | JSON Schema for the arguments Raven extracts when the cue is detected. |
| origin | string | ✅ | "vision" or "audio". |
| delivery | object | ❌ | Defaults to {"app_message": true}. API is also supported (same shape as LLM tools). |
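
The field rules in this section (the table above, plus the fixed on_call, on_resolve, and static_filler values described next) can be checked locally before a definition is sent. The following is a minimal sketch, not the API's own validation: `validate_perception_tool` is a hypothetical helper, and it assumes the definition is a plain Python dict using the field names from the table.

```python
import re

# Name pattern quoted from the field table.
NAME_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")
ALLOWED_ORIGINS = {"vision", "audio"}
# The only values this page says a perception tool may carry for these fields.
FIXED_FIELDS = {"on_call": None, "on_resolve": "fire_and_forget", "static_filler": None}


def validate_perception_tool(tool: dict) -> list[str]:
    """Return a list of problems; an empty list means the definition looks valid."""
    problems = []

    name = tool.get("name")
    if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
        problems.append("name must match ^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")

    description = tool.get("description")
    if not isinstance(description, str) or not description.strip():
        problems.append("description is required")

    origin = tool.get("origin")
    if not isinstance(origin, str) or origin not in ALLOWED_ORIGINS:
        problems.append('origin must be "vision" or "audio"')

    if "parameters" in tool and not isinstance(tool["parameters"], dict):
        problems.append("parameters must be a JSON Schema object")

    # These fields may be omitted; if present they must hold the fixed value.
    for field, allowed in FIXED_FIELDS.items():
        if field in tool and tool[field] != allowed:
            problems.append(f"{field} must be omitted or set to {allowed!r}")

    return problems
```

Running a check like this client-side is only a convenience; the API remains the source of truth and may enforce rules not listed on this page.
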
You do not need to set on_call, on_resolve, or static_filler on a perception tool. Omit them and the API applies the only allowed values (null, "fire_and_forget", and null, respectively). Passing any other value returns a 400.

Vision Tool Example
When a vision tool fires, it produces a conversation.perception_tool_call event with modality: "vision", the tool name, the structured arguments, and a frames array of base64-encoded images that triggered the call.
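
To make the frames array concrete, here is a small sketch of decoding it on the receiving side. The event layout is an assumption: this page names the pieces of the event (modality, name, arguments, frames) but not the exact JSON shape, so the flat dict below, the `decode_frames` helper, and the `detect_id_card` tool are all hypothetical.

```python
import base64


def decode_frames(event: dict) -> list[bytes]:
    """Decode the base64 frames carried by a vision perception tool call.

    ASSUMPTION: the event is a flat dict with "modality", "name",
    "arguments", and "frames" keys; the real payload may nest these.
    """
    if event.get("modality") != "vision":
        return []  # only vision tool calls are described as carrying frames
    return [base64.b64decode(frame) for frame in event.get("frames", [])]


# Hypothetical event, built locally for illustration only.
sample_event = {
    "modality": "vision",
    "name": "detect_id_card",
    "arguments": {"document_type": "passport"},
    "frames": [base64.b64encode(b"fake-image-bytes").decode("ascii")],
}
images = decode_frames(sample_event)  # raw image bytes, ready to store or inspect
```
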
Audio Tool Example
When an audio tool fires, it produces a conversation.perception_tool_call event with modality: "audio" and the structured arguments.
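
As an illustration of what an audio tool definition might look like, the sketch below builds one as a Python dict and serialises it to JSON. Only the field names (name, description, origin, parameters) come from this page; the tool name, description text, and parameter schema are invented for the example, and how the body is sent to /v2/tools and authenticated is not shown.

```python
import json

# Hypothetical audio tool definition; only the field names come from this page.
frustration_tool = {
    "name": "flag_sustained_frustration",
    # The description is what triggers the tool, so it spells out the audible cues.
    "description": (
        "The user sounds frustrated for several consecutive turns, for example "
        "a raised voice, sighing, or repeating the same complaint."
    ),
    "origin": "audio",
    # JSON Schema for the arguments Raven extracts when the cue is detected.
    "parameters": {
        "type": "object",
        "properties": {
            "severity": {"type": "string", "enum": ["mild", "moderate", "severe"]},
            "evidence": {"type": "string", "description": "Short cue Raven heard."},
        },
        "required": ["severity"],
    },
}

# JSON body for a create request against the tools registry.
body = json.dumps(frustration_tool)
```

on_call, on_resolve, and static_filler are left out on purpose, since the API fills in the only allowed values.
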
Attaching to a PAL
Perception tools are attached the same way as LLM tools. Make sure the PAL’s perception layer has perception_model: "raven-1" for vision and audio tools to fire.
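
The Raven requirement is easy to overlook, so a pre-flight check on the PAL configuration can help. This sketch assumes the PAL is available as a dict shaped like {"layers": {"perception": {...}}}, which is inferred from the layers.perception.* paths mentioned on this page; both helper functions are hypothetical.

```python
def perception_tools_can_fire(pal: dict) -> bool:
    """True when the PAL's perception layer is configured with Raven ("raven-1").

    ASSUMPTION: the PAL is a dict shaped like {"layers": {"perception": {...}}}.
    """
    perception = pal.get("layers", {}).get("perception", {})
    return perception.get("perception_model") == "raven-1"


def legacy_inline_tools(pal: dict) -> list[str]:
    """List legacy inline tool keys that are still populated on the perception layer."""
    perception = pal.get("layers", {}).get("perception", {})
    return [key for key in ("visual_tools", "audio_tools") if perception.get(key)]
```

A PAL for which `legacy_inline_tools` returns a non-empty list is still using the inline approach covered in Legacy inline tool calling.
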
Delivery
Perception tools use the same delivery field as LLM tools - see Tool Delivery and Tool Authentication. The only perception-specific bit: the app-message event is conversation.perception_tool_call (not conversation.tool_call).
Because perception tools are fire-and-forget, the response body your API returns is not consumed by the conversational LLM. A 2xx is enough to acknowledge receipt; a non-2xx is logged but does not affect the conversation.

Replace <api-key> with your actual API key. You can generate one in the PAL Maker.

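
Since a 2xx with no meaningful body is all that API delivery needs, the receiving endpoint can stay very small. The sketch below is one way to write it as a plain WSGI app using only the standard library. It assumes the call arrives as an HTTP POST with a JSON body; the exact payload and authentication are covered in Tool Delivery and Tool Authentication, and `handle_call` and the sample payload are hypothetical.

```python
import json
from io import BytesIO

received = []  # stand-in for real processing (queue, database, alerting, ...)


def handle_call(call: dict) -> None:
    """Hypothetical hook: record the perception tool call for later processing."""
    received.append(call)


def perception_webhook(environ, start_response):
    """Minimal WSGI endpoint that acknowledges a perception tool call.

    ASSUMPTION: the call arrives as a POST request with a JSON body.
    """
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length)
    try:
        call = json.loads(raw or b"{}")
    except json.JSONDecodeError:
        # A non-2xx is only logged by the caller; the conversation continues.
        start_response("400 Bad Request", [("Content-Length", "0")])
        return [b""]
    handle_call(call)
    # Any 2xx acknowledges receipt; the body is not consumed by the LLM.
    start_response("204 No Content", [("Content-Length", "0")])
    return [b""]


# Local smoke test without a server: call the WSGI app directly.
payload = json.dumps({"name": "flag_sustained_frustration", "arguments": {}}).encode()
statuses = []
environ = {"CONTENT_LENGTH": str(len(payload)), "wsgi.input": BytesIO(payload)}
perception_webhook(environ, lambda status, headers: statuses.append(status))
```

For a quick local run, the same app can be served with `wsgiref.simple_server.make_server` from the standard library.
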