Perception with Raven
Raven-0 is a real-time multimodal vision and video understanding system that rethinks how AI perceives and interacts with humans. Unlike traditional systems that rely on frame-by-frame analysis, Raven-0 implements a context-aware, human-like perception system modeled on how the primary visual cortex works.
Key Capabilities
Raven-0 provides advanced perception capabilities that go well beyond traditional vision systems.
How Raven Works
Raven-0 implements a dual-track vision processing system that mirrors human perception:
Ambient Perception
Ambient perception acts as the replica’s “eyes,” continuously processing and understanding the visual environment at a low level. This provides ambient context that informs the replica’s responses without requiring explicit queries.
- Default Queries: Raven automatically processes visual information to understand who the user is, what they look like, their emotional state, and other contextual information.
- Custom Queries: You can define custom visual queries that Raven will continuously monitor for, allowing for specialized use cases.
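For example, custom ambient queries could be supplied when creating a persona. The sketch below builds such a request body in Python; the field names (`perception_model`, `ambient_awareness_queries`) and the query wording are assumptions for illustration, so check them against the Create Persona API reference.

```python
# Illustrative request body for a persona with custom ambient queries.
# Field names here are assumptions, not a guaranteed schema.
persona = {
    "persona_name": "Store Assistant",
    "layers": {
        "perception": {
            "perception_model": "raven-0",
            # Raven monitors for these continuously, without being asked.
            "ambient_awareness_queries": [
                "Is the user wearing a company badge?",
                "Does the user appear confused or frustrated?",
            ],
        }
    },
}
```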
Active Perception
When specific visual information is needed, Raven can perform detailed on-demand analysis:
- Speculative Execution: Raven uses speculative execution to pre-process likely visual queries while the user is speaking, minimizing perceived latency.
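The idea behind speculative execution can be sketched with a small asyncio example. This is an illustration of the concept, not Tavus internals: a likely visual query is started while the user is still speaking, so its answer is ready (or nearly ready) the moment the turn ends.

```python
import asyncio

async def analyze_frame(query: str) -> str:
    # Stand-in for a slow vision call.
    await asyncio.sleep(0.1)
    return f"answer to: {query}"

async def handle_turn() -> str:
    # Kick off the likely query speculatively, before the turn ends.
    speculative = asyncio.create_task(
        analyze_frame("is the user holding a document?")
    )
    await asyncio.sleep(0.1)  # the user finishes speaking meanwhile
    # By now the speculative result is ready: little perceived latency.
    return await speculative

result = asyncio.run(handle_turn())
```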
Screenshare Vision
Raven processes screen content with higher detail retention, so animations, dynamic content, and full pages are captured faithfully. You can share your calendar, documents, and other content with your replica, and switching between screens is seamless.
End-of-call Perception Analysis
At the end of a call, Raven summarizes the visual artifacts detected throughout the call. This feature is available only when the persona has raven-0 specified in the perception layer; the summary is broadcast as a Perception Analysis event and delivered separately as a conversation callback.
Configuring Raven
You can configure Raven’s behavior through the Create Persona API by adjusting the perception parameters.
Perception Parameters
The perception model to use. Options include raven-0 for advanced multimodal perception, basic for simpler vision capabilities, and off to disable perception entirely.
Custom queries that Raven continuously monitors for in the visual stream. These provide ambient context without explicit prompting, letting the replica stay aware of the additional visual cues you care about.
A prompt that details how and when to use the tools passed to the perception layer, helping the replica understand the context of the perception tools and grounding their use.
Tools that can be triggered based on visual context, enabling automated actions in your system in response to visual cues.
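Putting these parameters together, a full perception layer configuration might look like the sketch below. The field names, the tool schema, the `flag_document` tool, and the endpoint URL are assumptions following common conventions; verify all of them against the Create Persona API reference before use.

```python
# Sketch of a Create Persona request body configuring every perception
# parameter described above. Field names and the tool schema are
# assumptions for illustration, not a guaranteed API contract.
payload = {
    "persona_name": "Front Desk",
    "layers": {
        "perception": {
            "perception_model": "raven-0",
            "ambient_awareness_queries": [
                "Is the user holding a document up to the camera?",
            ],
            # Guides how and when the perception tools below should fire.
            "perception_tool_prompt": (
                "Call flag_document when the user clearly presents a "
                "physical document to the camera."
            ),
            "perception_tools": [
                {
                    "type": "function",
                    "function": {
                        "name": "flag_document",  # hypothetical tool
                        "description": "Record that a document was shown.",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "document_type": {"type": "string"},
                            },
                            "required": ["document_type"],
                        },
                    },
                }
            ],
        }
    },
}

# The payload would then be POSTed to the Create Persona endpoint, e.g.:
# requests.post("https://tavusapi.com/v2/personas", json=payload,
#               headers={"x-api-key": "<your-key>"})
```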