To ensure the highest quality Phoenix-4 replica, your training video must follow the specifications outlined below.
Phoenix-4 training requirements have changed from previous models.
- Training now uses 30 seconds speaking + 30 seconds listening
- The listening segment must remain neutral with minimal movement
Camera & Framing
- Place the camera at eye level - ensure your face fills at least 25% of the frame.
- Use a stable camera at eye level - your face centered and clearly visible (waist-up framing)
- Sit at least 3 feet from the camera in a natural, Zoom-style setup - head, shoulders, and upper chest clearly visible. Ensure a well-lit space with a simple background.
- Record in 1080p using a desktop app - avoid browser recording and low-resolution cameras
- Head & clothing separation - keep your neck fully visible with clear separation between your head and clothing.

Hair & Clothing Separation
There must be a clear visual distinction between your head and clothing, and your neck must be fully visible.
- Keep your neck and jawline fully visible with clear separation from your clothing
- Avoid high collars or clothing that covers the neck
- Keep hair away from the face and positioned behind the shoulders
- Avoid bangs, loose strands, or complex hairstyles that obscure the face, neck, or shoulders
Supported Video Formats
Whether recording through the Developer Portal or uploading a pre-recorded training video via the API, ensure your video meets the following requirements:
- Minimum frame rate: 25 FPS
- Minimum resolution: 1080p
- Maximum file size: 750 MB
- Supported formats: .webm and .mp4 (H.264 video codec + AAC audio codec)
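
The limits above can be checked locally before uploading. The following is a minimal sketch, not an official Tavus tool: the function name `check_training_video` and its parameters are hypothetical, and the metadata (size, dimensions, frame rate, codecs) is assumed to have been read beforehand with a media inspection tool of your choice. It also assumes that "1080p" means the shorter side is at least 1080 pixels, that 1 MB is 1,000,000 bytes, and that the H.264/AAC requirement is checked only for .mp4 files.

```python
from pathlib import Path

# Limits taken from the requirements listed above.
MIN_FPS = 25.0
MIN_SHORT_SIDE_PX = 1080            # assumption: "1080p" = shorter side >= 1080 px
MAX_FILE_BYTES = 750 * 1000 ** 2    # assumption: 1 MB = 1,000,000 bytes
ALLOWED_EXTENSIONS = {".webm", ".mp4"}


def check_training_video(filename, size_bytes, width, height, fps,
                         video_codec=None, audio_codec=None):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    ext = Path(filename).suffix.lower()

    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {ext or '(none)'}; use .webm or .mp4")
    if fps < MIN_FPS:
        problems.append(f"frame rate {fps:g} FPS is below the 25 FPS minimum")
    if min(width, height) < MIN_SHORT_SIDE_PX:
        problems.append(f"resolution {width}x{height} is below 1080p")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file size exceeds the 750 MB maximum")

    # Assumption: codecs are checked for .mp4 only, and only when supplied.
    if ext == ".mp4":
        if video_codec is not None and video_codec.lower() != "h264":
            problems.append(f"video codec {video_codec} is not H.264")
        if audio_codec is not None and audio_codec.lower() != "aac":
            problems.append(f"audio codec {audio_codec} is not AAC")

    return problems
```

Calling `check_training_video("take1.mp4", 200_000_000, 1920, 1080, 30.0, "h264", "aac")` returns an empty list, while a 720p .mov file would produce one entry per failed requirement.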
Consent Statement
If you’re creating a real human replica, you must include a verbal consent statement in the video. This ensures ethical use and compliance with data protection laws. Say the following script clearly in your video:
"I, (your name), am currently speaking and give consent to Tavus to create an AI clone of me by using the audio and video samples I provide. I understand that this AI clone can be used to create videos that look and sound like me."
Consent is only required for personal replicas. If you’re creating a synthetic replica or using an AI-generated training video, you can skip this.
Recording Structure
Your video must be one continuous shot, containing 30 seconds of speaking followed by 30 seconds of still footage. You can use a script provided by Tavus or speak on any topic of your choice.
Speaking Segment (30 Seconds)
- Speak naturally on any topic - the content itself does not matter
- Speak clearly and enunciate well - keeping your teeth visible while talking
- Keep head and body movement minimal
- Avoid hand gestures or sudden head turns

Replica training typically takes 3–4 hours. You can track the training progress by:
- Providing a callback_url when creating the replica via API
- Using the Get Replica Status API
- Checking the Developer Portal
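
If you choose to poll for status rather than use a callback, a loop like the one below may be a reasonable starting point. It is a sketch under stated assumptions: `wait_for_replica` and `fetch_status` are hypothetical names, the status strings "completed" and "error" are assumed terminal values that this page does not specify, and the actual request to the Get Replica Status API is left to the caller. The default timeout of six hours simply leaves headroom over the typical 3–4 hour training time.

```python
import time

# Assumption: these status strings mark the end of training.
# Confirm the real values against the API reference before relying on them.
TERMINAL_STATUSES = {"completed", "error"}


def wait_for_replica(fetch_status, poll_interval_s=60.0, timeout_s=6 * 3600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until training reaches a terminal status and return that status.

    fetch_status is a zero-argument callable supplied by the caller that
    returns the replica's current status as a string.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"replica still '{status}' after {timeout_s:g} s")
        sleep(poll_interval_s)
```

In practice `fetch_status` would wrap an authenticated call to the status endpoint and extract the status field from its response. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.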
High-Quality Training Example
| ✅ Do | ❌ Don’t |
|---|---|
| ✅ Keep your full face visible and in focus | ❌ Wear clothes that blend into the background |
| ✅ Keep hair and loose strands away from your face and behind your shoulders | ❌ Wear accessories (hats, glasses, jewelry, etc.) |
| ✅ Sit still, facing the camera | ❌ Move around or change positions |
| ✅ Speak clearly with good enunciation (teeth fully visible) | ❌ Block your face or mouth with hands or microphone |
Full Body Replica
To create a full body replica for conversational video, follow these guidelines.
- Record in vertical format, with the full body visible from head to toe
- Stand still throughout the recording and avoid large movements or hand gestures
- Use consistent lighting with minimal shadows or exposure changes
- 4K resolution is recommended for best quality

All standard recording requirements for replica training still apply.


