Use this path when you call Create Replica with train_image_url and voice_name. As with video uploads, the image file must be reachable at a publicly accessible URL (for example, a presigned S3 GET URL).
We recommend uploading the image through the Developer Portal, which validates it in real time against all requirements before training starts.
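As a rough illustration of the API path, the sketch below builds (but does not send) a Create Replica request carrying train_image_url and voice_name. The endpoint URL, the x-api-key header, and the helper name build_create_replica_request are assumptions not stated on this page; check the Create Replica reference before relying on them.

```python
import json
import urllib.request

# Assumption: the endpoint URL and auth header below are not given on this page.
API_URL = "https://tavusapi.com/v2/replicas"


def build_create_replica_request(api_key, train_image_url, voice_name):
    """Build a POST request for image-based replica training (hypothetical helper)."""
    # The image must be reachable at a publicly accessible URL.
    if not train_image_url.startswith(("http://", "https://")):
        raise ValueError("train_image_url must be an http(s) URL")
    payload = {
        "train_image_url": train_image_url,
        "voice_name": voice_name,  # stock voice slug, for example "anna"
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_replica_request(
    "YOUR_API_KEY", "https://example.com/headshot.png", "anna"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect or log before any training job is started.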

Image Requirements

Upload a clear, front-facing headshot that meets the following requirements:
  • Formats: JPG or PNG
  • Minimum resolution: 512×512 pixels
  • Only one person visible in the image
  • Head and shoulders clearly visible in frame
  • No glasses, hats, or face-covering accessories
  • Avoid visible jewelry such as large earrings or necklaces
  • Keep hair behind the shoulders and away from the face and neck
  • Use even lighting with minimal shadows across the face
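The two mechanical requirements above (format and minimum resolution) can be pre-checked locally before upload. The following is a minimal sketch using only the standard library; the helper names are hypothetical, accepting the .jpeg extension alongside .jpg is an assumption, and the remaining requirements (framing, lighting, accessories) still need a visual review or the Developer Portal validation.

```python
import struct
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumption: .jpeg counts as JPG
MIN_SIDE = 512  # minimum resolution is 512x512 pixels


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if data[:8] != b"\x89PNG\r\n\x1a\n" or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def precheck(filename: str, width: int, height: int) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    if PurePosixPath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be JPG or PNG")
    if width < MIN_SIDE or height < MIN_SIDE:
        problems.append(f"resolution {width}x{height} is below {MIN_SIDE}x{MIN_SIDE}")
    return problems
```

For a PNG, the first 24 bytes of the file are enough for png_size; for a JPEG, the dimensions would have to come from an image library, which this sketch leaves out.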
Image-based training is a faster way to create a replica without recording a training video. Its simpler setup makes it ideal for quick prototyping or AI-generated characters.
Images will not work if they contain multiple people, subjects under 18, non-human characters, visible accessories (such as glasses, headphones, or jewelry), hair in front of shoulders, off-center framing, or unnatural poses such as leaning or lying down.

How voice_name works

Image-based training does not create a new voice from your source material. Instead, you must set voice_name to a stock voice identifier slug (for example anna). This selects a voice tied to an existing Tavus stock replica so the trained replica has a usable default voice.

Example voice_name values

Below are example voice_name slugs:
  • benjamin
  • james
  • liam
  • anna
  • julia
  • ivy
When you run Conversational Video Interface (CVI) sessions later, you are not locked into that stock voice for every conversation. You can attach a persona whose TTS layer uses an external voice (from Cartesia or ElevenLabs). See Text-to-Speech (TTS) for how to set external_voice_id and related fields.
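To make the external-voice option concrete, here is a hedged sketch of a persona payload whose TTS layer points at an external voice. Only external_voice_id is named on this page; the persona_name, layers, tts, and tts_engine field names, the engine slugs, and the helper itself are assumptions, so confirm them against the Text-to-Speech (TTS) page. A real persona may also require fields not shown here.

```python
import json

# Assumption: these engine slugs correspond to Cartesia and ElevenLabs.
SUPPORTED_ENGINES = {"cartesia", "elevenlabs"}


def build_external_voice_persona(persona_name, tts_engine, external_voice_id):
    """Build a persona payload with an external TTS voice (hypothetical shape)."""
    if tts_engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported tts_engine: {tts_engine!r}")
    return {
        "persona_name": persona_name,  # assumed field name
        "layers": {
            "tts": {
                "tts_engine": tts_engine,  # assumed field name
                "external_voice_id": external_voice_id,
            }
        },
    }


persona = build_external_voice_persona("demo-persona", "cartesia", "YOUR_VOICE_ID")
print(json.dumps(persona, indent=2))
```

The replica keeps its stock voice_name as a default; a payload like this only overrides the voice for conversations that use the persona.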
By using the image training API, you affirm that you have the rights to use the image you supply (for example likeness and publicity rights where applicable). Tavus may reject images that appear to depict unauthorized or impermissible subjects. This is separate from the verbal consent requirement for human replicas trained from video; see Consent Statement when using train_video_url.
Replica training typically takes 3–4 hours.