Skip to main content
POST
/
v2
/
documents
curl --request POST \
  --url https://tavusapi.com/v2/documents \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "document_url": "https://docs.example.com/"
}
'
{
  "document_id": "d8-5c71baca86fc",
  "document_name": "Example Docs",
  "document_url": "https://docs.example.com/",
  "status": "started",
  "progress": null,
  "error_message": "<string>",
  "created_at": "2024-01-01T12:00:00Z",
  "updated_at": "2024-01-01T12:00:00Z",
  "callback_url": "https://your-server.com/webhook",
  "tags": [
    "docs",
    "website"
  ],
  "crawl_config": {
    "depth": 2,
    "max_pages": 10
  },
  "crawled_urls": [
    "https://docs.example.com/",
    "https://docs.example.com/getting-started",
    "https://docs.example.com/api"
  ],
  "last_crawled_at": "2024-01-01T12:00:00Z",
  "crawl_count": 1
}

Documentation Index

Fetch the complete documentation index at: https://docs.tavus.io/llms.txt

Use this file to discover all available pages before exploring further.

Authorizations

x-api-key
string
header
required

Body

application/json
document_url
string
required

Direct URL to a file or a website for your Knowledge Base. Submitting this URL starts processing asynchronously; the document can be used in conversations once processing completes, which may take a few minutes depending on file size.

For now, our Knowledge Base only supports documents written in English and works best for conversations in English. We will be expanding our Knowledge Base language support soon.

Maximum file size 50MB. Supported file formats: .pdf, .txt, .docx, .doc, .png, .jpg, .pptx, .csv, and .xlsx. Website URLs are supported: a snapshot of the page is processed into document content; use the crawl object for multi-page crawling from a starting URL.

Example:

"https://docs.example.com/"

document_name
string

Optional name for the document. If not provided, a default name will be generated.

Example:

"Example Docs"

callback_url
string

Optional URL that receives status updates while the document processes asynchronously (e.g. started, processing, ready, error).

Example:

"https://your-server.com/webhook"

tags
string[]

Optional tags to categorize the document for management and for use with document-based access in conversations. After the document is ready, attach it via document_ids on Create Persona or Create Conversation.

Example:
["docs", "website"]
crawl
object

Optional configuration for website crawling. When provided with a website URL, the system follows links from the starting URL and processes multiple pages into a single document. Without this parameter, only the single page at the URL is scraped.

Rate limits: at most 100 crawl documents per user, at most 5 concurrent crawls at any time, and a 1-hour cooldown between recrawls of the same document.

To fetch fresh content after a crawled document exists, use Recrawl Document.

Response

Document created successfully

document_id
string

Unique identifier for the created document

Example:

"d8-5c71baca86fc"

document_name
string

Name of the document

Example:

"Example Docs"

document_url
string

URL of the document or website

Example:

"https://docs.example.com/"

status
enum<string>

Current status of the document processing. Possible values: started, processing, ready, error, recrawling.

Available options:
started,
processing,
ready,
error,
recrawling
Example:

"started"

progress
integer | null

Processing progress as a percentage (0-100). Null when processing has not started or is complete.

Example:

null

error_message
string | null

Error code indicating why processing failed. Only present when status is error. Possible values include: file_download_failed, file_format_unsupported, file_size_too_large, file_empty, invalid_file_url, document_processing_failed, website_processing_failed, chunking_failed, embedding_failed, vector_store_failed, contact_support.

created_at
string

ISO 8601 timestamp of when the document was created

Example:

"2024-01-01T12:00:00Z"

updated_at
string

ISO 8601 timestamp of when the document was last updated

Example:

"2024-01-01T12:00:00Z"

callback_url
string

URL that will receive status updates

Example:

"https://your-server.com/webhook"

tags
string[]

Array of document tags

Example:
["docs", "website"]
crawl_config
object

The crawl configuration used for this document (only present for crawled websites)

crawled_urls
string[] | null

List of URLs that were crawled (only present for crawled websites after processing completes)

Example:
[
"https://docs.example.com/",
"https://docs.example.com/getting-started",
"https://docs.example.com/api"
]
last_crawled_at
string | null

ISO 8601 timestamp of when the document was last crawled

Example:

"2024-01-01T12:00:00Z"

crawl_count
integer | null

Number of times the document has been crawled

Example:

1