POST /v2/documents/{document_id}/recrawl
Recrawl Document
curl --request POST \
  --url https://tavusapi.com/v2/documents/{document_id}/recrawl \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "crawl": {
    "depth": 3,
    "max_pages": 50
  }
}
'
{
  "document_id": "d8-5c71baca86fc",
  "document_name": "Company Website",
  "document_url": "https://example.com/",
  "status": "recrawling",
  "progress": null,
  "error_message": null,
  "created_at": "2024-01-01T12:00:00Z",
  "updated_at": "2024-01-15T10:30:00Z",
  "callback_url": "https://your-server.com/webhook",
  "tags": [
    "website",
    "company"
  ],
  "crawl_config": {
    "depth": 3,
    "max_pages": 50
  },
  "crawled_urls": [
    "https://docs.example.com/",
    "https://docs.example.com/getting-started"
  ],
  "last_crawled_at": "2024-01-01T12:05:00Z",
  "crawl_count": 1
}


Authorizations

x-api-key
string
header
required

Path Parameters

document_id
string
required

Unique ID of the crawl-backed website document to refresh. Recrawl when the source site has changed, to refresh content on a schedule, or to retry after a crawl or processing error.

The document's status must be ready or error; otherwise the request returns 409. The document must also have been created with a crawl configuration, unless you supply crawl in the request body for this call.

Each document can be recrawled at most once per hour. Calling this endpoint again during the 1-hour cooldown returns 429.
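The preconditions above can be checked client-side before calling the endpoint. The sketch below is a hypothetical helper (`precheck_recrawl` is not part of any Tavus SDK). It assumes the 1-hour cooldown is measured from your previous recrawl request, which you track yourself; the server's 409 and 429 responses remain authoritative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Statuses from which a recrawl may be requested (anything else returns 409).
RECRAWLABLE_STATUSES = {"ready", "error"}
# Per-document cooldown between recrawl requests (violations return 429).
COOLDOWN = timedelta(hours=1)


def precheck_recrawl(
    status: str,
    has_stored_crawl_config: bool,
    body_has_crawl: bool,
    last_recrawl_requested_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return a reason the API would likely reject the call, or None if it looks allowed.

    Hypothetical helper. Assumption: the cooldown starts at your previous
    recrawl request, whose time the caller records (timezone-aware).
    """
    now = now or datetime.now(timezone.utc)
    if status not in RECRAWLABLE_STATUSES:
        return f"409: status is {status!r}, expected ready or error"
    if not has_stored_crawl_config and not body_has_crawl:
        return "document has no crawl configuration; pass crawl in the body"
    if last_recrawl_requested_at is not None and now - last_recrawl_requested_at < COOLDOWN:
        retry_at = last_recrawl_requested_at + COOLDOWN
        return f"429: cooldown active until {retry_at.isoformat()}"
    return None
```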

Body

application/json

The body is optional. Omit it entirely to reuse the depth and max_pages stored when the document was created, or include crawl to override those values for this run only.
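A minimal Python sketch of building this request with the standard library, assuming the base URL from the cURL example. `build_recrawl_request` is a hypothetical helper: it omits the body when no overrides are given and otherwise sends only the fields you pass, which assumes depth and max_pages may be supplied independently.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://tavusapi.com/v2"  # taken from the cURL example


def build_recrawl_request(
    api_key: str,
    document_id: str,
    depth: Optional[int] = None,
    max_pages: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /documents/{document_id}/recrawl request.

    No overrides -> no body, so the stored crawl settings are reused.
    Assumption: depth and max_pages can each be sent on their own.
    """
    url = f"{API_BASE}/documents/{document_id}/recrawl"
    headers = {"x-api-key": api_key}
    crawl = {k: v for k, v in (("depth", depth), ("max_pages", max_pages)) if v is not None}
    data = None
    if crawl:
        data = json.dumps({"crawl": crawl}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

Pass the result to `urllib.request.urlopen`; a 202 means the recrawl started, while 409 and 429 are raised as `urllib.error.HTTPError`.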

After a 202 response, status is typically recrawling until processing finishes. If you set callback_url when creating the document, Tavus sends webhook updates until the document returns to ready or error.

You can also poll Get Document for the current status, crawl_count, and last_crawled_at.
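A polling loop might look like the sketch below. It assumes you supply `fetch_document`, a hypothetical function that calls Get Document and returns the parsed JSON; the interval and timeout defaults are arbitrary choices, not documented values.

```python
import time
from typing import Any, Callable, Dict

# A processing run ends in one of these statuses.
TERMINAL_STATUSES = {"ready", "error"}


def wait_for_recrawl(
    fetch_document: Callable[[], Dict[str, Any]],
    interval_s: float = 15.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until status is ready or error, then return the final document.

    fetch_document is caller-supplied (hypothetical): it should call Get Document.
    Raises TimeoutError if the document is still processing after timeout_s.
    """
    waited = 0.0
    while True:
        doc = fetch_document()
        if doc.get("status") in TERMINAL_STATUSES:
            return doc
        if waited >= timeout_s:
            raise TimeoutError(f"still {doc.get('status')!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```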

Account limits: at most 5 concurrent crawls per user and at most 100 crawl-backed documents per user.
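To stay under the concurrent-crawl limit when refreshing many documents, a scheduler can count in-flight crawls before starting another. This sketch assumes that crawl-backed documents in started, processing, or recrawling count toward the limit of 5; that mapping is an assumption, not documented behavior.

```python
from typing import Iterable

MAX_CONCURRENT_CRAWLS = 5
# Assumption: crawl-backed documents in these statuses occupy a crawl slot.
IN_FLIGHT_STATUSES = {"started", "processing", "recrawling"}


def crawl_slots_available(statuses: Iterable[str]) -> int:
    """How many more crawls could start now, given the statuses of your crawl-backed documents."""
    in_flight = sum(1 for s in statuses if s in IN_FLIGHT_STATUSES)
    return max(0, MAX_CONCURRENT_CRAWLS - in_flight)
```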

crawl
object

Optional depth and max_pages for this recrawl only. When provided, these override the crawl settings stored at document creation; when omitted, the original crawl configuration is used.

A recrawl starts from the same URL as the original crawl, follows links within these limits, and processes fresh page content. When processing completes, the existing vectors are replaced and crawl_count and last_crawled_at are updated; until then, the 202 payload and Get Document (while status is recrawling) still show the previous values.
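Because crawl_count in the 202 payload still describes the previous crawl, one way to confirm a recrawl has been applied is to compare crawl_count before and after. A minimal sketch, assuming crawl_count increments when a recrawl finishes with status ready (`recrawl_applied` is a hypothetical helper):

```python
from typing import Any, Dict


def recrawl_applied(before: Dict[str, Any], after: Dict[str, Any]) -> bool:
    """Return True if `after` (from Get Document) reflects a finished recrawl.

    `before` is the 202 payload, whose crawl_count still describes the previous crawl.
    Assumption: crawl_count increments when the recrawl completes successfully.
    """
    if after.get("status") != "ready":
        return False
    return after.get("crawl_count", 0) > before.get("crawl_count", 0)
```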

Response

202: Recrawl initiated successfully

document_id
string

Unique identifier for the document

Example:

"d8-5c71baca86fc"

document_name
string

Name of the document

Example:

"Company Website"

document_url
string

URL of the document

Example:

"https://example.com/"

status
enum<string>

Current processing status. Immediately after a successful recrawl request this is typically recrawling; it changes to ready or error when processing completes. The other possible values are started and processing.

Available options: started, processing, ready, error, recrawling
Example:

"recrawling"
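The status values can be modeled as an enum. In this illustrative sketch (the class name is not from the API), `is_terminal` marks ready and error, the two states a processing run ends in and the only ones from which a recrawl may be requested.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    STARTED = "started"
    PROCESSING = "processing"
    READY = "ready"
    ERROR = "error"
    RECRAWLING = "recrawling"

    @property
    def is_terminal(self) -> bool:
        """True for ready and error: processing has finished and a recrawl is allowed."""
        return self in (DocumentStatus.READY, DocumentStatus.ERROR)
```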

progress
integer | null

Processing progress as a percentage (0-100). Null when processing has not started or is complete.

Example:

null

error_message
string | null

Error code indicating why processing failed. Set only when status is error. Possible values include: file_download_failed, file_format_unsupported, file_size_too_large, file_empty, invalid_file_url, document_processing_failed, website_processing_failed, chunking_failed, embedding_failed, vector_store_failed, contact_support.

created_at
string

ISO 8601 timestamp of when the document was created

Example:

"2024-01-01T12:00:00Z"

updated_at
string

ISO 8601 timestamp of when the document was last updated

Example:

"2024-01-15T10:30:00Z"

callback_url
string

Webhook URL set via Create Document. If present, Tavus POSTs status updates to this URL as the recrawl runs, through completion.

Example:

"https://your-server.com/webhook"
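A webhook receiver needs to parse each POST body. The webhook payload format is not documented on this page, so the sketch below assumes (hypothetically) a JSON object shaped like the document object, with at least document_id and status; verify this against a real delivery before relying on it.

```python
import json
from typing import Any, Dict, Optional


def parse_document_webhook(body: bytes) -> Optional[Dict[str, Any]]:
    """Extract status fields from a webhook POST body, or return None if it doesn't match.

    Assumption (hypothetical payload shape): a JSON object with document_id and status.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return None
    if not isinstance(payload, dict) or "document_id" not in payload or "status" not in payload:
        return None
    return {
        "document_id": payload["document_id"],
        "status": payload["status"],
        "progress": payload.get("progress"),
        "done": payload["status"] in ("ready", "error"),
    }
```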

tags
string[]

Array of document tags

Example:
["website", "company"]
crawl_config
object

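Crawl settings (depth and max_pages) used for this recrawl: the crawl override from the request body if one was sent, otherwise the configuration stored at document creation.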

crawled_urls
string[] | null

URLs found by the previous crawl. This list is updated when the recrawl completes.

Example:
[
"https://docs.example.com/",
"https://docs.example.com/getting-started"
]
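Since crawled_urls is updated when the recrawl completes, comparing the lists from before and after shows which pages were added or dropped. A minimal sketch; `diff_crawled_urls` is a hypothetical helper, and either input may be null.

```python
from typing import Dict, List, Optional


def diff_crawled_urls(before: Optional[List[str]], after: Optional[List[str]]) -> Dict[str, List[str]]:
    """Compare crawled_urls from before and after a recrawl; None is treated as an empty list."""
    old, new = set(before or []), set(after or [])
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "unchanged": sorted(old & new),
    }
```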
last_crawled_at
string | null

ISO 8601 timestamp of the previous crawl. Updated when the recrawl completes.

Example:

"2024-01-01T12:05:00Z"

crawl_count
integer

Number of times the document has been crawled. Increments when the recrawl completes.

Example:

1
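For typed access to the 202 payload, the fields above can be mapped onto a dataclass. This is an illustrative sketch: treating only document_id and status as required is an assumption, and timestamps are parsed into timezone-aware `datetime` objects.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


def _parse_ts(value: Optional[str]) -> Optional[datetime]:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None


@dataclass
class RecrawlResponse:
    # Assumption: only these two fields are guaranteed to be present.
    document_id: str
    status: str
    document_name: Optional[str] = None
    document_url: Optional[str] = None
    progress: Optional[int] = None
    error_message: Optional[str] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
    callback_url: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    crawl_config: Dict[str, Any] = field(default_factory=dict)
    crawled_urls: Optional[List[str]] = None
    last_crawled_at: Optional[datetime] = None
    crawl_count: int = 0

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "RecrawlResponse":
        """Build from the parsed 202 JSON payload."""
        return cls(
            document_id=data["document_id"],
            status=data["status"],
            document_name=data.get("document_name"),
            document_url=data.get("document_url"),
            progress=data.get("progress"),
            error_message=data.get("error_message"),
            created_at=_parse_ts(data.get("created_at")),
            updated_at=_parse_ts(data.get("updated_at")),
            callback_url=data.get("callback_url"),
            tags=list(data.get("tags") or []),
            crawl_config=dict(data.get("crawl_config") or {}),
            crawled_urls=data.get("crawled_urls"),
            last_crawled_at=_parse_ts(data.get("last_crawled_at")),
            crawl_count=data.get("crawl_count", 0),
        )
```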