# Recrawl Document

> Trigger a recrawl of a website document to fetch fresh content.

<Info>
  For AI agents, use `https://docs.tavus.io/openapi.yaml` for the full HTTP API contract.
</Info>


## OpenAPI

````yaml post /v2/documents/{document_id}/recrawl
openapi: 3.0.3
info:
  title: Tavus Developer API Collection
  version: 1.0.0
  contact: {}
servers:
  - url: https://tavusapi.com
security:
  - apiKey: []
tags:
  - name: Videos
  - name: Replicas
  - name: Voices
  - name: Conversations
  - name: Personas
  - name: Pronunciation Dictionaries
  - name: Replacements
  - name: Transcriptions
  - name: Documents
paths:
  /v2/documents/{document_id}/recrawl:
    post:
      tags:
        - Documents
      summary: Recrawl Document
      description: Trigger a recrawl of a website document to fetch fresh content.
      operationId: recrawlDocument
      parameters:
        - in: path
          name: document_id
          required: true
          schema:
            type: string
          description: >
            Unique id of the crawl-backed **website** document to refresh. Use
            when the source site changed, you want to refresh content on a
            schedule, or retry after crawl or processing errors.


            The document's status must be **`ready`** or **`error`**; any other
            status returns `409`. The document must have been created with a
            `crawl` configuration unless you supply `crawl` in the request body
            for this call.


            A document can be recrawled at most once per **1-hour** cooldown
            window; a request made before the cooldown ends returns `429`.
          example: d8-5c71baca86fc
      requestBody:
        required: false
        description: >
          Optional body. Omit entirely to reuse the crawl `depth` / `max_pages`
          stored from document creation, or include `crawl` to override those
          values for this run only.


          After a **`202`** response, the document status is typically
          **`recrawling`** until processing finishes. If you set `callback_url`
          when [creating the
          document](/api-reference/documents/create-document), webhooks report
          progress until the document returns to **`ready`** or **`error`**.


          Poll [Get Document](/api-reference/documents/get-document) for current
          status, `crawl_count`, and `last_crawled_at`.


          **Account limits:** at most **5** concurrent crawls per user and at
          most **100** crawl-backed documents per user.
        content:
          application/json:
            schema:
              type: object
              properties:
                crawl:
                  type: object
                  description: >
                    Optional `depth` and `max_pages` for **this recrawl only**;
                    overrides stored crawl settings from document creation when
                    provided. If omitted, the original crawl configuration is
                    used.


                    **What runs:** the crawl starts from the same URL as the
                    original crawl and follows links within these limits. Fresh
                    page content is processed, existing vectors are replaced
                    when processing completes, and `crawl_count` /
                    `last_crawled_at` are updated (see the `202` payload and Get
                    Document while status is `recrawling`).
                  properties:
                    depth:
                      type: integer
                      description: >-
                        How many levels deep to follow links from the starting
                        URL (1-10). A depth of 1 means only pages directly
                        linked from the starting URL.
                      minimum: 1
                      maximum: 10
                      example: 2
                    max_pages:
                      type: integer
                      description: >-
                        Maximum number of pages to crawl (1-100). Processing
                        stops once this limit is reached.
                      minimum: 1
                      maximum: 100
                      example: 10
            examples:
              override_crawl:
                summary: Override crawl settings for this recrawl
                value:
                  crawl:
                    depth: 3
                    max_pages: 50
      responses:
        '202':
          description: Recrawl initiated successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  document_id:
                    type: string
                    description: Unique identifier for the document
                    example: d8-5c71baca86fc
                  document_name:
                    type: string
                    description: Name of the document
                    example: Company Website
                  document_url:
                    type: string
                    description: URL of the document
                    example: https://example.com/
                  status:
                    type: string
                    description: >-
                      Current document status. After a successful recrawl
                      request this is typically **`recrawling`** until
                      processing completes, then **`ready`** or **`error`**.
                      Other possible values: `started`, `processing`.
                    enum:
                      - started
                      - processing
                      - ready
                      - error
                      - recrawling
                    example: recrawling
                  progress:
                    type: integer
                    nullable: true
                    description: >-
                      Processing progress as a percentage (0-100). Null when
                      processing has not started or is complete.
                    example: null
                  error_message:
                    type: string
                    nullable: true
                    description: >-
                      Error code indicating why processing failed. Only present
                      when status is `error`. Possible values include:
                      `file_download_failed`, `file_format_unsupported`,
                      `file_size_too_large`, `file_empty`, `invalid_file_url`,
                      `document_processing_failed`, `website_processing_failed`,
                      `chunking_failed`, `embedding_failed`,
                      `vector_store_failed`, `contact_support`.
                  created_at:
                    type: string
                    description: ISO 8601 timestamp of when the document was created
                    example: '2024-01-01T12:00:00Z'
                  updated_at:
                    type: string
                    description: ISO 8601 timestamp of when the document was last updated
                    example: '2024-01-15T10:30:00Z'
                  callback_url:
                    type: string
                    description: >-
                      If set on [Create
                      Document](/api-reference/documents/create-document), Tavus
                      POSTs status updates here while this recrawl runs through
                      completion.
                    example: https://your-server.com/webhook
                  tags:
                    type: array
                    description: Array of document tags
                    items:
                      type: string
                    example:
                      - website
                      - company
                  crawl_config:
                    type: object
                    description: The crawl configuration being used for the recrawl
                    properties:
                      depth:
                        type: integer
                        example: 2
                      max_pages:
                        type: integer
                        example: 10
                  crawled_urls:
                    type: array
                    nullable: true
                    description: >-
                      List of URLs from the previous crawl (will be updated when
                      recrawl completes)
                    items:
                      type: string
                    example:
                      - https://docs.example.com/
                      - https://docs.example.com/getting-started
                  last_crawled_at:
                    type: string
                    nullable: true
                    description: ISO 8601 timestamp of the previous crawl
                    example: '2024-01-01T12:05:00Z'
                  crawl_count:
                    type: integer
                    description: >-
                      Number of times the document has been crawled (will
                      increment when recrawl completes)
                    example: 1
        '400':
          description: Bad Request - Validation error
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: The error message
                    example: Document was not created with crawl configuration
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                type: object
                properties:
                  message:
                    type: string
                    description: The error message
                    example: Invalid access token
        '404':
          description: Not Found
          content:
            application/json:
              schema:
                type: object
                properties:
                  message:
                    type: string
                    description: The error message
                    example: Document not found
        '409':
          description: Conflict - Document state prevents recrawl
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: The error message
                    example: >-
                      Document must be in 'ready' or 'error' state to recrawl,
                      current status: processing
        '429':
          description: Too Many Requests - Rate limit exceeded or recrawl cooldown still active
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: The error message
                    example: >-
                      Recrawl cooldown: please wait 45 minutes before recrawling
                      this document.
      security:
        - apiKey: []
components:
  securitySchemes:
    apiKey:
      type: apiKey
      in: header
      name: x-api-key

````
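
The contract above can be exercised with a small client. The following is a minimal sketch using only the Python standard library, not an official SDK: the helper names `build_recrawl_request`, `describe_status`, and `wait_until_settled` are hypothetical, and `fetch_status` is an assumed callable that you supply (for example, one that calls Get Document and returns its `status` field). The URL, the `x-api-key` header, the `crawl` limits, and the status codes are taken from the spec.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://tavusapi.com"  # from the `servers` entry in the spec


def build_recrawl_request(document_id, api_key, depth=None, max_pages=None):
    """Build (but do not send) the POST request for a recrawl."""
    crawl = {}
    if depth is not None:
        if not 1 <= depth <= 10:
            raise ValueError("depth must be between 1 and 10")
        crawl["depth"] = depth
    if max_pages is not None:
        if not 1 <= max_pages <= 100:
            raise ValueError("max_pages must be between 1 and 100")
        crawl["max_pages"] = max_pages

    headers = {"x-api-key": api_key}
    body = None
    if crawl:
        # Send a body only when overriding the stored crawl settings.
        body = json.dumps({"crawl": crawl}).encode("utf-8")
        headers["Content-Type"] = "application/json"

    doc = urllib.parse.quote(document_id, safe="")
    url = f"{BASE_URL}/v2/documents/{doc}/recrawl"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def describe_status(http_status):
    """Map the documented response codes to a short explanation."""
    meanings = {
        202: "recrawl accepted; document status is typically 'recrawling'",
        400: "validation error, e.g. no crawl configuration available",
        401: "unauthorized; check the x-api-key header",
        404: "document not found",
        409: "document is not in 'ready' or 'error' state",
        429: "rate limit or 1-hour recrawl cooldown; retry later",
    }
    return meanings.get(http_status, "unexpected status")


def wait_until_settled(fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll `fetch_status()` until it returns 'ready' or 'error'."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in ("ready", "error"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document still {status!r} after {timeout}s")
        sleep(interval)
```

To send the request, pass the result to `urllib.request.urlopen`. That call raises `urllib.error.HTTPError` for 4xx responses, so the `409` and `429` cases can be handled by catching the exception and passing its `code` attribute to `describe_status`. Omitting both `depth` and `max_pages` produces a request with no body, which reuses the crawl settings stored at document creation.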