This guide walks you through Firecrawl's endpoints and how to use them with their full set of parameters.

Basic scraping with Firecrawl

To scrape a single page and get clean markdown content, you can use the /scrape endpoint.
# pip install firecrawl-py

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

doc = firecrawl.scrape("https://firecrawl.dev")

print(doc.markdown)

Scraping PDFs

Firecrawl supports PDFs. Use the parsers option (e.g., parsers: ["pdf"]) when you want to ensure PDF parsing.
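For example, a minimal Python sketch (assuming the SDK forwards a parsers keyword that mirrors the REST option; the PDF URL is illustrative):
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Explicitly enable PDF parsing for a PDF URL.
doc = firecrawl.scrape("https://example.com/whitepaper.pdf", parsers=["pdf"])

print(doc.markdown)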

Scrape options

When using the /scrape endpoint, you can customize scraping with the options below.

Formats (formats)

  • Type: array
  • Strings: ["markdown", "links", "html", "rawHtml", "summary"]
  • Object formats:
    • JSON: { type: "json", prompt, schema }
    • Screenshot: { type: "screenshot", fullPage?, quality?, viewport? }
    • Change tracking: { type: "changeTracking", modes?, prompt?, schema?, tag? } (requires markdown)
  • Default: ["markdown"]
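String and object formats can be mixed in a single request; a sketch with the Python SDK (the fields mirror the REST options above):
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Request markdown and links alongside a full-page screenshot.
doc = firecrawl.scrape(
    "https://docs.firecrawl.dev",
    formats=[
        "markdown",
        "links",
        {"type": "screenshot", "fullPage": True, "quality": 80},
    ],
)

print(doc.markdown)
print(doc.links)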

Full page content vs main content (onlyMainContent)

  • Type: boolean
  • Description: By default the scraper returns only the main content. Set to false to return full page content.
  • Default: true

Include tags (includeTags)

  • Type: array
  • Description: HTML tags/classes/ids to include in the scrape.

Exclude tags (excludeTags)

  • Type: array
  • Description: HTML tags/classes/ids to exclude from the scrape.
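Combined with onlyMainContent, includeTags and excludeTags let you narrow the scrape to the markup you care about; a sketch with the Python SDK (assuming snake_case keyword arguments, with illustrative selectors):
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Keep headings, paragraphs, and the main container; drop ads and the footer.
doc = firecrawl.scrape(
    "https://docs.firecrawl.dev",
    only_main_content=False,
    include_tags=["h1", "p", ".main-content"],
    exclude_tags=["#ad", "#footer"],
)

print(doc.markdown)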

Wait for page readiness (waitFor)

  • Type: integer
  • Description: Milliseconds to wait before scraping (use sparingly).
  • Default: 0

Freshness and cache (maxAge)

  • Type: integer (milliseconds)
  • Description: If a cached version of the page is newer than maxAge, Firecrawl returns it instantly; otherwise it scrapes fresh and updates the cache. Set 0 to always fetch fresh.
  • Default: 172800000 (2 days)
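For example, to bypass the cache and always fetch a fresh copy (a sketch, assuming the SDK exposes max_age as a keyword argument):
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# max_age=0 forces a fresh scrape; larger values allow a recent cached copy to be returned instantly.
doc = firecrawl.scrape("https://docs.firecrawl.dev", max_age=0)

print(doc.markdown)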

Request timeout (timeout)

  • Type: integer
  • Description: Max duration in milliseconds before aborting.
  • Default: 30000 (30 seconds)

PDF parsing (parsers)

  • Type: array
  • Description: Control parsing behavior. To parse PDFs, set parsers: ["pdf"].

Actions (actions)

When using the /scrape endpoint, Firecrawl allows you to perform various actions on a web page before scraping its content. This is particularly useful for interacting with dynamic content, navigating through pages, or accessing content that requires user interaction.
  • Type: array
  • Description: Sequence of browser steps to run before scraping.
  • Supported actions:
    • wait { milliseconds }
    • click { selector }
    • write { selector, text }
    • press { key }
    • scroll { direction: "up" | "down" }
    • scrape { selector } (scrape a sub-element)
    • executeJavascript { script }
    • pdf (trigger PDF render in some flows)
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Run a sequence of browser actions before the page content is scraped.
doc = firecrawl.scrape(
    "https://example.com",
    actions=[
        {"type": "wait", "milliseconds": 1000},
        {"type": "click", "selector": "#accept"},
        {"type": "scroll", "direction": "down"},
        {"type": "write", "selector": "#q", "text": "firecrawl"},
        {"type": "press", "key": "Enter"},
    ],
    formats=["markdown"],
)

print(doc.markdown)

Example Usage

cURL
curl -X POST https://api.firecrawl.dev/v2/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "formats": [
        "markdown",
        "links",
        "html",
        "rawHtml",
        { "type": "screenshot", "fullPage": true, "quality": 80 }
      ],
      "includeTags": ["h1", "p", "a", ".main-content"],
      "excludeTags": ["#ad", "#footer"],
      "onlyMainContent": false,
      "waitFor": 1000,
      "timeout": 15000,
      "parsers": ["pdf"]
    }'
In this example, the scraper will:
  • Return the full page content as markdown.
  • Include the markdown, raw HTML, HTML, links, and a screenshot in the response.
  • Include only the HTML tags <h1>, <p>, <a>, and elements with the class .main-content, while excluding any elements with the IDs #ad and #footer.
  • Wait for 1000 milliseconds (1 second) before scraping to allow the page to load.
  • Set the maximum duration of the scrape request to 15000 milliseconds (15 seconds).
  • Parse PDFs explicitly via parsers: ["pdf"].
Here is the API Reference: Scrape Endpoint Documentation

JSON extraction via formats

Use the JSON format object in formats to extract structured data in one pass:
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://firecrawl.dev",
    "formats": [{
      "type": "json",
      "prompt": "Extract the features of the product",
      "schema": {"type": "object", "properties": {"features": {"type": "object"}}, "required": ["features"]}
    }]
  }'
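The equivalent call through the Python SDK passes the same format object in formats (a sketch; the prompt and schema are illustrative):
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

doc = firecrawl.scrape(
    "https://firecrawl.dev",
    formats=[{
        "type": "json",
        "prompt": "Extract the features of the product",
        "schema": {
            "type": "object",
            "properties": {"features": {"type": "object"}},
            "required": ["features"],
        },
    }],
)

print(doc.json)  # structured output; attribute name assumed to mirror the "json" format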

Extract endpoint

Use the dedicated extract job API when you want asynchronous extraction with status polling.
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

// Start extract job
const started = await firecrawl.startExtract({
  urls: ['https://docs.firecrawl.dev'],
  prompt: 'Extract title',
  schema: { type: 'object', properties: { title: { type: 'string' } }, required: ['title'] }
});

// Poll status
const status = await firecrawl.getExtractStatus(started.id);
console.log(status.status, status.data);

Crawling multiple pages

To crawl multiple pages, use the /v2/crawl endpoint.
cURL
curl -X POST https://api.firecrawl.dev/v2/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev"
    }'
Returns an id
{ "id": "1234-5678-9101" }

Check Crawl Job

Used to check the status of a crawl job and get its result.
cURL
curl -X GET https://api.firecrawl.dev/v2/crawl/1234-5678-9101 \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY'
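The same start-and-poll flow is available from the Python SDK; a sketch, assuming the v2 SDK exposes start_crawl and get_crawl_status helpers:
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Start the crawl job; the returned object carries the job id.
job = firecrawl.start_crawl("https://docs.firecrawl.dev", limit=100)

# Poll the job and inspect progress and results.
status = firecrawl.get_crawl_status(job.id)
print(status.status)      # e.g. "scraping" or "completed"
print(len(status.data))   # documents scraped so far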

Pagination/Next URL

If the content is larger than 10MB or if the crawl job is still running, the response may include a next parameter, a URL to the next page of results.
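A simple way to follow next pages against the REST API directly (a sketch using the requests library; the job id is illustrative):
import requests

headers = {"Authorization": "Bearer fc-YOUR-API-KEY"}
url = "https://api.firecrawl.dev/v2/crawl/1234-5678-9101"

documents = []
while url:
    page = requests.get(url, headers=headers).json()
    documents.extend(page.get("data", []))
    url = page.get("next")  # present when more results remain; absent on the last page

print(len(documents), "documents collected")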

Crawl prompt and params preview

You can provide a natural-language prompt to let Firecrawl derive crawl settings. Preview them first:
cURL
curl -X POST https://api.firecrawl.dev/v2/crawl/params-preview \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR-API-KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "prompt": "Extract docs and blog"
  }'

Crawler options

When using the /v2/crawl endpoint, you can customize the crawling behavior with:

includePaths

  • Type: array
  • Description: Regex patterns to include.
  • Example: ["^/blog/.*$", "^/docs/.*$"]

excludePaths

  • Type: array
  • Description: Regex patterns to exclude.
  • Example: ["^/admin/.*$", "^/private/.*$"]

maxDiscoveryDepth

  • Type: integer
  • Description: Max discovery depth for finding new URLs.

limit

  • Type: integer
  • Description: Max number of pages to crawl.
  • Default: 10000

crawlEntireDomain

  • Type: boolean
  • Description: Explore across siblings/parents to cover the entire domain.
  • Default: false

allowExternalLinks

  • Type: boolean
  • Description: Follow links to external domains.
  • Default: false

allowSubdomains

  • Type: boolean
  • Description: Follow subdomains of the main domain.
  • Default: false

delay

  • Type: number
  • Description: Delay in seconds between scrapes.
  • Default: undefined

scrapeOptions

  • Type: object
  • Description: Options for the scraper (see Formats above).
  • Example: { "formats": ["markdown", "links", {"type": "screenshot", "fullPage": true}], "includeTags": ["h1", "p", "a", ".main-content"], "excludeTags": ["#ad", "#footer"], "onlyMainContent": false, "waitFor": 1000, "timeout": 15000}
  • Defaults: formats: ["markdown"], caching enabled by default (maxAge ~ 2 days)

Example Usage

cURL
curl -X POST https://api.firecrawl.dev/v2/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "includePaths": ["^/blog/.*$", "^/docs/.*$"],
      "excludePaths": ["^/admin/.*$", "^/private/.*$"],
      "maxDiscoveryDepth": 2,
      "limit": 1000
    }'

Map endpoint

The /v2/map endpoint identifies URLs related to a given website.

Usage

cURL
curl -X POST https://api.firecrawl.dev/v2/map \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR-API-KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev"
    }'

Map Options

search

  • Type: string
  • Description: Filter the returned links to those containing the given text.

limit

  • Type: integer
  • Description: Maximum number of links to return.
  • Default: 100

sitemap

  • Type: "only" | "include" | "skip"
  • Description: Control sitemap usage during mapping.
  • Default: "include"

includeSubdomains

  • Type: boolean
  • Description: Include subdomains of the website.
  • Default: true
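Putting the options together, a sketch with the Python SDK (assuming search, limit, sitemap, and related keyword arguments that mirror the REST parameters):
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Map the docs site, keeping only links that mention "blog", up to 50 results,
# and consulting the sitemap alongside discovered links.
result = firecrawl.map(
    "https://docs.firecrawl.dev",
    search="blog",
    limit=50,
    sitemap="include",
)

for link in result.links:  # the links field is assumed to list the discovered URLs
    print(link)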
Here is the API Reference for it: Map Endpoint Documentation

Thanks for reading!