API Reference
Scrape any website. Even the ones that fight back.
Sessemi is a scraping API that handles anti-bot protection automatically. Send a URL, get back clean HTML. Challenges from Cloudflare and DataDome are detected and solved transparently, with more vendors shipping soon. Your response comes back with success: true and clean content ready to parse.
Authentication
All requests require an API key passed via the X-API-Key header.
```bash
curl -X POST https://api.sessemi.com/scrape \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```
You can also pass the key as a query parameter: ?key=your_api_key
Quick Start
Scrape a simple site (datacenter proxy, 1 credit):
```bash
curl -X POST https://api.sessemi.com/scrape \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }'
```
Scrape a protected site with challenge solving (residential proxy, 10 credits):
```bash
curl -X POST https://api.sessemi.com/scrape \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "pool": "residential"
  }'
```
Response:
```json
{
  "success": true,
  "url": "https://example.com",
  "resolved_url": "https://example.com/",
  "html": "<!doctype html>...",
  "html_size": 48210,
  "status_code": 200,
  "challenge_type": "solved",
  "challenge_provider": "cloudflare",
  "pool": "residential",
  "solved": true,
  "credits_charged": 10,
  "credits_remaining": 2490,
  "duration_ms": 8420,
  "cookies": [...]
}
```
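The curl calls above translate directly to any HTTP client. A minimal Python sketch of assembling the same request — the helper name `build_scrape_request` is our own, not part of the API:

```python
import json

API_URL = "https://api.sessemi.com/scrape"

def build_scrape_request(url, pool="datacenter", **extra):
    """Assemble the headers and JSON body for a POST /scrape call."""
    headers = {"X-API-Key": "your_api_key", "Content-Type": "application/json"}
    payload = {"url": url, "pool": pool, **extra}
    return headers, json.dumps(payload)

headers, body = build_scrape_request("https://example.com", pool="residential")
# Send with any HTTP client, e.g.:
# resp = requests.post(API_URL, headers=headers, data=body).json()
# if resp["success"]: parse resp["html"]
```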
POST /scrape
Scrape a URL. Challenges are detected and solved automatically. Fingerprint management is always on.
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| url required | string | Target URL to scrape. |
| pool | string | Proxy pool: datacenter (default, 1 credit) or residential (5 credits, or 10 with solve). Datacenter is fast and cheap — use for sites without anti-bot protection. Residential uses premium IPs with higher trust scores — use for protected sites. |
| solve | boolean | Enable challenge solving (Cloudflare, DataDome). Adds +5 credits to the pool base rate. Enabled by default with pool: "residential". When a challenge is detected, Sessemi solves it automatically and returns the page content. Set solve: false explicitly to disable — challenges will be returned as challenge_unsolved. |
| country | string | Proxy country code (e.g. FR, DE, US). 240+ countries supported. Routes through a residential proxy in that country with proprietary fingerprint management. Only with pool: "residential" or stealth: true. |
| session | string | Session ID (Pro only). Pins requests to a persistent environment — cookies, localStorage, and JS state carry across requests. Use for authenticated scraping (login → scrape) or multi-step JS interactions. Expires after 5 minutes idle or 10 minutes total. Any string works — created automatically on first use. Not needed for anti-bot bypass or pagination — Sessemi handles anti-bot bypass automatically. |
| screenshot | boolean | Include a base64-encoded PNG screenshot in the response. +1 credit |
| wait_for | string | CSS selector to wait for before returning HTML. Comma-separated for OR logic: .products, .no-results |
| wait_for_js | string | JS expression that must return truthy before returning HTML. Example: window.__DATA__ !== undefined |
| wait_timeout | integer | Max seconds to wait for wait_for / wait_for_js. Default: 10. |
| script | string | JavaScript to execute after page load. Result returned in script_result. See JS Execution. |
| warmup | boolean | Prime the session by loading the site's homepage before the target URL. Reduces the chance of blocks on deep links. |
| render | boolean | Enable JavaScript rendering. Use for JS-rendered SPAs (React, Vue, Angular) where you need the full DOM. When omitted, Sessemi automatically chooses the fastest method for each request. |
| timeout | integer | Request timeout in seconds. Default: 90. Challenge solving on protected sites can take 15–30 seconds — setting this too low may cause solve attempts to fail. For JS-heavy or protected sites, keep the default or increase it. |
| retry | integer | Number of automatic retries on failure. Max: 5. See Retries. |
| retry_on | string[] | Failure types that trigger retry: server_error, challenge_timeout, navigate_failed, empty_page |
| stealth | boolean | The recommended flag for protected sites. Automatically selects residential proxy, challenge solving, retries, and JS rendering. One toggle, maximum success rate. 10 credits/request. Explicit user values for pool, retry, etc. are not overridden — stealth only fills in defaults. |
| block_resources | boolean | Block images, fonts, media, and tracker scripts during JS rendering. Dramatically reduces page load time (5–10×) on heavy sites. DOM stays intact — <img src> attributes are preserved. If screenshot: true, images stay loaded. Fonts, media, and trackers are always blocked. |
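The `stealth` row above says explicit user values are never overridden — stealth only fills in defaults. A sketch of that merge logic as we understand it (the default set, and in particular the internal retry count, is an assumption inferred from the table, not a documented contract):

```python
# Assumed stealth defaults, inferred from the parameter table above.
# The exact retry count Sessemi applies internally is not documented.
STEALTH_DEFAULTS = {"pool": "residential", "solve": True, "render": True, "retry": 2}

def apply_stealth_defaults(params):
    """Fill in stealth defaults without overriding explicit user values."""
    if not params.get("stealth"):
        return dict(params)
    merged = dict(STEALTH_DEFAULTS)
    merged.update(params)  # explicit user keys win over stealth defaults
    return merged
```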
Response Fields
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the scrape succeeded. |
| url | string | The requested URL. |
| resolved_url | string | Final URL after redirects. |
| html | string | Full rendered HTML of the page. |
| html_size | integer | Size of HTML in bytes. |
| status_code | integer | HTTP status code of the page. |
| challenge_type | string | Challenge result: clear, solved, timeout, needs_human |
| challenge_provider | string | Detected provider: cloudflare, akamai, datadome, none |
| duration_ms | integer | Total request duration in milliseconds. |
| cookies | array | Cookies set by the page (name, value, domain, path). |
| screenshot | string | Base64-encoded PNG. Only present if requested. |
| script_result | any | Return value of custom JS. Only present if script was provided. |
| failure_type | string | On failure: server_error, challenge_timeout, challenge_unsolved, navigate_failed, blocked, burned |
| wait_for_match | string | Which wait condition matched: css, js, timeout |
| pool | string | Proxy pool used: datacenter or residential |
| error | string | Error message on failure. |
| warning | string | Non-fatal advisory. Present when the request succeeded but something may need attention (e.g. datacenter solve). |
| solved | boolean | Whether an anti-bot challenge was detected and successfully solved. Only true when challenge_type is "solved". |
| credits_charged | integer | Credits consumed by this request. |
| credits_remaining | integer | Credits left in your billing cycle. |
| json | string | Response body when Content-Type is application/json. Present instead of html for JSON API endpoints. |
| response_headers | object | HTTP response headers from the target. Available on direct HTTP requests; omitted when JS rendering is used. |
| stealth | boolean | true when stealth mode was active for this request. |
| queued_ms | integer | Time spent waiting in the queue, in milliseconds. 0 when resources were immediately available. |
| retry_count | integer | Number of retries that were performed. 0 if the first attempt succeeded. |
| user_agent | string | The User-Agent string used for this request. |
| challenge_details | string | Additional challenge info, e.g. type=slider, type=managed. |
Async / Batch
For long-running scrapes or batch jobs, use async mode. Submit requests with ?async=true — the API returns immediately with a task ID. Poll GET /tasks/{id} for results.
This eliminates HTTP timeout issues and enables batch scraping: submit multiple URLs and collect results as they finish. All parameters (stealth, country, retry, etc.) work identically.
Submit async request
```bash
curl -X POST "https://api.sessemi.com/scrape?async=true" \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "stealth": true
  }'
```
Response (HTTP 202):
```json
{
  "task_id": "049ebae94d32d35a3956cfc9167fce5b",
  "status": "queued",
  "poll": "/tasks/049ebae94d32d35a3956cfc9167fce5b"
}
```
Poll for results
```bash
curl https://api.sessemi.com/tasks/049ebae94d32d35a3956cfc9167fce5b \
  -H "X-API-Key: your_api_key"
```
Response when complete:
```json
{
  "task_id": "049ebae94d32d35a3956cfc9167fce5b",
  "status": "done",
  "created_at": "2026-03-24T20:45:00Z",
  "started_at": "2026-03-24T20:45:00Z",
  "completed_at": "2026-03-24T20:45:02Z",
  "http_status": 200,
  "result": {
    "success": true,
    "url": "https://example.com",
    "html": "<!doctype html>...",
    "html_size": 252360,
    ...
  }
}
```
List recent tasks
```bash
curl https://api.sessemi.com/tasks \
  -H "X-API-Key: your_api_key"
```
Task lifecycle
| Status | Description |
|---|---|
| queued | Task accepted, waiting for capacity. |
| running | Scrape in progress. |
| done | Scrape completed successfully. Result in result field. |
| failed | Scrape failed (blocked, timeout, etc.). Error details in result field. |
Notes
- Tasks are stored server-side — results survive page reloads and client disconnects. Poll from any client.
- Each task has a 5-minute execution budget. If the scrape (including retries) doesn't complete within 5 minutes, the task is marked failed.
- Completed tasks are automatically deleted after 1 hour. Poll promptly or store results on your end.
- ?async=true is not compatible with session (sessions require a persistent connection).
- Billing works identically — credits are charged when the scrape completes.
- For batch scraping, submit each URL with ?async=true, then poll GET /tasks periodically to collect results. Each URL runs as an independent task with its own 5-minute budget.
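The submit-then-poll flow above can be sketched in Python. The HTTP call is injected as a callable (`fetch(method, path, body)` returning parsed JSON) so the control flow is clear; `run_batch` and `fetch` are our own names, not part of the API:

```python
import time

def run_batch(fetch, urls, interval=2.0, budget=300):
    """Submit each URL as an async task, then poll /tasks/{id} until all finish."""
    tasks = {}
    for url in urls:
        resp = fetch("POST", "/scrape?async=true", {"url": url, "stealth": True})
        tasks[resp["task_id"]] = None  # None = still pending
    deadline = time.time() + budget
    while any(v is None for v in tasks.values()) and time.time() < deadline:
        for task_id, done in list(tasks.items()):
            if done is None:
                status = fetch("GET", f"/tasks/{task_id}", None)
                if status["status"] in ("done", "failed"):
                    tasks[task_id] = status
        if any(v is None for v in tasks.values()):
            time.sleep(interval)  # mirror the server's polling model
    return tasks
```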
POST /screenshot
Navigate to a URL and return a full-page PNG screenshot. Returns raw image/png bytes, not JSON.
| Parameter | Type | Description |
|---|---|---|
| url required | string | URL to screenshot. |
| timeout | integer | Timeout in seconds. Default: 30. |
```bash
curl -X POST https://api.sessemi.com/screenshot \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  -o screenshot.png
```
Sessions
Sessions let you make multiple requests that share the same environment — cookies, localStorage, and JS state carry across requests. Pass any string as the session parameter and it's created automatically on first use. Sessions expire after 5 minutes of inactivity, or 10 minutes total — whichever comes first.
```json
{
  "url": "https://example.com/page-1",
  "session": "my-crawl-123"
}
```
All subsequent requests with "session": "my-crawl-123" share the same cookies, IP, and fingerprint. No setup or teardown needed.
GET /me
Returns your account info: tier, credits remaining, and limits.
```bash
curl https://api.sessemi.com/me \
  -H "X-API-Key: your_api_key"
```
Challenge Solving
When a site presents an anti-bot challenge (Cloudflare Turnstile, DataDome slider), Sessemi detects and solves it automatically. Your response comes back with success: true and the page content — the challenge is handled transparently.
Solving is optimized for batches. When you scrape multiple pages from the same domain, the first challenge may take longer to solve, but subsequent pages are typically much faster.
Enabling solve
Solving is enabled by default on residential proxy requests. You can also enable it on datacenter with solve: true.
```json
{
  "url": "https://protected-site.com",
  "pool": "residential"
}
```
Datacenter requests with solve: true work, but datacenter IPs are more likely to be flagged by anti-bot providers. Use datacenter + solve as a budget option (6 credits vs 10) for lighter protection.
To disable solving on residential (e.g. for sites with no anti-bot protection that just need a residential IP), set solve: false explicitly. Challenges will be returned as challenge_unsolved. Residential without solve costs 5 credits instead of 10.
Supported providers
| Provider | Challenge Types | Method |
|---|---|---|
| Cloudflare | JS Challenge, Managed Challenge, Turnstile | JavaScript execution + proof-of-work computation |
| DataDome | Device Check, Slider CAPTCHA | Device attestation + automated slider interaction |
The response includes challenge_provider and challenge_type so you can see what happened.
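A small sketch of inspecting those two fields in a response — `summarize_challenge` is a hypothetical helper, not part of the API:

```python
def summarize_challenge(resp):
    """Human-readable summary of what the solver did, from a /scrape response."""
    ctype = resp.get("challenge_type", "clear")
    if ctype == "clear":
        return "no challenge encountered"
    provider = resp.get("challenge_provider", "unknown")
    if ctype == "solved":
        return f"{provider} challenge solved"
    return f"{provider} challenge failed: {ctype}"  # timeout / needs_human
```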
Fingerprint Management
Every request uses a realistic device fingerprint matched to real-world configurations. This is always on. No configuration needed.
Proxies
Requests are routed through our proxy pools automatically based on the pool parameter. Datacenter is the default — fast and cheap. Use pool: "residential" when sites block datacenter IPs. Sessemi manages all proxy infrastructure: TLS fingerprinting, session stickiness, geo-targeting, and rotation.
Wait Conditions
Wait for dynamic content to render before returning HTML.
CSS Selector
Wait until a CSS selector is present in the DOM. Use comma-separated selectors for OR logic — the first match wins.
```json
{
  "url": "https://example.com/products",
  "wait_for": ".product-card, .no-results",
  "wait_timeout": 15
}
```
JavaScript Expression
Wait until a JS expression returns a truthy value. Checked every 200ms.
```json
{
  "url": "https://example.com/app",
  "wait_for_js": "window.__DATA__ && window.__DATA__.products.length > 0"
}
```
Scraping Tips
Heavy Pages
Some pages (news portals, ad-heavy homepages) load dozens of scripts and trackers. Through residential proxies with higher latency, this can cause timeouts. Use wait_for to grab the content you need as soon as it appears, without waiting for every ad script to finish:
```json
{
  "url": "https://heavy-site.com",
  "pool": "residential",
  "country": "JP",
  "wait_for": "#main-content",
  "wait_timeout": 15
}
```
Lazy-Loaded Images
Many sites defer image loading until the user scrolls. These images often use data-src instead of src until they enter the viewport. If you need the real image URLs, extract from data-src (or data-lazy-src, data-original) in your parsing logic, or trigger lazy loading with a script:
```json
{
  "url": "https://example.com/products",
  "script": "window.scrollTo(0, document.body.scrollHeight); return true",
  "wait_for": ".product-card img[src*='cdn']",
  "wait_timeout": 10
}
```
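On the parsing side, extracting real image URLs with a data-src fallback can be sketched with Python's stdlib HTML parser (the attribute preference order follows the list above; `extract_image_urls` is our own helper):

```python
from html.parser import HTMLParser

# Prefer lazy-load attributes over src; plain src is the last resort.
LAZY_ATTRS = ("data-src", "data-lazy-src", "data-original", "src")

class ImageExtractor(HTMLParser):
    """Collect image URLs, preferring lazy-load attributes over src."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        for name in LAZY_ATTRS:
            if attrs.get(name):
                self.urls.append(attrs[name])
                break

def extract_image_urls(html):
    parser = ImageExtractor()
    parser.feed(html)
    return parser.urls
```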
JS-Rendered Content (SPAs)
Sites built with React, Vue, or Angular render content with JavaScript after the initial page load. The HTML shell loads quickly but the actual products/listings appear later. Always use wait_for or wait_for_js for these sites:
```json
{
  "url": "https://spa-site.com/catalog",
  "wait_for_js": "document.querySelectorAll('.product-card').length >= 10",
  "wait_timeout": 15
}
```
For server-rendered sites, the initial HTML already contains the data you need — attributes (href, src, etc.) are present without needing wait_for. Use it only when you see empty or placeholder content in your results.
JS Execution
Run custom JavaScript after the page loads. The return value is included in script_result.
```json
{
  "url": "https://example.com/products",
  "script": "return [...document.querySelectorAll('.price')].map(e => e.textContent)"
}
```
Script-only mode: Send script + session without a url to run JS on the current page of an existing session. Useful for pagination, clicking "load more", or extracting data after interactions.
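A pagination sequence using script-only mode might look like the following sketch. The `.next-page` selector and the `paginate_payloads` helper are hypothetical examples, not part of the API:

```python
def paginate_payloads(start_url, session_id, pages):
    """First request loads the page; follow-ups click 'next' in the same session."""
    yield {"url": start_url, "session": session_id}
    for _ in range(pages - 1):
        yield {
            "session": session_id,  # no url: runs against the session's current page
            "script": "document.querySelector('.next-page').click(); "
                      "return document.documentElement.outerHTML",
        }
```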
Retries
Automatic retries with fresh proxies on failure. Set retry to the number of attempts (max 5). Each retry uses a new proxy.
```json
{
  "url": "https://example.com",
  "retry": 2,
  "retry_on": ["challenge_timeout", "server_error"]
}
```
Default retry_on (when retries are set but types aren't specified): server_error, challenge_timeout, navigate_failed, empty_page.
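The retry decision described above reduces to a small predicate — a sketch of the client-visible logic, with `should_retry` as our own name:

```python
DEFAULT_RETRY_ON = ("server_error", "challenge_timeout", "navigate_failed", "empty_page")

def should_retry(failure_type, attempt, retry=0, retry_on=None):
    """True if a failed attempt number `attempt` (0-based) should be retried."""
    if attempt >= retry:          # retry budget (max 5) exhausted
        return False
    return failure_type in (retry_on or DEFAULT_RETRY_ON)
```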
Credit Pricing
Credits are charged based on the proxy pool and features you select.
| Configuration | Free | Starter | Pro |
|---|---|---|---|
| Datacenter | 1 | 1 | 1 |
| Datacenter + solve | 16 | 6 | 3 |
| Residential | 10 | 5 | 3 |
| Residential + solve | 25 | 10 | 5 |
| Screenshot addon | +1 | +1 | +1 |
Use pool: "residential" for premium IPs with challenge solving enabled by default. Add solve: false on residential to skip solving and pay just the residential base rate. Failed requests are free.
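The pricing table translates into a simple lookup. A sketch, with the values transcribed from the table above (`credit_cost` is our own helper; the residential-implies-solve default follows the solve parameter's documented behavior):

```python
# (pool, solve) -> credits, transcribed from the pricing table above.
COSTS = {
    "free":    {("datacenter", False): 1, ("datacenter", True): 16,
                ("residential", False): 10, ("residential", True): 25},
    "starter": {("datacenter", False): 1, ("datacenter", True): 6,
                ("residential", False): 5, ("residential", True): 10},
    "pro":     {("datacenter", False): 1, ("datacenter", True): 3,
                ("residential", False): 3, ("residential", True): 5},
}

def credit_cost(plan, pool="datacenter", solve=None, screenshot=False):
    """Credits charged for one successful request."""
    if solve is None:
        solve = pool == "residential"  # solving is on by default for residential
    return COSTS[plan][(pool, solve)] + (1 if screenshot else 0)
```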
Scrape Failed Protection
We only charge for successful requests. If a scrape fails, you are not billed — regardless of the proxy pool or features used.
| Outcome | Failure Type | Billed? | Why |
|---|---|---|---|
| Success (200, 404) | — | Yes | You got the content you requested. |
| Blocked | blocked | No | Target site rejected the request. You got nothing useful. |
| Challenge timeout | challenge_timeout | No | Anti-bot challenge was not solved in time. |
| Challenge unsolved | challenge_unsolved | No | Anti-bot challenge detected but solving was not enabled. Add solve: true or use pool: "residential". |
| Navigate failed | navigate_failed | No | Page could not be loaded (DNS, timeout, crash). |
| Server error | server_error | No | Target returned HTTP 5xx. Not our fault or yours. |
| Empty page | empty_page | No | Page loaded but returned no usable content. |
| Session burned | burned | No | Request fingerprint was flagged. Automatically retried with a fresh identity. |
Fairness Policy
To prevent abuse, we monitor failure rates per account. If your failure rate exceeds 30% over a rolling 1-hour window (minimum 20 requests), the Scrape Failed Protection is temporarily disabled and all requests are billed — including failures.
This policy exists to prevent intentional scraping of unreachable or blocked targets to consume proxy bandwidth without cost. Normal usage is never affected.
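The threshold logic above can be stated precisely — a sketch of the check as described (`protection_active` is our own name; the rolling-window bookkeeping is simplified to a list of outcomes from the last hour):

```python
def protection_active(outcomes):
    """outcomes: success booleans for your requests in the rolling 1-hour window.
    Protection stays on below the 20-request minimum, or while the failure
    rate is at or below 30%."""
    if len(outcomes) < 20:
        return True
    failures = sum(1 for ok in outcomes if not ok)
    return failures / len(outcomes) <= 0.30
```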
Tip: combining stealth with country dramatically reduces blocks on geo-restricted sites.
Error Codes
On failure, the response includes "success": false with an error message and failure_type for programmatic handling. If you run out of credits, check credits_remaining; when rate limited, back off for the Retry-After header value.
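A sketch of programmatic failure handling around that shape — `check_response` is a hypothetical client-side helper, not part of the API:

```python
def check_response(resp):
    """Return the response on success; raise with failure_type on failure."""
    if resp.get("success"):
        return resp
    failure = resp.get("failure_type", "unknown")
    raise RuntimeError(f"{failure}: {resp.get('error', '')}")
```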
Rate Limits
Requests are processed in order. If all capacity is in use, your request is queued for up to 5 minutes before timing out.
| Plan | Credits / Month | Rate Limit | Sessions |
|---|---|---|---|
| Free | 500 | 2 req/min | No |
| Starter (€20/mo) | 5,000 | 60 req/min | No |
| Pro (€100/mo) | 50,000 | Unlimited | Yes |