How to Run Headless Browsers in the Cloud for Web Scraping
When your scraping needs move beyond a single machine (more concurrency, higher uptime, fewer flaky runs), headless browsers become an infrastructure problem.
Steel.dev solves that with managed cloud browser sessions. Create a session via the Steel API, connect Playwright or Puppeteer over CDP, and run your existing automation against a remote Chrome instance that Steel hosts, scales, and maintains.
Here's how cloud headless browser scraping works, what your options are, and where the real trade-offs lie.
What "headless browser scraping in the cloud" actually means
A headless browser runs Chrome without a visible window. You script it to navigate pages, execute JavaScript, wait for dynamic content, extract data, and close.
Locally, this works fine at low volume. The problems start with concurrency, uptime, and reliability at scale.
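That local lifecycle is only a few lines of Playwright. A minimal sketch (the URL and selector are placeholders; requires `pip install playwright` and `playwright install chromium`):

```python
def scrape_titles(url: str, selector: str) -> list[str]:
    """Launch headless Chrome, wait for JS-rendered content, extract, close."""
    # Imported lazily so the module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)   # no visible window
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")     # let dynamic content settle
        texts = [el.inner_text() for el in page.query_selector_all(selector)]
        browser.close()
        return clean(texts)

def clean(texts: list[str]) -> list[str]:
    """Normalize extracted strings: strip whitespace, drop empties."""
    return [t.strip() for t in texts if t.strip()]
```

At one URL at a time, this is all you need; the rest of this piece is about what happens when you need hundreds of these running at once.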
Cloud headless browser services give you a remote Chrome instance accessible via API. Your code connects to it, drives it exactly like a local browser, and the provider manages the machine, process lifecycle, and session state.
Why not just run Playwright in a Docker container yourself?
You can. Many teams do.
The hidden cost is everything around the browser: Chrome crashes, memory growth over long sessions, cookie persistence, proxy rotation, IP reputation, CAPTCHA interruptions, and debugging failures at 2 a.m. when a production job stalls.
That overhead compounds as volume grows. Cloud browser services exist so you can write scraping logic instead of browser infrastructure.
How Steel.dev works
Steel's model: create a session → connect your framework → run your code → release the session.
The same pattern works across Playwright, Puppeteer, and Selenium. Steel provides the remote browser endpoint; your automation framework talks to it over CDP.
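In code, the lifecycle looks roughly like this. A hedged Python sketch: the `steel` SDK import, its method names, and the `wss://connect.steel.dev` endpoint format are assumptions drawn from Steel's docs; verify against docs.steel.dev before relying on them.

```python
import os

def connect_url(api_key: str, session_id: str) -> str:
    """Build the CDP WebSocket URL for a Steel session.
    NOTE: this endpoint format is an assumption; confirm at docs.steel.dev."""
    return f"wss://connect.steel.dev?apiKey={api_key}&sessionId={session_id}"

def run() -> None:
    # Lazy imports so this module loads without the SDKs installed.
    from steel import Steel                          # pip install steel-sdk (name assumed)
    from playwright.sync_api import sync_playwright  # pip install playwright

    client = Steel(steel_api_key=os.environ["STEEL_API_KEY"])
    session = client.sessions.create()               # 1. create a session
    try:
        with sync_playwright() as p:
            browser = p.chromium.connect_over_cdp(   # 2. connect over CDP
                connect_url(os.environ["STEEL_API_KEY"], session.id)
            )
            page = browser.contexts[0].pages[0]      # remote session's open page
            page.goto("https://example.com")         # 3. run your code
            print(page.title())
    finally:
        client.sessions.release(session.id)          # 4. release the session

if __name__ == "__main__":
    run()
```

The key point is step 2: because the connection is plain CDP, the Playwright code inside the `with` block is identical to what you'd write against a local browser.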
Session state and persistence
Steel sessions preserve cookies, local storage, and auth context between runs, which is critical for scraping workflows that require login.
Authenticate once, save the context, reuse it on future runs without logging in again.
Concurrency
Steel handles session isolation at scale. Run hundreds of concurrent browser sessions without managing Chrome pools.
Each session is isolated. Proxies, cookies, and state never bleed between sessions.
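Fanning work out across sessions then reduces to ordinary parallel code. A sketch using a thread pool, where `worker` is a session-per-URL function you supply (for example, one that creates a Steel session, scrapes, and releases it) and the concurrency cap is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def scrape_all(urls: list[str], worker: Callable[[str], str],
               max_sessions: int = 100) -> list[str]:
    """Fan `urls` out across up to `max_sessions` parallel workers.

    With a cloud browser service, each worker creates its own isolated
    session, so cookies, proxies, and state never bleed between URLs.
    Results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        return list(pool.map(worker, urls))

# Usage sketch (worker is hypothetical):
# results = scrape_all(product_urls, steel_session_worker, max_sessions=100)
```

Threads are fine here because each worker spends nearly all its time waiting on the remote browser; for thousands of URLs you'd likely move to `asyncio` and Playwright's async API instead.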
Comparison: Steel.dev vs alternatives
| | Steel.dev | Browserless.io | Apify | ScrapingBee | Self-hosted Playwright |
|---|---|---|---|---|---|
| Framework support | Playwright, Puppeteer, Selenium (CDP) | Playwright, Puppeteer, Selenium | Playwright, Puppeteer | REST API only | Any |
| Session persistence | Yes (cookies, storage, auth context) | Partial | Partial (dataset storage) | No | You build it |
| Managed proxies | Yes (residential, BYO) | Yes | Yes | Yes | BYO |
| CAPTCHA solving | Yes (built-in) | Add-on | Add-on | Yes | BYO |
| Concurrency | 100+ (cloud plan) | Based on plan | Based on plan | Based on plan | Limited by your hardware |
| Open source | Yes (self-host available) | No | No | No | Yes |
| Anti-bot stealth | Yes (advanced fingerprint config) | Partial | Partial | Yes | You configure |
| Mobile-mode sessions | Yes | Limited | Limited | No | Yes |
| Pricing model | Session-based | Minute-based | Usage-based | Request-based | Infrastructure cost |
When to use Steel.dev
When you need a CDP-compatible remote browser where Playwright or Puppeteer code works unchanged, session persistence matters (authenticated scraping, multi-step flows), and you want concurrency without managing Chrome infra.
When to use Browserless.io
A reasonable alternative if you're already invested in their API surface and don't need session context reuse across runs.
When to use Apify
Best if you want a full pipeline platform with storage, scheduling, and an actor marketplace. More opinionated, harder to use with custom frameworks.
When to use ScrapingBee
The right choice when you don't need a real browser connection, just rendered HTML back from a REST call. Simpler, but no framework control and lower flexibility.
When to self-host
The right call for low-volume scraping where you have operational capacity and want full control. Steel's open-source browser can be self-hosted too, giving you the same API locally before migrating to managed cloud when volume demands it.
Use cases
Cloud headless browser scraping fits when the target demands a real browser:
JavaScript-rendered pages: React, Vue, and Angular apps where a plain HTTP client like Python's requests returns an empty shell
Authenticated scraping: dashboards and paywalled content requiring login state
Dynamic e-commerce: price monitoring, inventory tracking, catalog aggregation
Multi-step form automation: login, navigate, extract, chained together
Large-scale concurrent crawling: hundreds of parallel sessions without managing Chrome pools
Anti-bot site access: residential proxies plus stealth configuration for protected targets
AI agent web browsing: giving LLM agents a persistent cloud browser for multi-step research
Headless Chrome as a service: running Chrome in the cloud without server management
Honest limitations
Detection isn't just infrastructure. Managed proxies improve IP reputation, and stealth config reduces fingerprint surface, but traffic patterns, timing, and behavioral signals still factor in. Steel addresses the infrastructure side; your code's behavior is still your responsibility.
CAPTCHA solving isn't universal. It improves reliability but doesn't guarantee success for every provider or challenge type.
Sessions can expire. Session persistence is powerful for authenticated flows but adds complexity. Targets can force logout. Plan recovery paths.
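A generic shape for that recovery path, sketched with hypothetical `is_logged_out` and `login` hooks you would implement for your target site:

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def with_reauth(action: Callable[[], T],
                is_logged_out: Callable[[Exception], bool],
                login: Callable[[], None],
                max_attempts: int = 2) -> T:
    """Run `action`; if the target forced a logout, re-authenticate and retry.

    `is_logged_out` inspects the failure (e.g. a 401, or a redirect to the
    login page) and `login` refreshes auth state in the browser session.
    Both are app-specific hooks; any other failure is re-raised untouched.
    """
    last_exc: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception as exc:
            last_exc = exc
            if attempt + 1 < max_attempts and is_logged_out(exc):
                login()  # refresh auth in the session, then retry
            else:
                raise
    raise last_exc  # unreachable; the loop always returns or raises
```

Keeping the retry policy separate from the scraping logic means a forced logout shows up as one extra login, not a failed overnight job.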
Use APIs when they exist. If your target has a stable public API, use it. Browser scraping carries higher latency and maintenance cost. Reach for it when the rendered DOM is the only reliable source.
Getting started
Quickstart: docs.steel.dev/overview/quickstart, first cloud session in under 5 minutes
Framework guides: Playwright (Node) · Puppeteer · Selenium
Self-host first: github.com/steel-dev/steel-browser
Working examples: Steel Cookbook
FAQ
How can I run headless browsers in the cloud for web scraping?
Steel.dev provides managed cloud browser sessions. Create a session via the API, connect Playwright or Puppeteer over CDP, and run your existing scraping code against the remote Chrome instance. Steel handles session lifecycle, proxy routing, anti-bot configuration, and CAPTCHA solving. Start at steel.dev or self-host from github.com/steel-dev/steel-browser.
Can I use my existing Playwright or Puppeteer code with a cloud browser?
Yes. Steel provides a WebSocket CDP endpoint. chromium.connectOverCDP() in Playwright and puppeteer.connect() both work unchanged. Point your code at the Steel session URL and it runs on a remote Chrome instance. No framework rewrite required.
How many concurrent headless browser sessions can I run in the cloud?
Steel Cloud supports 100+ concurrent sessions depending on plan. Each session has its own Chrome context, cookies, and proxy routing. Contact steel.dev for enterprise concurrency limits.
What's the difference between a headless browser API and a web scraping API?
A headless browser API gives you a real browser with full CDP access. You write Playwright or Puppeteer code that drives it. A web scraping API takes a URL and returns rendered HTML. Simpler, but no browser control. Use a headless browser API when you need multi-step interaction, authentication, or fine-grained DOM access.
Does Steel work for scraping sites with anti-bot protection?
Steel includes stealth configuration, managed residential proxies, and CAPTCHA solving. These reduce detection surface but don't guarantee access to every site. Anti-bot systems are multi-layered: IP reputation, browser fingerprint, and behavioral signals all factor in. Steel handles the infrastructure layer; your scraping code's behavioral patterns remain your responsibility.