How to Run Headless Browsers in the Cloud for Web Scraping

Jan 17, 2026

/

San Francisco

/

Dane Wilson

Nikola Balic

When your scraping needs move beyond a single machine (more concurrency, higher uptime, fewer flaky runs), headless browsers become an infrastructure problem.

Steel.dev solves that with managed cloud browser sessions. Create a session via the Steel API, connect Playwright or Puppeteer over CDP, and run your existing automation against a remote Chrome instance that Steel hosts, scales, and maintains.

Here's how cloud headless browser scraping works, what your options are, and where the real trade-offs lie.

What "headless browser scraping in the cloud" actually means

A headless browser runs Chrome without a visible window. You script it to navigate pages, execute JavaScript, wait for dynamic content, extract data, and close.

Locally, this works fine at low volume. The problems start with concurrency, uptime, and reliability at scale.

Cloud headless browser services give you a remote Chrome instance accessible via API. Your code connects to it, drives it exactly like a local browser, and the provider manages the machine, process lifecycle, and session state.

Why not just run Playwright in a Docker container yourself?

You can. Many teams do.

The hidden cost is everything around the browser: Chrome crashes, memory growth over long sessions, cookie persistence, proxy rotation, IP reputation, CAPTCHA interruptions, and debugging failures at 2 a.m. when a production job stalls.

That overhead compounds as volume grows. Cloud browser services exist so you can write scraping logic instead of browser infrastructure.

How Steel.dev works

Steel's model: create a session → connect your framework → run your code → release the session.

import { chromium } from "playwright";
import { Steel } from "steel-sdk";

const steel = new Steel({ apiKey: process.env.STEEL_API_KEY });

// Create a cloud browser session
const session = await steel.sessions.create({
  useProxy: true,       // route through managed proxies
  solveCaptcha: true,   // handle CAPTCHA challenges
});

// Connect Playwright to the remote Chrome instance
const browser = await chromium.connectOverCDP(
  `wss://connect.steel.dev?apiKey=${process.env.STEEL_API_KEY}&sessionId=${session.id}`
);
const page = browser.contexts()[0].pages()[0];

// Your scraping code runs unchanged
await page.goto("https://example.com/products");
await page.waitForSelector(".product-card");
const items = await page.locator(".product-card").allTextContents();

await browser.close();
await steel.sessions.release(session.id);

The same pattern works for Puppeteer and Selenium. Steel provides the remote browser endpoint; your automation framework talks to it over CDP.
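One thing the snippet above leaves open: if the scraping code throws, the session is never released. Here is a minimal sketch of a wrapper that always cleans up. It assumes only the sessions.create() and sessions.release() calls shown above; withSession and its connect callback are hypothetical names introduced for illustration, not part of the Steel SDK.

```javascript
// Hypothetical helper: guarantees cleanup for the create -> connect -> run -> release flow.
// `steel` is any client with sessions.create() and sessions.release(), as used above.
// `connect` turns a session into a browser-like object that has close().
async function withSession(steel, connect, task, options = {}) {
  const session = await steel.sessions.create(options);
  let browser;
  try {
    browser = await connect(session);
    return await task(browser, session);
  } finally {
    // Runs on success and on failure, so sessions are not left open.
    if (browser) {
      try {
        await browser.close();
      } catch {
        // Ignore close errors; releasing the session matters more.
      }
    }
    await steel.sessions.release(session.id);
  }
}
```

With the real SDK, connect would be something like (session) => chromium.connectOverCDP(url), and task would hold the page.goto() and locator calls.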

Session state and persistence

Steel sessions preserve cookies, local storage, and auth context between runs, which is critical for scraping workflows that require login.

// Reuse a saved authentication context
const session = await steel.sessions.create({
  sessionContext: savedContext,  // restore cookies + storage
  useProxy: true,
});

Authenticate once, save the context, reuse it on future runs without logging in again.
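The snippet above uses savedContext without showing where it comes from. Here is a rough sketch of the save and restore halves. It assumes the client exposes a sessions.context(id) call that returns a snapshot accepted by the sessionContext option; check that against the Steel SDK reference before relying on it. captureContext, createSessionFor, and the Map-like store are hypothetical names introduced for illustration.

```javascript
// Hypothetical helpers for the "authenticate once, reuse later" flow.
// ASSUMPTION: steel.sessions.context(id) returns the cookies/storage snapshot
// that sessions.create() accepts as `sessionContext`. Verify against the SDK docs.
async function captureContext(steel, sessionId, store, key) {
  const context = await steel.sessions.context(sessionId);
  // Serialize so the snapshot can live in a file, database, or cache.
  store.set(key, JSON.stringify(context));
  return context;
}

async function createSessionFor(steel, store, key, options = {}) {
  const saved = store.get(key);
  // Fall back to a fresh session when nothing has been saved yet.
  const extra = saved ? { sessionContext: JSON.parse(saved) } : {};
  return steel.sessions.create({ ...options, ...extra });
}
```

Call captureContext after a successful login and before releasing the session, then use createSessionFor on later runs. Any object with get() and set() works as the store.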

Concurrency

Steel handles session isolation at scale. Run 100+ concurrent browser sessions (depending on plan) without managing Chrome pools.

const targets = ["https://site-a.com", "https://site-b.com", "https://site-c.com"];

const results = await Promise.all(
  targets.map(async (url) => {
    const session = await steel.sessions.create({ useProxy: true });
    const browser = await chromium.connectOverCDP(
      `wss://connect.steel.dev?apiKey=${process.env.STEEL_API_KEY}&sessionId=${session.id}`
    );
    const page = browser.contexts()[0].pages()[0];
    await page.goto(url);
    const data = await page.locator("h1").innerText();
    await browser.close();
    await steel.sessions.release(session.id);
    return { url, data };
  })
);

Each session is isolated. Proxies, cookies, and state never bleed between sessions.
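Promise.all starts every target at once and rejects the whole batch on the first failure. For longer target lists you will likely want a cap on in-flight sessions and per-target outcomes. A minimal sketch of that idea follows; mapWithLimit is a hypothetical helper, not part of Steel or Playwright, and the right limit depends on your plan.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` in flight,
// recording each outcome instead of failing the whole batch.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function lane() {
    while (next < items.length) {
      const index = next++; // safe: no await between the check and the increment
      try {
        const value = await worker(items[index], index);
        results[index] = { status: "fulfilled", value };
      } catch (reason) {
        results[index] = { status: "rejected", reason };
      }
    }
  }

  // Start up to `limit` lanes; each one pulls the next item when it finishes.
  const width = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: width }, () => lane()));
  return results;
}
```

The worker would hold the per-URL body from the example above (create session, connect, scrape, release). Results come back in input order, shaped like Promise.allSettled output.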

Comparison: Steel.dev vs alternatives


| Feature | Steel.dev | Browserless.io | Apify | ScrapingBee | Self-hosted Playwright |
|---|---|---|---|---|---|
| Framework support | Playwright, Puppeteer, Selenium (CDP) | Playwright, Puppeteer, Selenium | Playwright, Puppeteer | REST API only | Any |
| Session persistence | Yes (cookies, storage, auth context) | Partial | Partial (dataset storage) | No | You build it |
| Managed proxies | Yes (residential, BYO) | Yes | Yes | Yes | BYO |
| CAPTCHA solving | Yes (built-in) | Add-on | Add-on | Yes | BYO |
| Concurrency | 100+ (cloud plan) | Based on plan | Based on plan | Based on plan | Limited by your hardware |
| Open source | Yes (self-host available) | No | No | No | Yes |
| Anti-bot stealth | Yes (advanced fingerprint config) | Partial | Partial | Yes | You configure |
| Mobile-mode sessions | Yes | Limited | Limited | No | Yes |
| Pricing model | Sessions-based | Minutes-based | Usage-based | Request-based | Infrastructure cost |

When to use Steel.dev

When you need a CDP-compatible remote browser where Playwright or Puppeteer code works unchanged, session persistence matters (authenticated scraping, multi-step flows), and you want concurrency without managing Chrome infra.

When to use Browserless.io

A reasonable alternative if you're already invested in their API surface and don't need session context reuse across runs.

When to use Apify

Best if you want a full pipeline platform with storage, scheduling, and an actor marketplace. More opinionated, harder to use with custom frameworks.

When to use ScrapingBee

A good fit when you don't need a real browser connection, just rendered HTML back from a REST call. Simpler, but no framework control and lower flexibility.

When to self-host

The right call for low-volume scraping where you have operational capacity and want full control. Steel's open-source browser can be self-hosted too, giving you the same API locally before migrating to managed cloud when volume demands it.

Use cases

Cloud headless browser scraping fits when the target demands a real browser:

  • JavaScript-rendered pages: React, Vue, Angular apps where a plain HTTP request returns an empty shell

  • Authenticated scraping: dashboards and paywalled content requiring login state

  • Dynamic e-commerce: price monitoring, inventory tracking, catalog aggregation

  • Multi-step form automation: login, navigate, extract, chained together

  • Large-scale concurrent crawling: hundreds of parallel sessions without managing Chrome pools

  • Anti-bot site access: residential proxies plus stealth configuration for protected targets

  • AI agent web browsing: giving LLM agents a persistent cloud browser for multi-step research

  • Headless Chrome as a service: running Chrome in the cloud without server management
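For the first case in the list, a quick check on the raw HTML can tell you whether a target needs a browser at all. This is a rough heuristic sketch, not a reliable detector: looksLikeEmptyShell is a hypothetical helper, and the 200-character threshold is an arbitrary assumption to tune per target.

```javascript
// Hypothetical heuristic: does server-returned HTML contain little visible text?
// ASSUMPTION: the default threshold of 200 characters is arbitrary; tune it per target.
function looksLikeEmptyShell(html, minTextLength = 200) {
  // Drop script and style bodies, which are not visible content.
  const withoutCode = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ");
  // Strip remaining tags and collapse whitespace to approximate visible text.
  const text = withoutCode.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  return text.length < minTextLength;
}
```

If a plain fetch of the page already returns the content you need, a headless browser is probably unnecessary for that target.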

Honest limitations

Detection isn't just infrastructure. Managed proxies improve IP reputation, and stealth config reduces fingerprint surface, but traffic patterns, timing, and behavioral signals still factor in. Steel addresses the infrastructure side; your code's behavior is still your responsibility.
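Timing is one of the behavioral signals that stays in your code. A common, simple measure is to randomize the pause between page visits so requests are not evenly spaced. The sketch below shows one way to do that; jitteredDelayMs and pause are hypothetical helpers, the numbers are illustrative, and none of this guarantees a target will not flag the traffic.

```javascript
// Hypothetical helper: pick a randomized pause length in milliseconds.
// `random` is injectable so the function is deterministic under test.
function jitteredDelayMs(baseMs, jitterRatio = 0.5, random = Math.random) {
  const spread = baseMs * jitterRatio;
  // Uniform between baseMs - spread and baseMs + spread, never below zero.
  return Math.max(0, Math.round(baseMs - spread + 2 * spread * random()));
}

// Sleep for a jittered interval, e.g. `await pause(2000)` between page.goto() calls.
function pause(baseMs, jitterRatio) {
  return new Promise((resolve) => setTimeout(resolve, jitteredDelayMs(baseMs, jitterRatio)));
}
```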

CAPTCHA solving isn't universal. It improves reliability but doesn't guarantee success for every provider or challenge type.

Sessions can expire. Session persistence is powerful for authenticated flows but adds complexity. Targets can force logout. Plan recovery paths.
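A recovery path can be as small as "try the saved state, and log in again once if the target rejects it". Here is a minimal sketch of that pattern. Every name in it (LoggedOutError, withAuthRecovery, loadContext, login) is hypothetical, and how you detect a forced logout, such as a redirect to the login page, depends on the target.

```javascript
// Hypothetical marker error: throw it from your task when the target shows a login wall.
class LoggedOutError extends Error {}

// Run `task` with saved auth state; on a forced logout, log in again and retry once.
async function withAuthRecovery(loadContext, login, task) {
  let context = await loadContext();
  if (!context) context = await login(); // nothing saved yet: authenticate first
  try {
    return await task(context);
  } catch (err) {
    if (!(err instanceof LoggedOutError)) throw err; // unrelated failures propagate
    const fresh = await login(); // saved state was rejected: authenticate again
    return task(fresh); // a second logout is not retried and surfaces to the caller
  }
}
```

Retrying only once keeps a broken login flow from looping against the target.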

Use APIs when they exist. If your target has a stable public API, use it. Browser scraping carries higher latency and maintenance cost. Reach for it when the rendered DOM is the only reliable source.

Getting started

FAQ

How can I run headless browsers in the cloud for web scraping?

Steel.dev provides managed cloud browser sessions. Create a session via the API, connect Playwright or Puppeteer over CDP, and run your existing scraping code against the remote Chrome instance. Steel handles session lifecycle, proxy routing, anti-bot configuration, and CAPTCHA solving. Start at steel.dev or self-host from github.com/steel-dev/steel-browser.

Can I use my existing Playwright or Puppeteer code with a cloud browser?

Yes. Steel provides a WebSocket CDP endpoint. chromium.connectOverCDP() in Playwright and puppeteer.connect() both work unchanged. Point your code at the Steel session URL and it runs on a remote Chrome instance. No framework rewrite required.
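As a small illustration, the session URL can be built in one place and handed to either framework. This sketch assumes the wss://connect.steel.dev endpoint and the apiKey/sessionId query parameters shown earlier in this article; steelConnectUrl is a hypothetical helper name.

```javascript
// Build the WebSocket endpoint used in this article's examples.
// URLSearchParams percent-encodes the values, which plain string interpolation does not.
function steelConnectUrl(apiKey, sessionId) {
  if (!apiKey) throw new Error("STEEL_API_KEY is not set");
  const params = new URLSearchParams({ apiKey, sessionId });
  return `wss://connect.steel.dev?${params.toString()}`;
}

// Playwright: await chromium.connectOverCDP(steelConnectUrl(key, session.id));
// Puppeteer:  await puppeteer.connect({ browserWSEndpoint: steelConnectUrl(key, session.id) });
```

Failing early on a missing key gives a clearer error than a rejected WebSocket handshake.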

How many concurrent headless browser sessions can I run in the cloud?

Steel Cloud supports 100+ concurrent sessions depending on plan. Each session has its own Chrome context, cookies, and proxy routing. Contact steel.dev for enterprise concurrency limits.

What's the difference between a headless browser API and a web scraping API?

A headless browser API gives you a real browser with full CDP access. You write Playwright or Puppeteer code that drives it. A web scraping API takes a URL and returns rendered HTML. Simpler, but no browser control. Use a headless browser API when you need multi-step interaction, authentication, or fine-grained DOM access.

Does Steel work for scraping sites with anti-bot protection?

Steel includes stealth configuration, managed residential proxies, and CAPTCHA solving. These reduce detection surface but don't guarantee access to every site. Anti-bot systems are multi-layered: IP reputation, browser fingerprint, and behavioral signals all factor in. Steel handles the infrastructure layer; your scraping code's behavioral patterns remain your responsibility.

Ready to Build with Steel?

A better way to take your LLMs online.

© Steel · Inc. 2025.
