Eight ways to build a browser agent

May 4, 2026

San Francisco

Nikola Balic

We landed eight new starters in the Steel cookbook. Same task, same four tools, same typed output. Different agent loops.

If you've spent an afternoon picking between LangGraph and Pydantic AI and Mastra, you know the marketing pages don't help much. You want two files open side by side, the same task running through each, and a clear read on where each framework gets out of your way and where it doesn't.

That's the cookbook now.

Why one task

The task is to pull the trending Python repositories from github.com/trending/python. It runs in seconds, requires no credentials, and actually needs a browser. The page renders without JavaScript, but the structured-output step makes you exercise tools, schemas, and the agent loop end-to-end. Small enough not to waste your afternoon, real enough to expose real problems.

The scaffolding is identical, so whatever differs is the framework.

The same four tools across every starter means the prompts, schemas, and failure modes are shared too. If a model forgets to open a session on one starter, it forgets on the others.

The recipe

Four tools, same names everywhere:

open_session  -> create a Steel session, connect Playwright over CDP
navigate      -> goto(url), wait for domcontentloaded
snapshot      -> readable text + visible links, capped
extract       -> rows scraped from a CSS selector, returned as

The output is a Zod or Pydantic schema: summary plus an array of {name, url, stars, description}. The model loop runs until the schema validates or the step budget runs out (15 in most starters).
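The shape and the stopping rule above can be sketched without any framework. This is a minimal stand-in using stdlib dataclasses rather than Pydantic or Zod, so it is not the code in any starter. The field name `repos`, the `int` type for `stars`, and the helpers `validate` and `run_until_valid` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

MAX_STEPS = 15  # the step budget quoted for most starters


@dataclass
class Repo:
    name: str
    url: str
    stars: int  # assumed integer; the starters may type this differently
    description: str


@dataclass
class FinalReport:
    summary: str
    repos: list[Repo]  # hypothetical field name for the array of rows


def validate(raw: Any) -> Optional[FinalReport]:
    """Return a FinalReport if raw has the expected shape, else None."""
    if not isinstance(raw, dict) or not isinstance(raw.get("summary"), str):
        return None
    rows = raw.get("repos")
    if not isinstance(rows, list):
        return None
    repos = []
    for row in rows:
        if not isinstance(row, dict):
            return None
        try:
            repo = Repo(row["name"], row["url"], row["stars"], row["description"])
        except KeyError:
            return None  # a required field is missing
        typed_ok = (
            isinstance(repo.name, str)
            and isinstance(repo.url, str)
            and isinstance(repo.stars, int)
            and isinstance(repo.description, str)
        )
        if not typed_ok:
            return None
        repos.append(repo)
    return FinalReport(summary=raw["summary"], repos=repos)


def run_until_valid(
    step: Callable[[int], Any], max_steps: int = MAX_STEPS
) -> Optional[FinalReport]:
    """Call step(i) until its result validates or the budget is spent."""
    for i in range(max_steps):
        report = validate(step(i))
        if report is not None:
            return report
    return None  # budget exhausted without a valid report
```

Here `step` stands in for one model turn. In the real starters the framework owns that loop and the validation, which is where the starters differ.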

What landed

Python

langgraph — explicit state machine. Three nodes (agent, tools, format), conditional edges, ToolNode and tools_condition from the prebuilts. If you want to see the graph instead of trusting an SDK, this is the one.
pydantic-ai — provider-agnostic, dependency-injection style. deps_type=BrowserDeps threads the Playwright Page through every tool via RunContext.deps. The final turn is typed because output_type=FinalReport ties it to the schema.
openai-agents-py — Agent(tools=[...], output_type=FinalReport), one Runner.run() call, the SDK runs the loop. Tools are plain async functions wrapped with @function_tool; the SDK reads the docstring for the JSON schema.
claude-agent-sdk-py — the engine behind Claude Code, exposed as a library. Tools live in an in-process MCP server (create_sdk_mcp_server). tools=[] and setting_sources=[] strip the built-ins and the .claude/ discovery so the recipe runs identically everywhere.

TypeScript

mastra — wraps the Vercel AI SDK with typed tools, a model router, and the Mastra Studio playground. Same four tools, registered on an Agent, with structured output via Zod.
openai-agents-ts — @openai/agents. One quirk: Zod compiles to OpenAI's strict JSON Schema, so .optional() is rejected. Use .nullable() instead, and drop .url() — same restriction.
claude-agent-sdk-ts — @anthropic-ai/claude-agent-sdk, the Node port of the same engine. tools: [] plus settingSources: [] plus allowedTools: ["mcp__steel__*"] is the trio that turns Claude Code into a focused browser agent.
vercel-ai-sdk-ts — AI SDK v6's ToolLoopAgent. The terminator is a reportFindings tool with no execute — calling it stops the loop and the call's input is your final typed answer. This sidesteps the Anthropic-on-tools issue where forcing JSON output disables tool calling.
vercel-ai-sdk-nextjs — same loop, Next.js chat app. streamText server-side, useChat on the client, every tool call surfaces as a typed tool-* part on the message stream. A Live View iframe lights up the moment the agent opens a session, so you watch the browser in the right pane while the chat runs in the left.
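The terminator-tool idea from vercel-ai-sdk-ts is framework-independent, so here is a rough Python sketch of the pattern. It is not the AI SDK's API: `run_tool_loop`, the `next_call` callback, and the `(name, args)` tuple format are hypothetical names chosen for illustration. Only the tool names come from the recipe.

```python
from typing import Any, Callable, Optional

# A tool call is modeled as (tool name, keyword arguments).
ToolCall = tuple[str, dict[str, Any]]

TERMINAL_TOOL = "reportFindings"  # has no executor; calling it ends the loop


def run_tool_loop(
    next_call: Callable[[Any], ToolCall],
    tools: dict[str, Callable[..., Any]],
    max_steps: int = 15,
) -> Optional[dict[str, Any]]:
    """Run tool calls until the model calls the terminal tool.

    next_call stands in for the model: it receives the previous tool
    result (None on the first turn) and returns the next tool call.
    """
    observation: Any = None
    for _ in range(max_steps):
        name, args = next_call(observation)
        if name == TERMINAL_TOOL:
            # The call's input is the final answer; nothing is executed.
            return args
        observation = tools[name](**args)
    return None  # step budget ran out before the model reported
```

Because the final answer arrives as ordinary tool input, the model never has to switch into a forced JSON output mode, which is the point of the pattern.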

Pick one and run

git clone https://github.com/steel-dev/steel-cookbook
cd steel-cookbook/examples/<your-pick>
cp .env.example .env       # STEEL_API_KEY + your model provider key

Python starters use uv (uv sync && uv run main.py). TypeScript starters use npm (npm install && npm start). Each one prints a Live View URL when it opens a session — paste that into another tab and watch the agent click around.

Try it

We restructured the whole catalog while we were in there. Basics, AI Agents, and Advanced Features, with separate views for TypeScript and Python. There's also AGENTS.md at the repo root — a nav file for AI agents, so your coding assistant can find the right example without you explaining the layout.

These eight are just the framework starters. The catalog also has computer use models, Stagehand, Browser-use, and advanced features like auth context reuse and persistent profiles.