Can I run Skyvern workflows without writing any code?

Skyvern offers multiple workflow creation paths beyond code. You can build workflows through the visual workflow builder, use the MCP integration to describe goals conversationally to Claude, or leverage the Workflow Copilot's transcript-to-workflow and URL-to-workflow AI Skills. The Python SDK remains available for developers who prefer programmatic control, but non-technical operations teams regularly build and run production automations without touching code.

How does Skyvern handle workflows that require waiting for email-based verification codes?

Skyvern supports email-based OTP through forwarding rules that route verification emails to a Skyvern TOTP endpoint, typically configured via services like Zapier. The system waits up to approximately 10 minutes for code delivery, extracts the verification code, and enters it automatically. Extended API response delays of 4+ minutes can occur before the OTP request initiates, so workflows dependent on email-based 2FA should account for authentication overhead when setting completion time expectations.
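
To make the wait-and-extract step concrete, here is a minimal sketch in plain Python. It is not Skyvern's implementation: `extract_otp`, `wait_for_otp`, and the `fetch_latest_email` callback are hypothetical names, and the 6-digit pattern and 600-second default are assumptions based on the description above.

```python
import re
import time
from typing import Callable, Optional

# Assumption: codes are exactly 6 digits and not part of a longer number.
OTP_PATTERN = re.compile(r"(?<!\d)(\d{6})(?!\d)")


def extract_otp(email_body: str) -> Optional[str]:
    """Return the first 6-digit code found in the email body, or None."""
    match = OTP_PATTERN.search(email_body)
    return match.group(1) if match else None


def wait_for_otp(
    fetch_latest_email: Callable[[], Optional[str]],  # hypothetical inbox hook
    timeout_s: float = 600.0,    # roughly the 10-minute window described above
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll for a forwarded email until a code arrives or the timeout passes."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        body = fetch_latest_email()
        if body is not None:
            code = extract_otp(body)
            if code is not None:
                return code
        sleep(poll_interval_s)
    return None
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting. When you set completion-time expectations, budget for the whole timeout window, not only the typical delivery time.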

What's the difference between Skyvern's code generation mode and full AI-driven execution?

After a workflow runs successfully, Skyvern compiles it into deterministic Playwright code that subsequent runs use, cutting LLM token consumption by approximately 90% and delivering 2x faster execution. When a target site changes and the compiled code fails, Skyvern automatically falls back to visual AI reasoning, adapts to the new layout, and self-updates the code. This hybrid approach balances cost and speed while maintaining self-healing capabilities.
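
The cached-code-then-fallback pattern can be sketched in a few lines of Python. This is an illustration of the general idea, not Skyvern's internals: `HybridRunner`, `ai_run`, and `compile_script` are hypothetical names, and a "script" here is any callable that raises when the page no longer matches.

```python
from typing import Callable, Dict

# Hypothetical type: a script performs the run and raises on failure
# (for example, when a selector no longer matches the page).
Script = Callable[[], str]


class HybridRunner:
    """Prefer a cached deterministic script; fall back to an AI run on failure."""

    def __init__(self, ai_run: Callable[[], str], compile_script: Callable[[], Script]):
        self._ai_run = ai_run                  # slow, adaptive path
        self._compile_script = compile_script  # builds a fresh deterministic script
        self._cache: Dict[str, Script] = {}
        self.fallbacks = 0

    def run(self, workflow_id: str) -> str:
        script = self._cache.get(workflow_id)
        if script is not None:
            try:
                return script()                # fast path: no model calls
            except Exception:
                self.fallbacks += 1            # cached code is stale, so drop it
                del self._cache[workflow_id]
        result = self._ai_run()                # adaptive path
        self._cache[workflow_id] = self._compile_script()  # refresh for next run
        return result
```

The first run and any run after a site change take the adaptive path; every other run takes the cached path, which is where the token and latency savings would come from.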

Can Skyvern handle portals that log users out after periods of inactivity?

Skyvern workflows support 'save draft' blocks positioned between steps to persist partial progress on portals with aggressive session timeouts. When a workflow is logged out mid-process, the system reports the exact point where execution stopped using saved block status, enabling teams to check saved drafts instead of rerunning entire workflows. This capability addresses timeout-related failures on government and healthcare portals with strict session limits.

How do I integrate Skyvern workflow results with my existing systems instead of using spreadsheets?

Skyvern exposes structured JSON output via a data_extraction_schema parameter that defines the shape of extracted data, making results immediately compatible with downstream databases and ERPs. Workflows deliver results through webhook endpoints, include native Salesforce bi-directional integration, support Google Sheets read/write operations, and auto-generate dedicated API endpoints per workflow. Teams can route extraction outputs directly into practice-management systems, credentialing software, or custom internal tools without manual spreadsheet transfers.
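
Before routing extracted data into a database or ERP, it can help to check each record against the same schema you passed in. The sketch below is a hypothetical helper, not part of the Skyvern SDK. It handles only flat object schemas with primitive types and treats every listed property as required, which is an assumption you may want to relax.

```python
from typing import Any, Dict, List

# Maps the JSON Schema primitive names used here to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def schema_errors(schema: Dict[str, Any], record: Dict[str, Any]) -> List[str]:
    """Check a flat record against a flat object schema; return problems found."""
    errors = []
    for field, spec in schema.get("properties", {}).items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        expected = _JSON_TYPES[spec["type"]]
        value = record[field]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_bool_as_number = isinstance(value, bool) and spec["type"] in ("number", "integer")
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"wrong type for {field}: expected {spec['type']}")
    return errors
```

An empty list means the record matches the schema and is safe to hand to the downstream system. Anything else can be logged or sent to a review queue.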

What prevents my Skyvern workflows from triggering bot detection when running at high volume?

Skyvern includes native residential and ISP proxy infrastructure covering 20+ countries with ZIP-level geographic targeting, integrated anti-bot bypass capabilities, and concurrency throttling controls that limit parallel runs per hour. Teams can configure queue-based execution and max-runs-per-hour settings to prevent volume spikes that trigger WAF detection on high-security sites. The proxy layer and automation logic operate as one system, so geographic routing happens automatically without requiring separate infrastructure setup.
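
A max-runs-per-hour setting behaves like a sliding-window rate limiter. The sketch below shows that idea in plain Python. `HourlyRunLimiter` is a hypothetical class for illustration and says nothing about how Skyvern implements its throttling.

```python
import time
from collections import deque
from typing import Callable, Deque


class HourlyRunLimiter:
    """Sliding-window limiter: allow at most max_runs starts per window."""

    def __init__(self, max_runs: int, window_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._max_runs = max_runs
        self._window_s = window_s
        self._clock = clock
        self._starts: Deque[float] = deque()

    def try_start(self) -> bool:
        """Record and allow a run if the window has capacity, else refuse."""
        now = self._clock()
        # Forget starts that have aged out of the window.
        while self._starts and now - self._starts[0] >= self._window_s:
            self._starts.popleft()
        if len(self._starts) >= self._max_runs:
            return False
        self._starts.append(now)
        return True
```

A queue worker would call `try_start()` before launching each run and leave the job queued when it returns `False`, which smooths out the volume spikes that tend to trip WAF rules.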

Best framework for building internal tools that interact with carrier portals in 2026?

Teams running production workflows across carrier portals need self-healing automation that adapts when sites change layouts or rename form fields, native 2FA and CAPTCHA handling for authentication flows, and the ability to run one workflow definition across dozens of portals without per-site code. Skyvern's visual approach reads pages by appearance and context at runtime, so portal redesigns do not create maintenance tickets. Code-first frameworks require ongoing selector patching and custom authentication middleware that compounds as portal count grows.

Can I use my own LLM API keys instead of Skyvern's when running workflows?

Skyvern supports deployment configurations where customers use their own API keys when interacting with LLMs rather than routing through Skyvern's keys. This architectural option gives enterprises direct control over AI provider relationships, enables teams to leverage existing enterprise AI contracts and negotiated rates, and ensures LLM usage costs and data flows remain transparent to security and compliance teams. Customer-managed keys address data governance policies requiring all AI interactions to flow through organization-approved providers.

How do I test workflow changes without affecting production automations?

Skyvern includes workflow duplication that enables you to clone production workflows for testing and experimentation in isolated copies. Version history with visual comparison shows changes between workflow iterations, and one-click rollback instantly restores previous versions if new changes cause regressions. This capability is particularly valuable for sensitive use cases like government filings, credentialing, and financial transactions where testing against production workflows carries unacceptable risk.

What's the fastest way to verify Skyvern completed a workflow successfully without logging into the portal manually?

When workflows are triggered from spreadsheets or external systems, Skyvern automatically updates screenshots in the source system to verify completion status, eliminating the need for manual portal login. Webhook integration delivers structured error codes that distinguish logout events from other failure types, reporting the exact point where execution stopped. Every run produces timestamped execution logs, screenshots, and video replays that provide audit-ready evidence of completion without requiring portal access.

Stagehand vs Skyvern: Which is Better? (June 2026)

Suchintan Singh

05 Jun 2026 • 12 min read

You automate a workflow once, and three weeks later the portal moves a button and your script breaks. That's the maintenance cycle everyone who's tried browser automation has lived through. Stagehand tries to soften that cycle by letting you mix stable selectors with AI-powered fallback instructions. Skyvern avoids it by reading pages visually at runtime, so layout changes don't require code updates. This Stagehand vs Skyvern breakdown covers how each handles the self-healing problem, what the infrastructure difference means at scale, and where the choice between hybrid scripting and task-level execution matters most for teams running workflows in production. When you're weighing Skyvern vs Stagehand, the real question is whether you want to patch selectors yourself or run a system that adapts without your intervention.

TLDR:

Stagehand gives developers hybrid control over browser automation, mixing Playwright-style code with AI instructions at the step level.
Skyvern operates at the task level, reading pages visually at runtime and working through multi-step workflows without scripting.
RPA teams spend a large share of their time maintaining bots; Skyvern's visual approach adapts when portals change layouts or rename buttons.
Stagehand relies on your own Playwright setup for auth, proxies, and retry logic; Skyvern includes 2FA, CAPTCHA handling, and isolated browser contexts out of the box.
Skyvern offers a Python SDK and REST API for cross-language use; Stagehand is TypeScript-native and tightly coupled to Node.js environments.

What is Stagehand?

Stagehand is a browser automation framework built for developers who want AI-powered control without giving up the precision of code. Its hybrid design lets you mix Playwright-style selectors with plain-language AI instructions at the step level, so you decide exactly how much AI involvement each browser interaction gets. Teams that find fully autonomous agents too unpredictable for production, but find pure selector-based scripting too fragile to maintain, are the intended audience.

Key Features

Hybrid step-level control lets developers mix deterministic Playwright selectors with AI-powered instructions in the same workflow.
AI-powered element detection checks elements again when pages change, providing some resilience to minor UI updates.
TypeScript-native SDK integrates naturally with Node.js ecosystems and frontend or full-stack JavaScript teams.
Methods like act(), extract(), and observe() give developers fine-grained, code-level control over each browser interaction.
Element locator caching speeds up repeat runs on pages that stay stable between executions.

Limitations

DOM dependency means structural page changes or authentication interruptions can still break workflows mid-run despite AI fallback.
Authentication relies on Playwright's built-in session management, which can struggle with TOTP-based logins, MFA, and CAPTCHA challenges.
Teams running Stagehand at scale must build their own proxy rotation, session management, and retry logic, adding ongoing engineering overhead.
Cached locators become a liability when a page updates, reintroducing the same brittleness that caching was meant to prevent.
The TypeScript-only architecture makes cross-language use harder, limiting integration with Python-based agent frameworks or non-JavaScript backend services.

Bottom Line

Stagehand fits developer teams working in TypeScript who want precise, step-level control over automation logic and are comfortable writing and maintaining that logic themselves. Teams prototyping agent workflows or building structured automation pipelines, where granular control over AI versus deterministic steps justifies the maintenance overhead, will find it a solid match. It is not well suited for teams that need to hand off complex, multi-step workflows across dozens of portals and get reliable results without owning and patching the execution layer over time.

What is Skyvern?

Skyvern is an AI browser automation tool built to handle the web workflows that break every other automation approach. Where selector-based tools depend on stable DOM structures and brittle XPath expressions, Skyvern reads pages visually using computer vision and LLM reasoning, identifying interactive elements by their appearance and context at runtime. The core value proposition is direct: if a human can do it in a browser, Skyvern can automate it without APIs, without brittle scripts, and without breaking when websites change.

Key Features

Task-level execution accepts a plain-language goal and works through full multi-step workflows without requiring a scripting layer to maintain.
Visual page reading at runtime means layout changes, renamed buttons, and reshuffled forms do not require code updates to keep workflows running.
Native 2FA, TOTP handling, and CAPTCHA solving are built in, covering the authentication challenges that cause selector-based approaches to stall.
A residential and ISP proxy network covering 20+ countries is included alongside isolated browser contexts per session for production-grade security and anti-bot bypass.
A Python SDK and REST API make Skyvern callable from any language, with native fit for AI agent frameworks, data pipelines, and backend services.

Limitations

Visual inference at runtime uses more compute than deterministic selector execution, which can affect cost at very high step volumes.
Teams automating portals with aggressive anti-bot detection should conduct proof-of-concept testing before committing to production, as success rates vary by site.
Phone and SMS-based two-factor authentication is not supported, which blocks certain government and healthcare portals that mandate it.
The learning curve for configuring complex multi-step workflows with conditional logic can be steeper than writing straightforward Playwright scripts.
Ecosystem maturity is still developing compared to existing RPA platforms, meaning some integrations and edge-case workflows may require additional configuration.

Bottom Line

Skyvern fits operations and engineering teams running multi-step authenticated workflows across carrier portals, government sites, insurance platforms, and vendor procurement flows where selector-based tools break too often to be worth maintaining. Teams processing high volumes of data extraction from sites with no API, or AI agent builders who need a browser automation layer their orchestration framework can call programmatically, will find the strongest match. It is less suited for teams running simple single-site automations where the platform's full capability set exceeds their needs, or for workflows that depend entirely on SMS-based authentication, which the platform does not currently support.

Looking at How Stagehand and Skyvern Tackle Common Requirements

Both tools automate browser workflows, but they approach the job differently. We assessed each against the categories that matter most to teams evaluating automation tools:

Hybrid control vs. task-level automation
Authentication, CAPTCHA, and production infrastructure
Developer experience and language support
API-first flexibility
Self-healing, caching, and long-term maintenance

Hybrid Control vs. Task-Level Automation

Stagehand operates at the code level. You write scripts that call methods like act(), extract(), and observe(), and Stagehand uses AI to interpret those instructions against the live DOM. It gives you fine-grained control over each browser interaction, which is exactly what developers want when building structured automation pipelines or prototyping agent workflows.

Skyvern operates at the task level. You describe a goal in plain language, and Skyvern reads the page visually, reasons about what needs to happen, and works through the full workflow on its own. There's no scripting layer to maintain.

Where the Difference Shows Up in Practice

The gap between these two approaches matters most when workflows get complex. Multi-step tasks that span logins, dynamic forms, file uploads, and conditional page states require Stagehand to handle each transition explicitly in code. Skyvern handles those transitions as part of goal execution.

For teams that want developer control and are comfortable writing automation logic, Stagehand fits well. For teams that need to hand off a goal and get a result without owning the execution layer, Skyvern is the closer match.

The table below shows how each tool handles the core automation challenges that matter in production.

Feature	Stagehand	Skyvern
Automation Approach	Code-level control mixing Playwright selectors with AI instructions at each step	Task-level execution reading pages visually at runtime without scripting
Authentication & Infrastructure	Playwright session management; teams build their own proxy rotation and retry logic	Native 2FA and TOTP handling, CAPTCHA solving, proxy network across 20+ countries, isolated browser contexts per session
Language Support	TypeScript-native SDK tightly coupled to Node.js runtime	Python SDK plus REST API callable from any language
Maintenance Model	AI-powered element detection with DOM dependency; cached locators can go stale when pages change	Visual inference at execution time; no cached selectors, adapts to layout changes without code updates

Authentication, CAPTCHA, and Production Infrastructure

Stagehand handles authentication through Playwright's built-in session management, which works well for straightforward login flows but can struggle with multi-factor authentication, TOTP-based logins, and CAPTCHA challenges that require reasoning about what's on screen.

Skyvern was built with production authentication in mind from the start. It stores credentials securely and handles TOTP natively, so workflows that require two-factor authentication don't need custom middleware or workarounds. CAPTCHA handling is also built in, covering the visual and behavioral challenges that cause selector-based approaches to stall.

Infrastructure for Scale

Three infrastructure differences matter when you're running automation in production instead of in a test environment.

Skyvern runs each session in an isolated browser context, which prevents state from leaking between concurrent workflows and keeps credentials sandboxed per run.
Proxy support and anti-bot bypass come out of the box in Skyvern, which matters when automating portals that actively detect and block scripted traffic.
Teams using Stagehand at scale typically wire in their own proxy rotation, session management, and retry logic, which adds real engineering overhead over time.

For teams running occasional scripts, that overhead is manageable. For teams running hundreds of workflows across carrier portals, insurance platforms, or government sites, the built-in production infrastructure in Skyvern removes a category of maintenance work that tends to quietly compound.

Developer Experience and Language Support

Stagehand and Skyvern both ship SDKs, but they target different language ecosystems.

Stagehand is built natively in TypeScript, which makes it a natural fit for frontend and full-stack JavaScript developers. If your team already lives in a Node.js ecosystem, the integration path is short.
Skyvern, on the other hand, is Python-native. The Skyvern Python SDK covers task creation, workflow orchestration, credential management, and data extraction in a single package. For engineering teams working in data pipelines, backend services, or AI agent frameworks, that tends to be where Python already lives.

API-First Flexibility

Beyond the SDKs, Skyvern exposes a REST API that any language can call. Teams not working in Python can still trigger browser automation tasks from Go, Ruby, or JavaScript services without needing to wrap an SDK. Stagehand's architecture is more tightly coupled to the TypeScript runtime, which makes cross-language use harder to support.
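
As a rough illustration of calling the REST API without the SDK, the sketch below builds the HTTP request using only the standard library. Python is used here to match the article's other example, but the same request can be made from Go, Ruby, or JavaScript. The endpoint URL, the `/v1/run/tasks` path, and the `x-api-key` header name are assumptions to verify against the current API reference, and `build_task_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumptions to verify against the current API reference: the base URL,
# the /v1/run/tasks path, and the x-api-key header name.
API_URL = "https://api.skyvern.com/v1/run/tasks"


def build_task_request(api_key: str, prompt: str, url: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that would start a task."""
    payload = json.dumps({"prompt": prompt, "url": url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


request = build_task_request(
    api_key="YOUR_API_KEY",
    prompt="Retrieve the latest policy quote for account #ACT-9821.",
    url="https://carrier-portal.example.com",
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Because the call is a JSON POST with one auth header, any HTTP client in any language can reproduce it.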

There are three areas where this distinction matters most in practice:

AI agent integrations, where Python-based frameworks like LangChain or CrewAI expect Python-native tool definitions
Data engineering workflows that process extraction outputs downstream in pandas or similar libraries
Backend microservices written in non-JavaScript languages that need to call browser automation as a side effect

Self-Healing, Caching, and Workflow Maintenance

Stagehand handles selector drift through its AI-powered element detection, which checks elements again when a page changes. For simple cases, this works. But the recovery is still tied to the DOM, so structural changes or authentication interruptions can still break a workflow mid-run.

Skyvern takes a different approach. Because it reads pages visually at runtime, every step is checked fresh against the current state of the page. There are no stored selectors to go stale. If a portal redesigns its layout or swaps a button label, Skyvern re-reads the visual context and keeps going without requiring a code change.

How Caching Affects Maintenance Load

Stagehand includes caching for element locators, which speeds up repeat runs on stable pages. The tradeoff is that cached locators can become a liability when a page updates, pulling the workflow back to the same brittleness problem that caching was meant to avoid.

Skyvern does not rely on locator caching. The visual inference happens at execution time, which means there is no cache to invalidate and no stale reference to debug. For teams running workflows across dozens of carrier portals or vendor sites, that distinction matters a great deal. A single layout change on one portal does not become a maintenance ticket.

Implications for Long-Running Workflows

Maintenance burden is one of the most underestimated costs in browser automation. RPA teams spend considerable effort maintaining bots instead of building new ones. Stagehand reduces that burden compared to raw Playwright, but the DOM dependency means some level of ongoing upkeep is still expected. Skyvern's visual-first architecture pushes that upkeep lower, since the system adapts to page changes without human intervention.

Human judgment still matters when a workflow hits a genuinely novel state, and Skyvern flags those cases for review instead of failing silently.

Code Example: Running an Authenticated Workflow with Skyvern

The example below shows how to run an authenticated, multi-step workflow using the Skyvern Python SDK. Credentials are stored once in the encrypted vault and never passed to the LLM. The task accepts a plain-language goal, handles TOTP-based 2FA automatically, and returns structured JSON output, with no selectors to write or maintain.

import asyncio
from skyvern import Skyvern

# Initialize the client with your API key
client = Skyvern(api_key="YOUR_API_KEY")

async def main():
    # Store credentials once in the encrypted vault — never sent to the LLM
    credential = await client.create_credential(
        name="Carrier Portal Login",
        credential_type="password",
        credential={
            "username": "ops-user@example.com",
            "password": "your-portal-password",
        },
    )

    # Run the workflow — Skyvern reads the page visually at runtime,
    # so layout changes on the portal do not require code updates
    task = await client.run_task(
        url="https://carrier-portal.example.com",
        prompt="Log in and retrieve the latest policy quote for account #ACT-9821. "
               "COMPLETE when the quote summary is visible.",
        credential_id=credential.credential_id,   # Reference stored credentials
        totp_identifier="ops-user@example.com",   # Route 2FA codes automatically
        data_extraction_schema={
            "type": "object",
            "properties": {
                "policy_number": {"type": "string"},
                "coverage_type": {"type": "string"},
                "premium":       {"type": "number"},
                "effective_date":{"type": "string"},
            },
        },
        wait_for_completion=True,  # Block until the workflow finishes
    )

    # task.output returns clean, structured JSON ready for downstream systems
    print(task.output)

asyncio.run(main())

The credential_id keeps credentials out of prompts and logs entirely. The totp_identifier tells Skyvern where to route incoming 2FA codes. The data_extraction_schema defines the shape of the output, so the result comes back as consistent, database-ready JSON instead of raw page content.
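
To show what "database-ready" can mean in practice, here is a minimal sketch that writes one extracted record into SQLite. It assumes the output is a flat dict with the four fields from the schema above. `save_quote`, the `quotes` table, and the sample values are made up for illustration, and `sample_output` stands in for `task.output`.

```python
import sqlite3
from typing import Any, Dict


def save_quote(conn: sqlite3.Connection, quote: Dict[str, Any]) -> None:
    """Insert one extracted quote, keyed by policy number."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS quotes (
               policy_number TEXT PRIMARY KEY,
               coverage_type TEXT,
               premium REAL,
               effective_date TEXT
           )"""
    )
    # Named placeholders keep the insert aligned with the schema's field names.
    conn.execute(
        """INSERT OR REPLACE INTO quotes
           VALUES (:policy_number, :coverage_type, :premium, :effective_date)""",
        quote,
    )
    conn.commit()


# Stand-in for task.output; the values here are made up for illustration.
sample_output = {
    "policy_number": "POL-0001",
    "coverage_type": "general liability",
    "premium": 1250.0,
    "effective_date": "2026-07-01",
}
conn = sqlite3.connect(":memory:")
save_quote(conn, sample_output)
```

Keying on `policy_number` with `INSERT OR REPLACE` means a rerun of the same workflow updates the existing row instead of creating a duplicate.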

Why Skyvern is the Better Choice

Stagehand is a solid choice if your team lives in TypeScript and wants precise control over which workflow steps use AI versus deterministic code. For narrow, developer-owned automations where that tradeoff is worth maintaining, it holds up.

For most teams, though, Skyvern removes the overhead that makes Stagehand hard to scale. Built-in 2FA and CAPTCHA solving, native Bitwarden credential integration, a residential proxy network covering more than 20 countries, and serverless scaling from hundreds to millions of concurrent runs are all included without extra tooling. Pricing is transparent at $0.05 per step, with no hidden fees layered on top.

Where Stagehand asks you to write navigation logic, manage separate LLM provider costs, and patch code when sites change, Skyvern accepts a plain-language goal and executes the full workflow. The system re-reads pages visually at runtime and adapts when layouts shift. Your engineering time doesn't get spent keeping it current.

Final Thoughts on Picking the Right Automation Approach

The right tool comes down to what you're willing to own. If you want code-level control and your team can handle ongoing script maintenance, Stagehand fits. If you need workflows that keep running when sites redesign their layouts and you'd rather not spend engineering time patching selectors, Skyvern handles that structurally. We run live demos on real carrier portals and vendor sites so you can see exactly how visual automation responds when a page changes. Schedule one here and bring your hardest workflow.

FAQ

How do I decide whether Stagehand or Skyvern fits my workflow better?

Match the decision to how much control you need over individual steps. Stagehand gives you fine-grained control at the code level, where you write scripts that mix deterministic selectors with AI-powered instructions, which works well when you want to own the execution logic. Skyvern accepts a plain-language goal and executes the full workflow autonomously, which fits teams that need to hand off a task and get a result without maintaining orchestration code.

What's the main infrastructure difference between the two tools when running automation at scale?

Skyvern includes production infrastructure out of the box: isolated browser contexts per session, native proxy rotation covering 20+ countries, CAPTCHA and 2FA handling, and serverless scaling from hundreds to millions of concurrent runs. Stagehand depends on your own Playwright setup, so teams running it at scale typically build their own proxy rotation, session management, and retry logic, which adds engineering overhead over time.

Who is Stagehand best suited for?

Stagehand fits developer teams working in TypeScript who want precise control over which workflow steps use AI versus deterministic code, and who are comfortable writing and maintaining automation logic themselves. Teams prototyping agent workflows or building structured automation pipelines where that level of control justifies the maintenance burden will find it a solid match.

When should I consider switching from selector-based automation to a visual approach?

If your team spends more than a few hours each week patching broken scripts after target sites update layouts or rename form elements, the maintenance burden has crossed the threshold where visual automation pays off. Skyvern re-reads pages at runtime, so layout changes and button relabels do not create maintenance tickets; the system adapts without code changes.

Can Stagehand handle multi-factor authentication and CAPTCHA challenges reliably?

Stagehand handles authentication through Playwright's session management, which works for straightforward login flows but can struggle with TOTP-based logins, multi-factor authentication, and CAPTCHA challenges that require reasoning about what's on screen. Teams running workflows that depend on 2FA or CAPTCHA solving typically need to build custom middleware or workarounds when using Stagehand.
