
Browser Automation

Automate web browsers with Playwright. Navigate pages, take screenshots, extract data, and use LLM vision to understand and interact with any website.

How It Works

Browser automation adds a browser tool type to workflows. Each step performs one action on a Chromium instance (headless by default in the examples here): navigate, click, type, scroll, screenshot, or extract data. For AI-driven automation, the analyze action sends a screenshot to an LLM vision model, whose reply you can use to decide what to do next.

Tool Definition

json
{
  "name": "browser",
  "type": "browser",
  "params": {
    "timeout": 60000
  }
}

Actions

launch

Start a browser session.

json
{
  "id": "start",
  "action": "browser",
  "params": {
    "action": "launch",
    "headless": true,
    "viewport": { "width": 800, "height": 600 }
  }
}

navigate

Go to a URL.

json
{
  "id": "go",
  "action": "browser",
  "params": {
    "action": "navigate",
    "url": "https://example.com",
    "waitUntil": "domcontentloaded"
  }
}

Returns { url, title }.

click

Click an element by CSS selector.

json
{
  "id": "click_login",
  "action": "browser",
  "params": {
    "action": "click",
    "selector": "#login-button"
  }
}

type

Type text into an input field.

json
{
  "id": "enter_email",
  "action": "browser",
  "params": {
    "action": "type",
    "selector": "input[name='email']",
    "text": "user@example.com"
  }
}

scroll

Scroll the page up or down.

json
{
  "id": "scroll_down",
  "action": "browser",
  "params": {
    "action": "scroll",
    "direction": "down",
    "amount": 500
  }
}

wait

Wait for an element to appear.

json
{
  "id": "wait_results",
  "action": "browser",
  "params": {
    "action": "wait",
    "selector": ".results-loaded",
    "timeout": 10000
  }
}

screenshot

Capture the page as a compressed JPEG (~50-100KB).

json
{
  "id": "capture",
  "action": "browser",
  "params": {
    "action": "screenshot"
  }
}

Returns { screenshot, success } where screenshot is a data:image/jpeg;base64,... string.
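
Because the screenshot comes back as a data URL rather than raw bytes, a consumer that wants to store it as a file has to strip the prefix and decode the base64 payload first. The following is a minimal Node.js sketch of that step; `decodeDataUrl` is a hypothetical helper name, not something Backflow provides, and the sample payload is a three-byte stand-in rather than a real screenshot.

```javascript
// Hypothetical helper (not part of Backflow): split a data URL such as
// "data:image/jpeg;base64,AAAA" into its MIME type and decoded bytes.
function decodeDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) throw new Error("not a base64 data URL");
  return { mimeType: match[1], bytes: Buffer.from(match[2], "base64") };
}

// Stand-in payload: the three bytes every JPEG file starts with (FF D8 FF).
const sample =
  "data:image/jpeg;base64," + Buffer.from([0xff, 0xd8, 0xff]).toString("base64");

const { mimeType, bytes } = decodeDataUrl(sample);
console.log(mimeType, bytes.length); // prints: image/jpeg 3

// With a real screenshot value you could then write the bytes to disk:
// require("fs").writeFileSync("page.jpg", bytes);
```

In a browser none of this is needed, since the data URL can be assigned directly to an image element's src.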

extract

Extract text content from the page or a specific element.

json
{
  "id": "get_prices",
  "action": "browser",
  "params": {
    "action": "extract",
    "selector": ".price-list"
  }
}

Returns { content, success }.

html

Get raw HTML content.

json
{
  "id": "get_html",
  "action": "browser",
  "params": {
    "action": "html",
    "selector": ".product-grid"
  }
}

Returns { html, success }.

evaluate

Run JavaScript in the browser context.

json
{
  "id": "get_data",
  "action": "browser",
  "params": {
    "action": "evaluate",
    "script": "Array.from(document.querySelectorAll('.item')).map(el => ({ name: el.textContent, href: el.href }))"
  }
}

Returns { result, success }.

analyze

Screenshot + LLM vision in one step. Sends the screenshot to your configured LLM provider and returns the AI's analysis.

json
{
  "id": "understand_page",
  "action": "browser",
  "params": {
    "action": "analyze",
    "prompt": "What products are shown on this page? List their names and prices."
  }
}

Returns { screenshot, analysis }.

close

End the browser session.

json
{
  "id": "cleanup",
  "action": "browser",
  "params": {
    "action": "close"
  }
}

Example Workflow

A scraper that navigates to a URL, analyzes the page with AI vision, and extracts data:

json
{
  "id": "browser-scrape",
  "name": "Browser Scraper",
  "tools": [
    { "name": "browser", "type": "browser" }
  ],
  "workflows": [{
    "name": "scrape_and_analyze",
    "trigger": { "type": "manual" },
    "steps": [
      {
        "id": "launch",
        "action": "browser",
        "params": { "action": "launch", "headless": true }
      },
      {
        "id": "navigate",
        "action": "browser",
        "dependsOn": ["launch"],
        "params": { "action": "navigate", "url": "{{url}}" }
      },
      {
        "id": "analyze",
        "action": "browser",
        "dependsOn": ["navigate"],
        "params": { "action": "analyze", "prompt": "{{prompt}}" }
      },
      {
        "id": "extract",
        "action": "browser",
        "dependsOn": ["analyze"],
        "params": { "action": "extract", "selector": "{{selector}}" },
        "optional": true,
        "defaultValue": { "content": "" }
      },
      {
        "id": "close",
        "action": "browser",
        "dependsOn": ["extract"],
        "params": { "action": "close" }
      }
    ]
  }]
}
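
In the workflow above every step depends on the one before it, so the dependsOn wiring is mechanical. If you generate workflow definitions from a script, a small helper can produce that chain. This is only a sketch: `chainSteps` is a hypothetical name, and it covers just the id, action, dependsOn, and params fields shown in the example.

```javascript
// Hypothetical helper (not part of Backflow): turn an ordered list of
// [id, params] pairs into browser steps where each step depends on the
// previous one, mirroring the linear workflow in the example.
function chainSteps(pairs) {
  return pairs.map(([id, params], i) => ({
    id,
    action: "browser",
    // The first step has no predecessor, so it gets no dependsOn field.
    ...(i > 0 ? { dependsOn: [pairs[i - 1][0]] } : {}),
    params,
  }));
}

const steps = chainSteps([
  ["launch", { action: "launch", headless: true }],
  ["navigate", { action: "navigate", url: "{{url}}" }],
  ["analyze", { action: "analyze", prompt: "{{prompt}}" }],
  ["close", { action: "close" }],
]);

// Paste the output into the "steps" array of a workflow definition.
console.log(JSON.stringify(steps, null, 2));
```

Fields such as optional and defaultValue would still need to be added by hand on the steps that use them.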

Route Configuration

Expose browser automation as an API endpoint:

json
{
  "path": "/browser/scrape",
  "method": "post",
  "requireAuth": true,
  "authProvider": "firebase",
  "subscribable": {
    "enabled": true,
    "queueName": "default",
    "estimatedTime": "30s"
  },
  "integrations": {
    "actions": [{
      "type": "workflow",
      "workflowId": "browser-scrape",
      "input": {
        "url": "{{body.url}}",
        "prompt": "{{body.prompt}}",
        "selector": "{{body.selector}}"
      }
    }]
  }
}

Subscribable Streaming

With subscribable enabled, the endpoint returns immediately with subscription URLs:

bash
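# BASE_URL is a placeholder for the host that serves your routes
curl -X POST "$BASE_URL/browser/scrape" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "prompt": "Extract all product info"}'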

Response:

json
{
  "accepted": true,
  "jobId": "abc-123",
  "estimatedTime": "30s",
  "subscribe": {
    "sse": "/queues/default/jobs/abc-123/subscribe",
    "websocket": { "path": "/ws", "channel": "job:abc-123" },
    "poll": "/queues/default/jobs/abc-123"
  }
}
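
The paths in the subscribe object are relative, so a client running on a different origin has to resolve them against the API host before connecting. A minimal sketch of that resolution follows, using the standard URL class; `resolveSubscription` and the `https://api.example.com` base are assumptions for illustration only.

```javascript
// Hypothetical helper (not part of Backflow): resolve the relative paths in
// the `subscribe` object against the origin that serves your routes.
function resolveSubscription(subscribe, baseUrl) {
  const ws = new URL(subscribe.websocket.path, baseUrl);
  // WebSocket URLs use ws:/wss: in place of http:/https:.
  ws.protocol = ws.protocol === "https:" ? "wss:" : "ws:";
  return {
    sse: new URL(subscribe.sse, baseUrl).href,
    poll: new URL(subscribe.poll, baseUrl).href,
    websocket: { url: ws.href, channel: subscribe.websocket.channel },
  };
}

// The subscribe object from the response above; the base URL is a placeholder.
const urls = resolveSubscription(
  {
    sse: "/queues/default/jobs/abc-123/subscribe",
    websocket: { path: "/ws", channel: "job:abc-123" },
    poll: "/queues/default/jobs/abc-123",
  },
  "https://api.example.com"
);
console.log(urls);
```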

Subscribe via SSE to receive live updates:

javascript
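// Assumes the page contains <img id="browser-frame"> (hypothetical id) to show live frames.
const img = document.getElementById("browser-frame");
const eventSource = new EventSource("/queues/default/jobs/abc-123/subscribe");

eventSource.onmessage = (e) => {
  const data = JSON.parse(e.data);
  if (data.screenshot) {
    // Render live browser frame (~50-100KB JPEG)
    img.src = data.screenshot;
  }
  if (data.analysis) {
    // Show AI analysis
    console.log(data.analysis);
  }
};

// EventSource reconnects automatically when the stream drops; close it on
// error so the client does not keep retrying after the job has ended.
eventSource.onerror = () => eventSource.close();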

Session Isolation

Each workflow execution gets its own Playwright browser context with separate cookies, storage, and state. No data leaks between sessions.

Vision Loop

For autonomous multi-step browsing, chain actions in a loop pattern:

  1. screenshot — capture current page
  2. analyze — LLM decides what to do next
  3. Execute the action (click, type, scroll)
  4. Repeat until goal is met

The analyze action is a shortcut that combines steps 1 and 2. Use evaluate or extract when you know the CSS selectors to skip LLM costs entirely.
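
The loop above can be sketched in client code as follows. This is an illustration under several assumptions, not Backflow's own implementation: `runBrowserAction` is a caller-supplied async function that executes one browser action and resolves with its result, and the prompt asks the model to reply with a small JSON object, a reply format invented here for the example. Anything the parser cannot understand ends the loop rather than being executed.

```javascript
// Hypothetical reply format: the prompt asks the model to answer with JSON such
// as {"action":"click","selector":"#next"} or {"done":true}.
// Backflow does not define this format; it is an assumption of this sketch.
const ALLOWED_ACTIONS = new Set(["click", "type", "scroll"]);

function parseDecision(analysis) {
  // Models often wrap JSON in prose, so take the span from the first "{" to the last "}".
  const match = String(analysis).match(/\{[\s\S]*\}/);
  if (!match) return { done: true, reason: "no JSON found" };
  let decision;
  try {
    decision = JSON.parse(match[0]);
  } catch {
    return { done: true, reason: "invalid JSON" };
  }
  if (decision.done) return { done: true };
  if (!ALLOWED_ACTIONS.has(decision.action)) {
    return { done: true, reason: `unsupported action: ${decision.action}` };
  }
  return decision;
}

// runBrowserAction(params) is supplied by the caller (an assumption): it runs one
// browser action and resolves with that action's result, e.g. { screenshot, analysis }.
async function visionLoop(runBrowserAction, goal, maxIterations = 10) {
  const executed = [];
  for (let i = 0; i < maxIterations; i++) {
    // Steps 1 and 2: analyze takes the screenshot and asks the model in one call.
    const { analysis } = await runBrowserAction({
      action: "analyze",
      prompt: `Goal: ${goal}. Reply with JSON only: an action object or {"done": true}.`,
    });
    const decision = parseDecision(analysis);
    if (decision.done) break; // Step 4: stop once the goal is met or the reply is unusable.
    await runBrowserAction(decision); // Step 3: execute click, type, or scroll.
    executed.push(decision);
  }
  return executed;
}
```

The maxIterations cap bounds LLM cost if the model never reports completion, and the allow-list keeps the model from triggering actions such as evaluate.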

Setup

bash
npm install playwright
npx playwright install chromium

Backflow - Configuration-driven API framework