Browser Automation

Automate web browsers with Playwright. Navigate pages, take screenshots, extract data, and use LLM vision to understand and interact with any website.

How It Works

Browser automation adds a browser tool type to workflows. Each step performs one action on a headless Chromium instance: navigate, click, type, scroll, wait, screenshot, or extract data. For AI-driven automation, the analyze action sends a screenshot to an LLM vision model that decides what to do next.

Tool Definition

json

{
  "name": "browser",
  "type": "browser",
  "params": {
    "timeout": 60000
  }
}

Actions

launch

Start a browser session.

json

{
  "id": "start",
  "action": "browser",
  "params": {
    "action": "launch",
    "headless": true,
    "viewport": { "width": 800, "height": 600 }
  }
}

navigate

Go to a URL.

json

{
  "id": "go",
  "action": "browser",
  "params": {
    "action": "navigate",
    "url": "https://example.com",
    "waitUntil": "domcontentloaded"
  }
}

Returns { url, title }.

click

Click an element by CSS selector.

json

{
  "id": "click_login",
  "action": "browser",
  "params": {
    "action": "click",
    "selector": "#login-button"
  }
}

type

Type text into an input field.

json

{
  "id": "enter_email",
  "action": "browser",
  "params": {
    "action": "type",
    "selector": "input[name='email']",
    "text": "user@example.com"
  }
}

scroll

Scroll the page up or down.

json

{
  "id": "scroll_down",
  "action": "browser",
  "params": {
    "action": "scroll",
    "direction": "down",
    "amount": 500
  }
}

wait

Wait for an element to appear.

json

{
  "id": "wait_results",
  "action": "browser",
  "params": {
    "action": "wait",
    "selector": ".results-loaded",
    "timeout": 10000
  }
}

screenshot

Capture the page as a compressed JPEG (~50-100KB).

json

{
  "id": "capture",
  "action": "browser",
  "params": {
    "action": "screenshot"
  }
}

Returns { screenshot, success } where screenshot is a data:image/jpeg;base64,... string.
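
Because the screenshot comes back as a data URL, it has to be decoded before you can write it to disk or upload it. The following is a minimal sketch for Node.js; the helper name decodeDataUrl and the sample result object are illustrative and not part of the tool.

```javascript
// Split a data URL such as "data:image/jpeg;base64,AAAA" into its MIME type
// and raw bytes. Buffer is a Node.js global; in a browser, use atob instead.
function decodeDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a base64 data URL");
  }
  return { mimeType: match[1], bytes: Buffer.from(match[2], "base64") };
}

// Hypothetical stand-in for the output of a screenshot step: three bytes
// encoded the same way the step encodes a full JPEG.
const result = {
  success: true,
  screenshot:
    "data:image/jpeg;base64," + Buffer.from([0xff, 0xd8, 0xff]).toString("base64"),
};

const { mimeType, bytes } = decodeDataUrl(result.screenshot);
console.log(mimeType, bytes.length);
```

The decoded bytes can then be passed to fs.writeFileSync or any upload client.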

extract

Extract text content from the page or a specific element.

json

{
  "id": "get_prices",
  "action": "browser",
  "params": {
    "action": "extract",
    "selector": ".price-list"
  }
}

Returns { content, success }.

html

Get raw HTML content.

json

{
  "id": "get_html",
  "action": "browser",
  "params": {
    "action": "html",
    "selector": ".product-grid"
  }
}

Returns { html, success }.

evaluate

Run JavaScript in the browser context.

json

{
  "id": "get_data",
  "action": "browser",
  "params": {
    "action": "evaluate",
    "script": "Array.from(document.querySelectorAll('.item')).map(el => ({ name: el.textContent, href: el.href }))"
  }
}

Returns { result, success }.

analyze

Takes a screenshot and runs LLM vision on it in one step: the screenshot is sent to your configured LLM provider, and the step returns the model's analysis.

json

{
  "id": "understand_page",
  "action": "browser",
  "params": {
    "action": "analyze",
    "prompt": "What products are shown on this page? List their names and prices."
  }
}

Returns { screenshot, analysis }.

close

End the browser session.

json

{
  "id": "cleanup",
  "action": "browser",
  "params": {
    "action": "close"
  }
}
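
All of the actions above share one step shape: an id, "action": "browser", and a params object whose own action field selects the operation. If you generate workflows programmatically, a small builder can keep that shape consistent. This is a sketch only; browserStep is a hypothetical helper, not something the tool provides.

```javascript
// Build a workflow step for the browser tool. The field names (id, action,
// params, dependsOn) mirror the JSON examples in this section; the helper
// itself is hypothetical.
function browserStep(id, browserAction, params = {}, dependsOn = []) {
  const step = {
    id,
    action: "browser",
    // Spread first so the explicit browserAction always wins.
    params: { ...params, action: browserAction },
  };
  if (dependsOn.length > 0) {
    step.dependsOn = dependsOn;
  }
  return step;
}

const steps = [
  browserStep("start", "launch", { headless: true }),
  browserStep("go", "navigate", { url: "https://example.com" }, ["start"]),
  browserStep("cleanup", "close", {}, ["go"]),
];
console.log(JSON.stringify(steps, null, 2));
```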

Example Workflow

A scraper that navigates to a URL, analyzes the page with AI vision, and extracts data:

json

{
  "id": "browser-scrape",
  "name": "Browser Scraper",
  "tools": [
    { "name": "browser", "type": "browser" }
  ],
  "workflows": [{
    "name": "scrape_and_analyze",
    "trigger": { "type": "manual" },
    "steps": [
      {
        "id": "launch",
        "action": "browser",
        "params": { "action": "launch", "headless": true }
      },
      {
        "id": "navigate",
        "action": "browser",
        "dependsOn": ["launch"],
        "params": { "action": "navigate", "url": "{{url}}" }
      },
      {
        "id": "analyze",
        "action": "browser",
        "dependsOn": ["navigate"],
        "params": { "action": "analyze", "prompt": "{{prompt}}" }
      },
      {
        "id": "extract",
        "action": "browser",
        "dependsOn": ["analyze"],
        "params": { "action": "extract", "selector": "{{selector}}" },
        "optional": true,
        "defaultValue": { "content": "" }
      },
      {
        "id": "close",
        "action": "browser",
        "dependsOn": ["extract"],
        "params": { "action": "close" }
      }
    ]
  }]
}
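
The dependsOn fields above form a simple chain, but a typo in a step id is easy to miss in larger workflows. The sketch below checks a steps array before you submit it and returns one order that respects every dependency. It is a client-side sanity check under the assumption that dependsOn means "run after these steps"; it says nothing about how the workflow engine schedules steps internally.

```javascript
// Return step ids in an order that respects dependsOn, or throw if a step
// depends on an unknown id or the dependencies form a cycle.
function executionOrder(steps) {
  const byId = new Map(steps.map((step) => [step.id, step]));
  const state = new Map(); // id -> "visiting" | "done"
  const ordered = [];

  function visit(id, from) {
    const step = byId.get(id);
    if (!step) {
      throw new Error(`Step "${from}" depends on unknown step "${id}"`);
    }
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") {
      throw new Error(`Dependency cycle at "${id}"`);
    }
    state.set(id, "visiting");
    for (const dep of step.dependsOn || []) {
      visit(dep, id);
    }
    state.set(id, "done");
    ordered.push(id);
  }

  for (const step of steps) {
    visit(step.id, step.id);
  }
  return ordered;
}

// The example workflow's steps, deliberately listed out of order.
const sorted = executionOrder([
  { id: "close", dependsOn: ["extract"] },
  { id: "launch" },
  { id: "navigate", dependsOn: ["launch"] },
  { id: "analyze", dependsOn: ["navigate"] },
  { id: "extract", dependsOn: ["analyze"] },
]);
console.log(sorted.join(" -> "));
```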

Route Configuration

Expose browser automation as an API endpoint:

json

{
  "path": "/browser/scrape",
  "method": "post",
  "requireAuth": true,
  "authProvider": "firebase",
  "subscribable": {
    "enabled": true,
    "queueName": "default",
    "estimatedTime": "30s"
  },
  "integrations": {
    "actions": [{
      "type": "workflow",
      "workflowId": "browser-scrape",
      "input": {
        "url": "{{body.url}}",
        "prompt": "{{body.prompt}}",
        "selector": "{{body.selector}}"
      }
    }]
  }
}
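
The input object maps request fields such as {{body.url}} onto the workflow's {{url}}, {{prompt}}, and {{selector}} placeholders. To illustrate the idea, here is a minimal sketch of placeholder substitution. The function names are hypothetical, and the handling of missing values (an empty string here) is an assumption; the actual template engine may behave differently.

```javascript
// Replace {{dotted.path}} placeholders with values looked up in `context`.
// Unknown paths become an empty string in this sketch.
function renderTemplate(template, context) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_, path) => {
    const value = path
      .split(".")
      .reduce((current, key) => (current == null ? undefined : current[key]), context);
    return value == null ? "" : String(value);
  });
}

// Apply renderTemplate to every string value of a route's `input` object.
function resolveInput(input, context) {
  return Object.fromEntries(
    Object.entries(input).map(([key, value]) => [key, renderTemplate(value, context)])
  );
}

const input = {
  url: "{{body.url}}",
  prompt: "{{body.prompt}}",
  selector: "{{body.selector}}",
};
const request = {
  body: { url: "https://example.com", prompt: "Extract all product info" },
};
console.log(resolveInput(input, request));
```

In this sample the request body has no selector, so the resolved selector is empty.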

Subscribable Streaming

With subscribable enabled, the endpoint returns immediately with subscription URLs:

bash

# Replace $BASE_URL with your server's origin
curl -X POST "$BASE_URL/browser/scrape" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "prompt": "Extract all product info"}'

Response:

json

{
  "accepted": true,
  "jobId": "abc-123",
  "estimatedTime": "30s",
  "subscribe": {
    "sse": "/queues/default/jobs/abc-123/subscribe",
    "websocket": { "path": "/ws", "channel": "job:abc-123" },
    "poll": "/queues/default/jobs/abc-123"
  }
}
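
The subscribe object offers three ways to follow the job. A client can choose one based on what its environment supports. The sketch below prefers SSE, then WebSocket, then polling; that preference order and the function name pickSubscription are choices made for this example, not behavior of the API.

```javascript
// Choose how to follow a job, given the `subscribe` object from the response
// and flags describing which transports the client supports.
function pickSubscription(subscribe, { sse = true, websocket = true } = {}) {
  if (sse && subscribe.sse) {
    return { transport: "sse", url: subscribe.sse };
  }
  if (websocket && subscribe.websocket) {
    return {
      transport: "websocket",
      url: subscribe.websocket.path,
      channel: subscribe.websocket.channel,
    };
  }
  return { transport: "poll", url: subscribe.poll };
}

// The response body shown above.
const response = {
  accepted: true,
  jobId: "abc-123",
  estimatedTime: "30s",
  subscribe: {
    sse: "/queues/default/jobs/abc-123/subscribe",
    websocket: { path: "/ws", channel: "job:abc-123" },
    poll: "/queues/default/jobs/abc-123",
  },
};

console.log(pickSubscription(response.subscribe));
console.log(pickSubscription(response.subscribe, { sse: false }));
console.log(pickSubscription(response.subscribe, { sse: false, websocket: false }));
```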

Subscribe via SSE to receive live updates:

javascript

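// Assumes the page contains <img id="browser-frame"> for the live view
const img = document.getElementById("browser-frame");
const eventSource = new EventSource("/queues/default/jobs/abc-123/subscribe");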

eventSource.onmessage = (e) => {
  const data = JSON.parse(e.data);
  if (data.screenshot) {
    // Render live browser frame (~50-100KB JPEG)
    img.src = data.screenshot;
  }
  if (data.analysis) {
    // Show AI analysis
    console.log(data.analysis);
  }
};

Session Isolation

Each workflow execution gets its own Playwright browser context with separate cookies, storage, and state. No data leaks between sessions.
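
In Playwright terms, this kind of isolation comes from browser.newContext(), which creates a context with its own cookies and storage. The sketch below shows one way to scope a context to a single unit of work and always clean it up. It illustrates the pattern only and is not the tool's actual implementation; withIsolatedContext is a hypothetical helper, and the browser is passed in so the function does not launch Chromium itself.

```javascript
// Run `task` inside a fresh browser context and always close the context
// afterwards, so cookies and storage never outlive one execution.
// `browser` is expected to be a Playwright Browser instance.
async function withIsolatedContext(browser, task) {
  const context = await browser.newContext();
  try {
    const page = await context.newPage();
    return await task(page, context);
  } finally {
    await context.close();
  }
}

// Hypothetical usage with Playwright installed:
//
//   const { chromium } = require("playwright");
//   const browser = await chromium.launch({ headless: true });
//   const title = await withIsolatedContext(browser, async (page) => {
//     await page.goto("https://example.com");
//     return page.title();
//   });
//   await browser.close();
```

Closing the context in a finally block means a failed step still releases its session state.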

Vision Loop

For autonomous multi-step browsing, chain actions in a loop pattern:

1. screenshot: capture the current page
2. analyze: the LLM decides what to do next
3. Execute the action (click, type, scroll)
4. Repeat until the goal is met

The analyze action is a shortcut that combines steps 1 and 2. Use evaluate or extract when you know the CSS selectors to skip LLM costs entirely.
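
The loop above can be sketched as a small driver with a step budget, so a confused model cannot browse forever. Everything here is an assumption made for illustration: the analyze and act callbacks are injected by the caller, and the { done, action } decision shape is hypothetical, since the analyze action's output is only documented as { screenshot, analysis }. In practice you would parse the analysis text into such a decision yourself.

```javascript
// Drive an analyze -> act loop until the model reports it is done or the
// step budget runs out.
//   analyze(goal, history) resolves to { done, action }  (assumed shape)
//   act(action) performs one browser action such as click, type, or scroll
async function visionLoop(goal, { analyze, act, maxSteps = 10 }) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const decision = await analyze(goal, history);
    if (decision.done) {
      return { completed: true, history };
    }
    await act(decision.action);
    history.push(decision.action);
  }
  return { completed: false, history };
}

// Demo with scripted decisions standing in for real LLM output.
const scripted = [
  { done: false, action: { action: "click", selector: "#login-button" } },
  { done: false, action: { action: "type", selector: "input[name='email']", text: "user@example.com" } },
  { done: true },
];

visionLoop("log in", {
  analyze: async () => scripted.shift(),
  act: async (action) => console.log("would run:", action.action),
}).then((result) => console.log("completed:", result.completed));
```

Passing the action history back into analyze gives the model context about what it has already tried.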

Setup

bash

npm install playwright
npx playwright install chromium
