Advanced · 35 min read

Mastering Browser Automation

Leverage OpenClaw's native "Computer Use" to click, type, and navigate the web just like a human.


Introduction

Beyond simple API integrations, OpenClaw can operate graphical user interfaces directly. Using the Computer Use tool pattern (popularized by Claude 3.5 Sonnet), your local agent can take control of a Chromium instance and navigate pages visually, the way a person would.

This tutorial covers everything from setting up your local Chrome testing environment to writing resilient prompt flows for web scraping.

⚠️ Security Notice: Browser automation gives the AI control over an active session. Do not authorize the agent to perform financial transactions or access highly sensitive accounts unattended.

1. Prerequisites

  • OpenClaw v1.3.0 or newer.
  • An LLM that supports Vision + Computer Use tools (e.g., Anthropic models or specialized local models like Qwen2-VL).
  • Google Chrome or Chromium installed on the host machine.

2. Enabling the Browser Tool

In your OpenClaw configuration file (~/.openclaw/config.json), ensure the browser capability is enabled.

{
  "capabilities": {
    "computer_use": true,
    "browser_path": "/usr/bin/google-chrome"
  }
}
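
Before launching the agent, it can help to sanity-check this fragment. The following is a minimal sketch, not part of OpenClaw: the function name `check_browser_config` is hypothetical, and it assumes only the two keys shown above (`computer_use` and `browser_path`).

```python
import json


def check_browser_config(raw_json: str) -> list[str]:
    """Return a list of problems found; an empty list means the fragment looks fine."""
    try:
        config = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"config is not valid JSON: {exc}"]

    # Only the keys shown in this tutorial are checked; anything else is ignored.
    caps = config.get("capabilities") if isinstance(config, dict) else None
    if not isinstance(caps, dict):
        return ['missing "capabilities" object']

    problems = []
    if caps.get("computer_use") is not True:
        problems.append('"computer_use" must be true')
    path = caps.get("browser_path")
    if not isinstance(path, str) or not path:
        problems.append('"browser_path" must be a non-empty string')
    return problems


example = '{"capabilities": {"computer_use": true, "browser_path": "/usr/bin/google-chrome"}}'
print(check_browser_config(example))  # prints []
```

To check your real file, you could pass in the text of ~/.openclaw/config.json, for example via `pathlib.Path.home().joinpath(".openclaw", "config.json").read_text()`.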

3. The "Coordinate & Click" Workflow

Unlike traditional DOM-based automation tools (such as Puppeteer or Playwright), OpenClaw "sees" the screen. It takes a screenshot, estimates the X/Y coordinates of the element you want, and moves the virtual mouse to click it.
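
To make the coordinate step concrete, here is a small sketch of the arithmetic involved. It is an illustration only: the source does not describe OpenClaw's internals, so the `Box` type, the `click_point` function, and the idea that the screenshot may have a different resolution than the screen are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of a target element, in screenshot pixels (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def click_point(box: Box, shot_size: tuple[int, int], screen_size: tuple[int, int]) -> tuple[int, int]:
    """Return the center of the box, rescaled from screenshot pixels to screen pixels."""
    center_x = (box.left + box.right) / 2
    center_y = (box.top + box.bottom) / 2
    scale_x = screen_size[0] / shot_size[0]
    scale_y = screen_size[1] / shot_size[1]
    return round(center_x * scale_x), round(center_y * scale_y)


button = Box(left=600, top=400, right=680, bottom=440)
print(click_point(button, (1280, 800), (1280, 800)))   # prints (640, 420)
print(click_point(button, (1280, 800), (1920, 1200)))  # prints (960, 630)
```

The second call shows why a mismatch between screenshot size and screen size matters: the same button lands at a different pixel once a 1.5x factor is involved, so a click computed without the rescale would miss.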

Example: Form Filling

You can prompt the agent naturally:

Please open https://example-crm.com/login. Type "admin@company.com" into the email field. Type my password from the secure vault into the password field. Click the blue "Sign In" button.

4. Dealing with CAPTCHAs

Because OpenClaw acts through a real browser profile, it naturally avoids many basic bot-detection scripts. However, for visible CAPTCHAs, you have two options:

  1. Human-in-the-loop: Add a prompt instruction telling the agent to pause and ask you to solve the CAPTCHA before it continues.
  2. API Solvers: Integrate a third-party solver skill alongside the browser skill.
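
If you drive the agent from your own wrapper script, the human-in-the-loop option can be as simple as a blocking confirmation step. The sketch below is a generic pattern, not an OpenClaw API: `wait_for_human` is a hypothetical helper, and how you hook it into the agent's flow depends on your setup.

```python
from typing import Callable


def wait_for_human(ask: Callable[[str], str] = input, max_attempts: int = 3) -> bool:
    """Block until the operator confirms the CAPTCHA is solved.

    Returns True on "done", False on "abort" or after too many unclear answers.
    """
    for _ in range(max_attempts):
        answer = ask('Solve the CAPTCHA in the browser, then type "done" (or "abort"): ')
        answer = answer.strip().lower()
        if answer == "done":
            return True
        if answer == "abort":
            return False
    return False
```

Passing `ask` as a parameter keeps the gate testable and lets you swap the terminal prompt for a chat message or desktop notification later.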

5. Extracting Data (Visual Scraping)

Instead of parsing deeply nested HTML tables, you can ask OpenClaw to read the data off the rendered page and return it in a structured form.

Go to Yahoo Finance for AAPL. Look at the summary table on the left. Extract the "Previous Close", "Open", and "Market Cap" into a JSON object.
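
Model replies often wrap the JSON in extra prose, so it is worth validating the result before using it. The following is a minimal sketch under that assumption: `parse_summary` is a hypothetical helper, and the sample reply uses placeholder values rather than real market data.

```python
import json

# The three fields requested in the prompt above.
REQUIRED_FIELDS = ("Previous Close", "Open", "Market Cap")


def parse_summary(reply: str) -> dict:
    """Pull the first JSON object out of a free-text reply and check its keys."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = reply.find("{", start + 1)
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in obj]
        if missing:
            raise ValueError(f"reply is missing fields: {missing}")
        return obj
    raise ValueError("no JSON object found in reply")


sample_reply = (
    "Here is the data you asked for: "
    '{"Previous Close": "<value>", "Open": "<value>", "Market Cap": "<value>"}'
)
print(sorted(parse_summary(sample_reply)))  # prints ['Market Cap', 'Open', 'Previous Close']
```

Raising on missing fields, rather than returning a partial object, makes it obvious when the agent misread the table and the prompt needs another attempt.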

Troubleshooting

  • Click misses the target: Ensure your display scaling is set to 100%. Fractional scaling (e.g., 150%) can confuse coordinate mapping.
  • "Cannot find executable": Verify that the browser_path in your config exactly matches the location of your system's Chrome installation.
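
For the "Cannot find executable" case, a quick check outside the agent can confirm whether the configured path is the problem. This is a sketch using only the Python standard library. The helper names are hypothetical, and the binary names searched are common Linux names for Chrome and Chromium, so adjust them for macOS or Windows.

```python
import os
import shutil

# Common Linux binary names; an assumption, not an exhaustive list.
CANDIDATE_NAMES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def browser_path_ok(browser_path: str) -> bool:
    """True if the path points at an existing, executable file."""
    return os.path.isfile(browser_path) and os.access(browser_path, os.X_OK)


def find_candidates() -> list[str]:
    """Return absolute paths of browser binaries found on PATH, if any."""
    found = []
    for name in CANDIDATE_NAMES:
        path = shutil.which(name)
        if path and path not in found:
            found.append(path)
    return found


configured = "/usr/bin/google-chrome"  # the value from the config example above
if not browser_path_ok(configured):
    print("Configured path is not runnable. Candidates on PATH:", find_candidates())
```

If the configured path fails and `find_candidates()` returns something, copying one of those paths into browser_path is a reasonable first fix to try.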