Open Source · MIT License

Give your AI agent
eyes.

Without the vision model.

WebScope renders any web page into a structured text grid that LLMs can read, understand, and interact with — no screenshots, no vision APIs, no pixel parsing.

or try it below ↓

$ npm install -g webscope
webscope
$

      
500×
Smaller than screenshots
$0
Vision model cost
<100ms
Render time
16
MCP tools included

Screenshot vs Text Grid

Same page, radically different cost.

Screenshot + Vision Model ~1.2 MB
× GPT-4V / Claude Vision API call
× ~1,500ms latency
× ~2,000 tokens for the image alone
× Hallucinations on dense UIs
WebScope Text Grid ~1.8 KB
[0] Hacker News  [1] new | [2] past | [3] comments     [4] login

 1. [5] Show HN: WebScope – text-grid browser
    for AI agents  (github.com)
    142 pts by adityapandey 3h ago | [6] 89 comments

 2. [7] Why LLMs don't need screenshots
    87 pts by somebody 5h ago | [8] 34 comments

[9:______________________]  [10] Search
No vision API needed
~80ms render time
~50-150 tokens total
Spatial layout preserved

Built for AI agents

Everything your agent needs to browse the web, in one package.

Spatial Layout Preserved

See where elements are on the page, not just what they are. Proximity, grouping, and visual hierarchy — all intact.

No Vision Model

Zero API calls to GPT-4V or Claude Vision. The output is plain text — native to how LLMs already think.

Full JavaScript Execution

Real Chromium renders SPAs, auth flows, and dynamic content — exactly like a browser would.

Session Management

Persistent cookies, localStorage, and auth state across page transitions. Resume flows anytime.

Multi-Agent Ready

Isolated sessions run in parallel. Multiple agents browse simultaneously without collisions.

MCP Native

16 tools that plug into Claude Desktop, Cursor, Windsurf, or Cline with zero configuration.

See it in action

One command. Real output. ~500 bytes.

Terminal
$ webscope https://example.com
Rendering: https://example.com

════════════════════════════════════════
           Example Domain
════════════════════════════════════════

  This domain is for use in illustrative
  examples in documents. You may use this
  domain in literature without prior
  coordination or asking for permission.

  [0]More information...

────────────────────────────────────────
Elements: 1 interactive | Rendered in 84ms

Works with your stack

Drop WebScope into whatever you're already using.

Zero-config. Install once; any MCP client gets full web browsing.
// Claude Desktop — claude_desktop_config.json
{
  "mcpServers": {
    "webscope": {
      "command": "webscope-mcp"
    }
  }
}
Ready-made tool definitions for function calling.
import json

import openai

with open("tools/tool_definitions.json") as f:
    webscope_tools = json.load(f)["tools"]

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Go to example.com"}],
    tools=webscope_tools,
)
LangChain tools, ready to use with any agent.
from langchain.agents import initialize_agent

from tools.langchain import get_webscope_tools

# Start the server: webscope --serve 3000
tools = get_webscope_tools(base_url="http://localhost:3000")

agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
agent.run("Find the top story on Hacker News")
CrewAI tool classes, plug into any agent role.
from crewai import Agent

from tools.crewai import (
    WebScopeBrowseTool, WebScopeClickTool, WebScopeTypeTool
)

researcher = Agent(
    role="Web Researcher",
    tools=[WebScopeBrowseTool(), WebScopeClickTool()],
    llm=llm,
)
REST API — call from anything. curl, Python, your orchestrator.
# Start the server
$ webscope --serve 3000

# Navigate
$ curl -X POST http://localhost:3000/navigate \
    -H 'Content-Type: application/json' \
    -d '{"url": "https://example.com"}'

# Interact
$ curl -X POST http://localhost:3000/click \
    -H 'Content-Type: application/json' -d '{"ref": 3}'
$ curl -X POST http://localhost:3000/type \
    -H 'Content-Type: application/json' -d '{"ref": 7, "text": "hello"}'
Use it as a library — no server required.
const { AgentBrowser } = require('webscope');

(async () => {
  const browser = new AgentBrowser({ cols: 120 });
  const { view, elements, meta } = await browser.navigate(
    'https://example.com'
  );

  console.log(view);        // The text grid
  console.log(elements);    // { 0: { selector, tag, text, href }, ... }

  await browser.click(3);
  await browser.type(7, 'hello');
  await browser.close();
})();

How it works

01

Render

A real Chromium instance loads the page with full JavaScript and CSS execution.

02

Extract

Every visible element's position, size, text, and interactivity is captured via getBoundingClientRect().
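The extraction step can be sketched as a pure function. In the real tool this would run inside Chromium; here each node is mocked as a plain object carrying the fields `getBoundingClientRect()` would supply, and the field names and tag list are illustrative, not WebScope's actual internals:

```javascript
// Hypothetical sketch of the Extract step: keep only visible elements
// and flag the ones an agent can interact with.
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea']);

function extractElements(nodes) {
  return nodes
    .filter((el) => el.width > 0 && el.height > 0) // skip zero-size (hidden) nodes
    .map((el) => ({
      x: el.x,
      y: el.y,
      width: el.width,
      height: el.height,
      text: el.text,
      interactive: INTERACTIVE_TAGS.has(el.tag),
    }));
}
```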

03

Map

Pixel coordinates are converted to character grid positions, preserving spatial layout.
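The mapping boils down to integer division by an assumed character-cell size. The 8×16 px cell below is a placeholder, not WebScope's actual metric:

```javascript
// Hypothetical pixel-to-grid mapping: divide pixel coordinates by an
// assumed character cell size to get column/row positions on the text grid.
const CELL_W = 8;   // assumed character width in px
const CELL_H = 16;  // assumed line height in px

function toGridCell(x, y) {
  return { col: Math.floor(x / CELL_W), row: Math.floor(y / CELL_H) };
}
```

Two elements that sit side by side in pixels land in adjacent columns of the grid, which is what preserves the page's spatial layout in plain text.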

04

Annotate

Interactive elements get [ref] numbers so agents can click, type, and scroll by reference.
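A minimal sketch of the annotation pass, assuming the extraction step has already flagged interactive elements (the `refs` lookup table returned here is an illustrative shape, not the documented API):

```javascript
// Hypothetical sketch of the Annotate step: assign sequential [ref]
// numbers to interactive elements and build a lookup table so an agent
// can later click/type by reference.
function annotate(elements) {
  const refs = {};
  let next = 0;
  const annotated = elements.map((el) => {
    if (!el.interactive) return el;
    const ref = next++;
    refs[ref] = el;
    return { ...el, ref, label: `[${ref}] ${el.text}` };
  });
  return { annotated, refs };
}
```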

What you can build

Job Applications

Automate multi-step ATS flows across Greenhouse, Lever, and custom portals.

Web Research

Extract structured data from any page. Search, navigate, and aggregate results programmatically.

UI Testing

Verify layouts and content without pixel matching. Assert field values across page transitions.

Form Automation

Fill, validate, and submit complex forms across multi-step wizards.

Content Monitoring

Track changes on live pages. Detect updates, compare snapshots, alert on diffs.

Start in 30 seconds

One install. Chromium downloads automatically. You're browsing.

$ npm install -g webscope

Data Privacy: WebScope runs entirely on your machine. No data is collected, transmitted, or stored by us. Your browsing sessions are yours alone.

Ready to give your AI agent eyes? Get Started →