page-agent: Drop a Single Script Tag to Embed an AI Agent Inside Any Web Page

2026-03-17 · # AI in Practice

Web automation tools have long worked by looking in from the outside. Whether Selenium or Playwright, a separate process remote-controls the browser. browser-use layered an LLM on top, but still required a Python server. Alibaba’s page-agent¹ flips that premise entirely. The agent lives inside the web page.

Inside-Out: The Core Idea Behind page-agent

simon_luv_pho, the creator of page-agent, described it this way in a Hacker News Show HN post:

“I’m experimenting with an ‘inside-out’ paradigm. By dropping the library into a page, you get a client-side agent that interacts natively with the live DOM tree and inherits the user’s active session out of the box.” — simon_luv_pho, HN²

Where existing tools spin up a headless browser and manipulate the DOM from outside, page-agent injects directly into the current page via a single <script> tag. This difference goes beyond mere deployment convenience—it represents an architectural shift.

<script src="https://cdn.jsdelivr.net/npm/page-agent@1.5.7/dist/iife/page-agent.demo.js" crossorigin="true"></script>

That single line surfaces an AI agent UI on the page. Type a natural-language instruction like “click the login button,” and the agent parses the DOM, locates the button, and clicks it.

[!KEY] page-agent requires no automation server and, for single-page use, no browser extension. It runs entirely as in-page JavaScript and reads the DOM as text instead of taking screenshots, so it works without a multimodal LLM.

How It Works: Text-Based DOM Processing

page-agent never takes a screenshot. Instead, it parses the page’s HTML structure as text and identifies interactive elements—buttons, input fields, links. This DOM processing layer is based on browser-use’s code (MIT license)¹.
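
To make the idea concrete, here is a minimal sketch of text-based element extraction. It is not page-agent's actual implementation: the `serializeInteractive` function, the node shape, and the `[index]<tag>` output format are all assumptions made for illustration. The tree is a plain object so the snippet runs outside a browser; in a page, a similar walk would start from `document.body`.

```javascript
// Hypothetical sketch: flatten a DOM-like tree into indexed text lines that an
// LLM can read. The node shape { tag, attrs, text, children } is an assumption.
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea']);

function serializeInteractive(root) {
  const lines = [];
  const walk = (node) => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const attrs = Object.entries(node.attrs ?? {})
        .map(([name, value]) => ` ${name}="${value}"`)
        .join('');
      // The index gives the model a short handle for referring back to the element.
      lines.push(`[${lines.length}]<${node.tag}${attrs}>${node.text ?? ''}</${node.tag}>`);
    }
    (node.children ?? []).forEach(walk);
  };
  walk(root);
  return lines.join('\n');
}

const page = {
  tag: 'form',
  children: [
    { tag: 'label', text: 'Email' },
    { tag: 'input', attrs: { type: 'email', name: 'email' } },
    { tag: 'button', text: 'Log in' },
  ],
};

console.log(serializeInteractive(page));
// [0]<input type="email" name="email"></input>
// [1]<button>Log in</button>
```

Only the two interactive elements reach the model; the label and the form wrapper are dropped, which is where the token savings over a screenshot would come from.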

graph TD
    A[Natural language input] --> B[page-agent core]
    B --> C[DOM parsing<br/>Text extraction]
    C --> D[Send to LLM<br/>Decide action]
    D --> E[Execute DOM manipulation]
    E --> F[Human-in-the-loop<br/>Approve / Reject UI]
    F -->|Approve| G[Task complete]
    F -->|Reject| A
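
The flow in the diagram can be sketched as a small control loop. This is an illustrative outline, not page-agent's code: `runTask` and the `observe`, `decide`, `act`, and `review` callbacks are hypothetical names, and the loop is synchronous for brevity where a real agent would await the LLM call.

```javascript
// Hypothetical control loop mirroring the diagram: read the page as text,
// let the model pick an action, execute it, then ask a human to approve.
// On rejection the diagram returns to the input step; this sketch simply
// retries the same instruction up to maxAttempts times.
function runTask(instruction, { observe, decide, act, review, maxAttempts = 3 }) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const pageText = observe();                    // DOM parsing / text extraction
    const action = decide(instruction, pageText);  // LLM decides the action
    const result = act(action);                    // execute the DOM manipulation
    if (review(action, result)) {                  // human approves or rejects
      return { status: 'complete', attempts: attempt, action };
    }
  }
  return { status: 'rejected', attempts: maxAttempts };
}

// Stubbed callbacks so the sketch runs without a browser or an LLM.
const outcome = runTask('click the login button', {
  observe: () => '[0]<button>Log in</button>',
  decide: () => ({ type: 'click', index: 0 }),
  act: (action) => `performed ${action.type} on element ${action.index}`,
  review: () => true,
});

console.log(outcome.status); // complete
```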

The advantages are clear. First, sending text to an LLM typically consumes far fewer tokens than sending screenshots. Second, skipping image analysis means faster response times. Third, a plain text LLM is all you need, with no GPT-4 Vision or other multimodal model required.

BYO LLM: Plug In Any Model You Want

page-agent follows a BYO LLM (Bring Your Own LLM) philosophy. Any model served through an OpenAI-compatible API will work: GPT-4, Claude, Qwen, Mistral, or even a locally hosted open-source model.

import { PageAgent } from 'page-agent'

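// Point the agent at any OpenAI-compatible endpoint.
const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  // Visible to anyone who can inspect the page; route through a proxy in production.
  apiKey: 'YOUR_API_KEY',
  language: 'ko-KR',
})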

await agent.execute('Change the shipping address on the order to Gangnam-gu, Seoul')
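
"OpenAI API format" here refers to the Chat Completions request shape: a POST to `<baseURL>/chat/completions` with a bearer token and a body containing `model` and `messages`. The sketch below builds such a request without sending it. `buildChatRequest` is a hypothetical helper for illustration, not part of page-agent, and how page-agent itself constructs its requests is not shown here.

```javascript
// Hypothetical helper: build (but do not send) an OpenAI-compatible
// Chat Completions request for any provider that exposes that format.
function buildChatRequest({ baseURL, apiKey, model }, userPrompt) {
  return {
    // Strip trailing slashes so the path is joined with exactly one "/".
    url: `${baseURL.replace(/\/+$/, '')}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: userPrompt }],
      }),
    },
  };
}

const request = buildChatRequest(
  {
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_API_KEY',
    model: 'qwen3.5-plus',
  },
  'click the login button',
);

console.log(request.url);
// https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
```

Passing `request.url` and `request.init` to `fetch` would send the request. Swapping providers then comes down to changing `baseURL`, `apiKey`, and `model`.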

The npm package weighs in at 224 kB³. Not exactly featherweight for a frontend library, but reasonable given that it bundles both UI components and the full DOM processing logic.

How It Differs from Existing Tools

The web automation ecosystem expanded rapidly between 2025 and 2026. Playwright, browser-use, Stagehand, and now page-agent—each takes a fundamentally different approach.

Criterion	page-agent	browser-use	Stagehand	Playwright
Runs in	Browser (client)	Server (Python)	Server (Node.js)	Server (multi-lang)
Backend required	No	Yes	Yes	Yes
Vision model	Not needed	Optional	Optional	N/A
Integration effort	One script tag	Significant setup	Playwright extension	Significant setup
Human-in-the-loop	Built-in	Not supported	Not supported	Not supported
Multi-page	Requires Chrome ext	Native	Native	Native
Primary use case	In-page copilot	Server automation	AI + test hybrid	E2E testing

The key distinction is positioning. Playwright and Selenium are test tools that control the browser from outside. browser-use added an AI layer on top but remained server-side. Stagehand extended Playwright with AI primitives like act(), extract(), and observe()⁴.

Of the four, page-agent is the only one designed to run on the client side. That means an SPA developer can embed an AI copilot in their product with just a few lines of code, collapsing 20-click ERP workflows into a single sentence or overlaying a natural-language interface on an admin panel.

In a previous agent-browser skill review, I looked at token-efficiency issues in browser automation tools. page-agent avoids the screenshot portion of that cost by working from DOM text alone.

The Chrome Extension: Breaking the Single-Page Barrier

The biggest limitation of page-agent is that, by default, it only operates on the current page. Navigating to another tab or hopping between sites is off the table.

A separate Chrome extension addresses this⁵. Once installed, the in-page agent can open other tabs, extract information, and return to the original page for multi-page workflows. The interesting part is the role reversal: conventional browser automation tools have an external program controlling the browser, but with page-agent, the web app controls the browser.

That said, the extension requires manual installation, which dilutes page-agent’s core selling point of “one script tag and you’re done” in multi-page scenarios.

Real-World Use Cases

The page-agent documentation and community highlight three primary scenarios:

SaaS AI Copilot — The most straightforward application: embedding an AI assistant in your own web product. No backend rewrite needed; just add a script to the frontend for natural-language UI control. Especially valuable for form-heavy systems like CRMs and ERPs.

Accessibility — Users with visual or motor impairments can operate web apps via voice commands or natural language. Where traditional accessibility tools depend on static labels, page-agent leverages an LLM’s natural-language understanding for more flexible interaction.

Smart Form Filling — Turn a 20-click admin workflow into a single sentence: “Mark all March orders from Seoul as shipped.”

Limitations and Caveats

page-agent is not a silver bullet. Several structural limitations exist.

Security Concerns — API keys are exposed on the client side. In production, you need a proxy server to keep them hidden. The demo test LLM API is not an official Alibaba Cloud product and may be discontinued at any time⁶.

DOM Structure Dependency — Text-based DOM parsing inherently means it works best when the HTML is clean and well-structured. Heavily obfuscated pages or Canvas/WebGL-based UIs are poor fits.

Single-Page Constraint — Without the Chrome extension, you’re confined to the current page—a sharp contrast to browser-use and Playwright, which support multi-page workflows natively.

LLM Quality Dependency — Agent accuracy is entirely tied to the quality of the connected LLM. Hook up a weak model, and it will click the wrong elements or perform unintended actions.

Early Stage — Since its late-2024 release, the project has grown quickly to over 9,300 GitHub stars. But even at version 1.5.7, the API continues to change. Traces of its browser-use origins are still being cleaned up—for example, data-browser-use-ignore was renamed to data-page-agent-ignore⁷.
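
For the API key exposure noted under Security Concerns, one common mitigation is a thin proxy that holds the key server-side and forwards requests upstream. The following is a minimal sketch under stated assumptions: the `UPSTREAM_BASE_URL` and `LLM_API_KEY` environment variables, the `toUpstreamRequest`, `handleRequest`, and `startProxy` helpers, and the port are all hypothetical. It omits the user authentication, rate limiting, and CORS handling that a production proxy would need, and it assumes Node 18+ for the global `fetch`.

```javascript
const http = require('node:http');

// Hypothetical configuration: the key lives only in the server environment.
const UPSTREAM_BASE_URL =
  process.env.UPSTREAM_BASE_URL ?? 'https://dashscope.aliyuncs.com/compatible-mode/v1';
const LLM_API_KEY = process.env.LLM_API_KEY ?? '';

// Rebuild the browser's request for the upstream, attaching the secret key.
function toUpstreamRequest(path, body, apiKey = LLM_API_KEY) {
  return {
    url: `${UPSTREAM_BASE_URL}${path}`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
      body,
    },
  };
}

// Forward POST bodies to the upstream and relay the response back.
async function handleRequest(req, res) {
  if (req.method !== 'POST') {
    res.writeHead(405).end();
    return;
  }
  try {
    const chunks = [];
    for await (const chunk of req) chunks.push(chunk);
    const body = Buffer.concat(chunks).toString('utf8');
    const { url, init } = toUpstreamRequest(req.url, body);
    const upstream = await fetch(url, init);
    res.writeHead(upstream.status, { 'Content-Type': 'application/json' });
    res.end(await upstream.text());
  } catch {
    res.writeHead(502).end(); // upstream unreachable or request stream failed
  }
}

// Not called here; invoke startProxy() from an entry script to run the proxy.
function startProxy(port = 8787) {
  return http.createServer(handleRequest).listen(port);
}
```

With a proxy like this in place, the client-side configuration would presumably point `baseURL` at the proxy and carry only a placeholder key, so the real credential never reaches the browser.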

Who Should Use It

page-agent is not a replacement for all web automation. For high-volume server-side scraping or E2E tests in a CI/CD pipeline, Playwright remains the right choice. For complex multi-site automation agents, browser-use is a better fit.

Where page-agent shines is clear: frontend developers who want to quickly embed an AI copilot in their web product. With a single npm package, you can add natural-language control to an existing SPA—no backend infrastructure changes required. The built-in human-in-the-loop UI provides a safety net for production use.

Alibaba’s experiment raises an intriguing question about the future of GUI agents. Must agents always control the browser from outside? Or can the web app itself become the agent? page-agent bets on the latter, and how far that direction can go remains an open question.

¹ alibaba/page-agent GitHub repository, https://github.com/alibaba/page-agent
² Show HN: PageAgent, A GUI agent that lives inside your web app, Hacker News, https://news.ycombinator.com/item?id=47264138
³ page-agent npm package, https://www.npmjs.com/package/page-agent
⁴ NxCode, “Stagehand vs Browser Use vs Playwright: AI Browser Automation Compared (2026)”, https://www.nxcode.io/resources/news/stagehand-vs-browser-use-vs-playwright-ai-browser-automation-2026
⁵ Chrome Extension is Here!, GitHub Issue #129, https://github.com/alibaba/page-agent/issues/129
⁶ page-agent Terms and Privacy, https://github.com/alibaba/page-agent/blob/main/docs/terms-and-privacy.md
⁷ page-agent Releases, https://github.com/alibaba/page-agent/releases