Screenshot MCP Server for AI Agents

AI coding agents can write HTTP requests, parse JSON, and chain API calls. But they can't see what a webpage looks like. Ask Claude or Cursor to check whether a deploy broke the layout, generate an OG image preview, or verify that a cookie banner is actually hidden, and the agent has no way to get a screenshot on its own. It needs a tool that speaks its protocol.

MCP (Model Context Protocol) is the open protocol AI agents use to call external tools. An MCP server exposes functions that the agent can discover and invoke without you writing glue code. The screenshotrun-mcp server gives any MCP-compatible client two tools: take_screenshot and check_usage. Install it once, set your API key, and the agent can capture any public URL with the same parameter control you'd get from a direct API call.
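Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request to the server. Here is a minimal sketch of what that message looks like for take_screenshot, built in Python. The request id and argument values are illustrative, and in practice the client (Claude Desktop, Cursor) constructs this message for you:

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,        # which tool the server should run
            "arguments": arguments,   # validated against the tool's input schema
        },
    }
    return json.dumps(message)


# Illustrative call: capture a page as PNG with otherwise default settings.
request = build_tool_call(1, "take_screenshot", {"url": "https://example.com", "format": "png"})
print(request)
```

The API key never appears in the message. The server process reads SCREENSHOTRUN_API_KEY from its own environment, which comes from the env block of the client config.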

What the agent gets wrong without visual feedback

Code agents are text-native. They read the DOM, parse CSS, and analyze build logs. None of that tells them what the page actually renders. A component can pass every unit test and still be invisible because of a z-index conflict or `overflow: hidden` on a parent div. The agent reports "looks good" because the HTML looks good. The user sees a broken page.

Visual verification closes that gap. With MCP, the agent takes a screenshot, sees the rendered result, and decides whether the layout matches expectations without you tabbing to a browser to check. Useful during development, but even more useful in automated flows: CI pipelines that need a visual diff, doc generators that need page thumbnails, monitoring scripts that need proof a deploy didn't break the hero section.

Two minutes from npm to a working tool

The server is published as screenshotrun-mcp on npm (v1.0.1). No global install needed. npx pulls and runs it on demand.

For Claude Desktop, add this block to your MCP config file (claude_desktop_config.json):

{
  "mcpServers": {
    "screenshotrun": {
      "command": "npx",
      "args": ["-y", "screenshotrun-mcp"],
      "env": {
        "SCREENSHOTRUN_API_KEY": "sr_live_your_key_here"
      }
    }
  }
}

Restart Claude Desktop and the agent picks up two tools: take_screenshot for capturing URLs, and check_usage for monitoring your quota. No SDK to import, no wrapper function to write.

For Claude Code, a single terminal command does the same thing:

claude mcp add screenshotrun -e SCREENSHOTRUN_API_KEY=sr_live_your_key_here -- npx -y screenshotrun-mcp

Cursor and Windsurf use the same mcpServers JSON structure. Drop the block into .cursor/mcp.json or the equivalent config file. The config file format isn't defined by the MCP spec itself, but clients have converged on this layout, so the block carries over nearly unchanged.

Every API parameter, available as a tool argument

Some MCP screenshot servers expose a minimal interface: one tool, a URL input, and maybe a format option. ScreenshotOne's MCP integration (checked June 2026) has a single render-website-screenshot tool. Good enough for basic captures, but the agent can't do anything the default settings don't cover.

screenshotrun-mcp passes through the full API surface. The agent can set the viewport width from 320 to 3840 pixels and the height from 200 to 2160, choose between PNG, JPEG, WebP, and PDF output, enable full-page capture, emulate mobile or tablet devices, toggle dark mode, block ads and cookie banners, inject custom CSS or JavaScript (up to 10 KB each), delay capture by up to 10 seconds, capture a single element by CSS selector, hide elements, click an element, and scroll to a specific position.

The interesting use cases need those options. Generating an OG card preview means setting width: 1200 and height: 630. Checking a mobile layout means passing device: "mobile". Verifying that a cookie banner was removed means leaving block_cookies enabled (it's the default) and checking the result. Without that control, the agent falls back to asking you to configure things manually, which defeats the point.
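To make those three cases concrete, here is a sketch of the take_screenshot arguments an agent might send for each, written as Python presets. Parameter names and values come from the take_screenshot parameter reference table at the end of this page. The preset names and the `capture_args` helper are hypothetical, purely for illustration:

```python
# Hypothetical presets; parameter names follow the take_screenshot reference table.
PRESETS = {
    # Standard Open Graph card size.
    "og_card": {"width": 1200, "height": 630, "format": "png"},
    # Mobile emulation, capturing the whole scrollable page rather than one screen.
    "mobile": {"device": "mobile", "full_page": True},
    # block_cookies is already the default; setting it explicitly documents intent.
    "cookie_check": {"block_cookies": True},
}


def capture_args(url: str, preset: str) -> dict:
    """Combine a target URL with one of the presets above."""
    return {"url": url, **PRESETS[preset]}


print(capture_args("https://example.com/blog/new-post", "og_card"))
```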

The check_usage tool and why autonomous workflows need it

When an agent runs in a loop (capturing screenshots of 50 competitor sites, or generating thumbnails for a directory), it can burn through a quota fast. Most MCP screenshot servers don't give the agent a way to check how many credits are left. SnapRender's MCP (three tools, remote endpoint) doesn't expose usage data. Urlbox's MCP integration has unique features like MP4 video capture, but starts at $39/month with no free tier and no built-in quota check.

check_usage returns your current consumption so the agent can decide whether to continue or pause. I've found this prevents the most common batch failure: the agent captures 40 out of 50 URLs, hits the limit on the 41st, and the whole run fails with an opaque 429. With usage data available up front, the agent checks before starting and estimates whether the batch fits.
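The pre-flight check is simple arithmetic. Here is a minimal sketch, assuming check_usage reports how many screenshots you've used and your plan limit for the period. The field names `used` and `limit` are hypothetical, so map them to whatever the tool actually returns:

```python
from dataclasses import dataclass


@dataclass
class Usage:
    used: int   # hypothetical field: screenshots consumed this billing period
    limit: int  # hypothetical field: screenshots included in the plan


def plan_batch(usage: Usage, urls: list[str], shots_per_url: int = 1) -> tuple[list[str], list[str]]:
    """Split a batch into URLs that fit the remaining quota and URLs to defer."""
    remaining = max(usage.limit - usage.used, 0)
    fits = remaining // shots_per_url
    return urls[:fits], urls[fits:]


urls = [f"https://site{i}.example" for i in range(50)]
now, later = plan_batch(Usage(used=60, limit=100), urls)
print(len(now), len(later))  # 40 10
```

Capturing both desktop and mobile for each URL doubles the cost, which is what `shots_per_url` accounts for. The agent can then run the first slice and report the rest instead of failing halfway through.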

Where MCP adds value over a raw HTTP call

You could skip MCP entirely and have the agent construct HTTP requests to the screenshot API directly. That works. But there are real differences.

The agent discovers tools and their parameters automatically, so there's no prompt engineering to teach it the API schema. Auth is handled by the config, which means the API key never appears in the conversation, never leaks into a code suggestion, and never gets hardcoded into a generated script. And schema validation catches parameter errors before they reach the API. A malformed JSON body in a raw HTTP request returns a 400 that the agent has to parse before retrying. An MCP call with a wrong type is rejected against the tool's input schema before any request goes out.
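MCP servers publish each tool's parameters as a JSON Schema `inputSchema` in their `tools/list` response, and that schema is what makes the early type check possible. Here is a deliberately tiny sketch of the check, covering only `required`, top-level `type`, and `additionalProperties: false`. The schema fragment is an illustrative subset, not the server's actual published schema. Real clients and SDKs use a full JSON Schema validator:

```python
# Map JSON Schema type names to Python types (only the ones used below).
JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict}

# Illustrative subset of a take_screenshot input schema (not the real published schema).
SCHEMA = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "width": {"type": "integer"},
        "full_page": {"type": "boolean"},
    },
    "required": ["url"],
    "additionalProperties": False,
}


def type_errors(arguments: dict, schema: dict) -> list[str]:
    """Return human-readable schema violations; an empty list means the call can go out."""
    errors = [f"missing required argument: {name}"
              for name in schema.get("required", []) if name not in arguments]
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            if schema.get("additionalProperties") is False:
                errors.append(f"unknown argument: {name}")
            continue
        expected = JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject True/False where an integer is expected.
        if isinstance(value, bool) and expected is int:
            errors.append(f"{name}: expected integer, got boolean")
        elif not isinstance(value, expected):
            errors.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
    return errors


print(type_errors({"url": "https://example.com", "width": "1200"}, SCHEMA))
# ['width: expected integer, got str']
```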

If you're already using MCP with other tools (database queries, file operations, Slack messaging), adding screenshots is one more entry in the same config file.
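When that file is edited by hand, it's easy to clobber existing servers while pasting in a new block. Here is a minimal sketch that adds the screenshotrun entry to an existing mcpServers config and leaves other entries alone. The helper names are hypothetical. The config path is a parameter because its location differs by client:

```python
import copy
import json
from pathlib import Path


def with_screenshotrun(config: dict, api_key: str) -> dict:
    """Return a copy of config with the screenshotrun server added or updated."""
    updated = copy.deepcopy(config)               # never mutate the caller's dict
    servers = updated.setdefault("mcpServers", {})
    servers["screenshotrun"] = {
        "command": "npx",
        "args": ["-y", "screenshotrun-mcp"],
        "env": {"SCREENSHOTRUN_API_KEY": api_key},
    }
    return updated


def update_config_file(path: Path, api_key: str) -> None:
    """Read the client config (if any), merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(with_screenshotrun(config, api_key), indent=2) + "\n")
```

For example, `update_config_file(Path(".cursor/mcp.json"), os.environ["SCREENSHOTRUN_API_KEY"])` pulls the key from the environment so it is never typed into the script itself.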

Workflows that work once the agent can see

After a deploy, the agent can capture a full-page screenshot of staging, diff it against a baseline, flag pixel differences above a threshold, and comment on the PR, all without a dedicated visual testing framework. I covered the broader approach in how developers use screenshot APIs, and MCP makes the screenshot step invisible to the workflow.
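The diff step doesn't need much. Here is a minimal sketch of a pixel-difference threshold. It assumes both screenshots have already been decoded into equal-size raw RGBA byte buffers, and decoding the PNGs is left to an image library of your choice. The per-channel tolerance of 16 and the 1% threshold are arbitrary starting points, not recommendations:

```python
def changed_fraction(baseline: bytes, current: bytes, tolerance: int = 16) -> float:
    """Fraction of pixels where any RGBA channel differs by more than `tolerance`."""
    if len(baseline) != len(current) or len(baseline) % 4 != 0:
        raise ValueError("images must be equal-size RGBA buffers")
    pixels = len(baseline) // 4
    if pixels == 0:
        return 0.0
    changed = 0
    for i in range(0, len(baseline), 4):
        # Compare the four channels of one pixel.
        if any(abs(a - b) > tolerance for a, b in zip(baseline[i:i + 4], current[i:i + 4])):
            changed += 1
    return changed / pixels


def needs_review(baseline: bytes, current: bytes, threshold: float = 0.01) -> bool:
    """Flag the deploy when more than `threshold` of the pixels changed."""
    return changed_fraction(baseline, current) > threshold


white = bytes([255, 255, 255, 255]) * 100   # a 10x10 all-white RGBA image
one_red = bytearray(white)
one_red[0:4] = bytes([255, 0, 0, 255])      # change a single pixel
print(changed_fraction(white, bytes(one_red)))  # 0.01
```

Full-page captures of a changed page often differ in height, so crop or pad them to a common size before comparing.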

Doc sites benefit too. The agent captures live screenshots of the product, embeds them in the right pages, and updates them when the UI changes. No more stale screenshots from three releases ago.

Competitor monitoring is where I use it most. Capture a set of competitor homepages weekly, let the agent summarize visual changes (new pricing tiers, redesigned hero, added social proof), and compile a report. With custom viewport and device emulation, the agent checks both desktop and mobile versions in the same run.
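Here is a sketch of how the weekly run might be laid out. It assumes each screenshot is saved under a date-stamped path so this week's capture can be compared with last week's, and that naming scheme is a hypothetical choice. Every URL produces two captures, one per device, which matters for the quota check described earlier:

```python
import datetime as dt
from itertools import product
from urllib.parse import urlparse

# Values of the take_screenshot `device` parameter to compare each week.
DEVICES = ("desktop", "mobile")


def capture_plan(urls: list[str], week: dt.date) -> list[tuple[str, dict]]:
    """Pair each take_screenshot argument set with a date-stamped output path."""
    plan = []
    for url, device in product(urls, DEVICES):
        host = urlparse(url).netloc
        path = f"{week.isoformat()}/{host}-{device}.png"   # hypothetical naming scheme
        plan.append((path, {"url": url, "device": device, "full_page": True}))
    return plan


for path, args in capture_plan(["https://competitor-a.example", "https://competitor-b.example"], dt.date(2025, 1, 6)):
    print(path, args)
```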

OG image verification is another one. After deploying a blog post, the agent captures the page at 1200x630, compares the rendered og:image to what you intended, and confirms the social card looks right before you share the link. I wrote about how to generate Open Graph images with the API, and the MCP server makes it possible to verify them in the same conversation.
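Before looking at pixels, it's worth confirming that the page points at the image you expect. Here is a minimal sketch using Python's standard-library HTMLParser, assuming you've already fetched the page HTML. It checks only the og:image URL, not what the image looks like, and the helper names are illustrative:

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.og: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and attributes.get("content"):
            # Keep the first occurrence if a property is declared twice.
            self.og.setdefault(prop, attributes["content"])


def og_image_matches(html: str, expected_url: str) -> bool:
    """True if the page's og:image points at the expected URL."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.og.get("og:image") == expected_url
```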

Limitations worth knowing about

The MCP server captures public URLs only. Pages behind authentication, VPN-gated dashboards, localhost during development: none of these work without exposing them to the public internet first. That's a limitation of any cloud-based screenshot API, not specific to MCP, but agents tend to assume they can access whatever you can, so the gap surprises people.

Speed is another factor. A screenshot takes 2 to 8 seconds depending on page complexity and format. In a conversation, that's barely noticeable. In a batch loop of 200 URLs, it adds up to roughly 7 to 27 minutes of capture time, because there's no parallel capture in the MCP tool itself: the agent sends one request at a time. Plan for that latency, and think about rate limiting strategies.
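Because requests go out one at a time, the batch loop is where rate limiting shows up. Here is a minimal sketch of a sequential loop with exponential backoff. It assumes a hypothetical `capture` callable that wraps the take_screenshot call and raises a hypothetical `RateLimited` exception on HTTP 429. Passing `sleep` in as a parameter keeps the loop testable:

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical: raised by the capture wrapper when the API answers 429."""


def capture_all(
    urls: list[str],
    capture: Callable[[str], bytes],
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[dict[str, bytes], list[str]]:
    """Capture URLs sequentially, backing off on rate limits; return results and failures."""
    results: dict[str, bytes] = {}
    failed: list[str] = []
    for url in urls:  # one request at a time, like the MCP tool
        for attempt in range(max_retries + 1):
            try:
                results[url] = capture(url)
                break
            except RateLimited:
                if attempt == max_retries:
                    failed.append(url)                 # give up on this URL, keep going
                else:
                    sleep(base_delay * 2 ** attempt)   # wait 2s, 4s, 8s, ...
    return results, failed
```

Backoff only helps with short-term rate limits. If the 429 means the quota for the period is exhausted, retrying won't help, which is why the check_usage pre-flight matters.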

I also don't have a clean answer for pages that require interaction sequences before the target state is visible. The click_selector parameter handles a single click (dismissing a modal, expanding a section), but multi-step flows like logging in, navigating a wizard, or scrolling through an infinite feed are beyond what one screenshot call can do. For those, you need a full Playwright script. I covered that tradeoff in the build-vs-buy comparison.

take_screenshot parameter reference

Every parameter from the REST API is available as a tool argument. Full spec in the API documentation.

Parameter	Type	Default	Notes
`url`	string	—	Required. Any public URL.
`format`	string	png	png, jpeg, webp, pdf
`width`	integer	1280	320–3840
`height`	integer	800	200–2160
`full_page`	boolean	false	Capture entire scrollable page
`quality`	integer	80	1–100 (jpeg/webp only)
`device`	string	desktop	desktop, mobile, tablet
`dark_mode`	boolean	false	Emulates `prefers-color-scheme: dark`
`block_ads`	boolean	false	Removes ad scripts and iframes
`block_cookies`	boolean	true	Hides consent banners
`delay`	integer	0	Seconds to wait before capture (0–10)
`retina`	boolean	false	2x device pixel ratio
`css`	string	—	Inject custom CSS (max 10 KB)
`js`	string	—	Inject custom JavaScript (max 10 KB)
`selector`	string	—	Capture a specific element
`hide_selectors`	string	—	CSS selectors to hide before capture
`click_selector`	string	—	Click an element before capture
`scroll_to`	string	—	Scroll to element or position
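If you call the REST API directly instead of going through MCP, you lose the schema check. Here is a sketch of a client-side pre-check built from the table above. Two things in it are assumptions: that the server enforces exactly these bounds, and that "10 KB" means 10,240 bytes. Treat the API documentation as authoritative:

```python
# Constraints transcribed from the take_screenshot reference table.
RANGES = {"width": (320, 3840), "height": (200, 2160), "quality": (1, 100), "delay": (0, 10)}
CHOICES = {"format": {"png", "jpeg", "webp", "pdf"}, "device": {"desktop", "mobile", "tablet"}}
MAX_INJECT_BYTES = 10 * 1024  # assumption: css and js are each capped at 10,240 bytes


def range_errors(args: dict) -> list[str]:
    """Check take_screenshot arguments against the documented ranges and choices."""
    errors = []
    if "url" not in args:
        errors.append("url is required")
    for name, (low, high) in RANGES.items():
        if name in args and not low <= args[name] <= high:
            errors.append(f"{name} must be between {low} and {high}")
    for name, allowed in CHOICES.items():
        if name in args and args[name] not in allowed:
            errors.append(f"{name} must be one of {sorted(allowed)}")
    for name in ("css", "js"):
        if name in args and len(args[name].encode("utf-8")) > MAX_INJECT_BYTES:
            errors.append(f"{name} exceeds 10 KB")
    return errors


print(range_errors({"url": "https://example.com", "width": 5000}))
# ['width must be between 320 and 3840']
```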

check_usage takes no parameters and returns your current billing period consumption.

If you're already working with Claude Code or Cursor, adding the MCP server takes less time than reading this page. And if you run into edge cases with specific pages, check how to handle screenshot API errors in production or grab a Node.js screenshot example to test outside the agent context.
