A Model Context Protocol (MCP) server implementation that integrates with WebScraping.AI for web data extraction capabilities.
- Question answering about web page content
 - Structured data extraction from web pages
 - HTML content retrieval with JavaScript rendering
 - Plain text extraction from web pages
 - CSS selector-based content extraction
 - Multiple proxy types (datacenter, residential) with country selection
 - JavaScript rendering using headless Chrome/Chromium
 - Concurrent request management with rate limiting
 - Custom JavaScript execution on target pages
 - Device emulation (desktop, mobile, tablet)
 - Account usage monitoring
 
env WEBSCRAPING_AI_API_KEY=your_api_key npx -y webscraping-ai-mcp# Clone the repository
git clone https://github.com/webscraping-ai/webscraping-ai-mcp-server.git
cd webscraping-ai-mcp-server
# Install dependencies
npm install
# Run
npm startNote: Requires Cursor version 0.45.6+
The WebScraping.AI MCP server can be configured in two ways in Cursor:
- 
Project-specific Configuration (recommended for team projects): Create a
.cursor/mcp.jsonfile in your project directory:{ "servers": { "webscraping-ai": { "type": "command", "command": "npx -y webscraping-ai-mcp", "env": { "WEBSCRAPING_AI_API_KEY": "your-api-key", "WEBSCRAPING_AI_CONCURRENCY_LIMIT": "5" } } } } - 
Global Configuration (for personal use across all projects): Create a
~/.cursor/mcp.jsonfile in your home directory with the same configuration format as above. 
If you are using Windows and are running into issues, try using
cmd /c "set WEBSCRAPING_AI_API_KEY=your-api-key && npx -y webscraping-ai-mcp"as the command.
This configuration will make the WebScraping.AI tools available to Cursor's AI agent automatically when relevant for web scraping tasks.
Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "mcp-server-webscraping-ai": {
      "command": "npx",
      "args": ["-y", "webscraping-ai-mcp"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "YOUR_API_KEY_HERE",
        "WEBSCRAPING_AI_CONCURRENCY_LIMIT": "5"
      }
    }
  }
}WEBSCRAPING_AI_API_KEY: Your WebScraping.AI API key- Required for all operations
 - Get your API key from WebScraping.AI
 
WEBSCRAPING_AI_CONCURRENCY_LIMIT: Maximum number of concurrent requests (default:5)WEBSCRAPING_AI_DEFAULT_PROXY_TYPE: Type of proxy to use (default:residential)WEBSCRAPING_AI_DEFAULT_JS_RENDERING: Enable/disable JavaScript rendering (default:true)WEBSCRAPING_AI_DEFAULT_TIMEOUT: Maximum web page retrieval time in ms (default:15000, max:30000)WEBSCRAPING_AI_DEFAULT_JS_TIMEOUT: Maximum JavaScript rendering time in ms (default:2000)
For standard usage:
# Required
export WEBSCRAPING_AI_API_KEY=your-api-key
# Optional - customize behavior (default values)
export WEBSCRAPING_AI_CONCURRENCY_LIMIT=5
export WEBSCRAPING_AI_DEFAULT_PROXY_TYPE=residential # datacenter or residential
export WEBSCRAPING_AI_DEFAULT_JS_RENDERING=true
export WEBSCRAPING_AI_DEFAULT_TIMEOUT=15000
export WEBSCRAPING_AI_DEFAULT_JS_TIMEOUT=2000Ask questions about web page content.
{
  "name": "webscraping_ai_question",
  "arguments": {
    "url": "https://example.com",
    "question": "What is the main topic of this page?",
    "timeout": 30000,
    "js": true,
    "js_timeout": 2000,
    "wait_for": ".content-loaded",
    "proxy": "datacenter",
    "country": "us"
  }
}Example response:
{
  "content": [
    {
      "type": "text",
      "text": "The main topic of this page is examples and documentation for HTML and web standards."
    }
  ],
  "isError": false
}Extract structured data from web pages based on instructions.
{
  "name": "webscraping_ai_fields",
  "arguments": {
    "url": "https://example.com/product",
    "fields": {
      "title": "Extract the product title",
      "price": "Extract the product price",
      "description": "Extract the product description"
    },
    "js": true,
    "timeout": 30000
  }
}Example response:
{
  "content": [
    {
      "type": "text",
      "text": {
        "title": "Example Product",
        "price": "$99.99",
        "description": "This is an example product description."
      }
    }
  ],
  "isError": false
}Get the full HTML of a web page with JavaScript rendering.
{
  "name": "webscraping_ai_html",
  "arguments": {
    "url": "https://example.com",
    "js": true,
    "timeout": 30000,
    "wait_for": "#content-loaded"
  }
}Example response:
{
  "content": [
    {
      "type": "text",
      "text": "<html>...[full HTML content]...</html>"
    }
  ],
  "isError": false
}Extract the visible text content from a web page.
{
  "name": "webscraping_ai_text",
  "arguments": {
    "url": "https://example.com",
    "js": true,
    "timeout": 30000
  }
}Example response:
{
  "content": [
    {
      "type": "text",
      "text": "Example Domain\nThis domain is for use in illustrative examples in documents..."
    }
  ],
  "isError": false
}Extract content from a specific element using a CSS selector.
{
  "name": "webscraping_ai_selected",
  "arguments": {
    "url": "https://example.com",
    "selector": "div.main-content",
    "js": true,
    "timeout": 30000
  }
}Example response:
{
  "content": [
    {
      "type": "text",
      "text": "<div class=\"main-content\">This is the main content of the page.</div>"
    }
  ],
  "isError": false
}Extract content from multiple elements using CSS selectors.
{
  "name": "webscraping_ai_selected_multiple",
  "arguments": {
    "url": "https://example.com",
    "selectors": ["div.header", "div.product-list", "div.footer"],
    "js": true,
    "timeout": 30000
  }
}Example response:
{
  "content": [
    {
      "type": "text",
      "text": [
        "<div class=\"header\">Header content</div>",
        "<div class=\"product-list\">Product list content</div>",
        "<div class=\"footer\">Footer content</div>"
      ]
    }
  ],
  "isError": false
}Get information about your WebScraping.AI account.
{
  "name": "webscraping_ai_account",
  "arguments": {}
}Example response:
{
  "content": [
    {
      "type": "text",
      "text": {
        "requests": 5000,
        "remaining": 4500,
        "limit": 10000,
        "resets_at": "2023-12-31T23:59:59Z"
      }
    }
  ],
  "isError": false
}The following options can be used with all scraping tools:
timeout: Maximum web page retrieval time in ms (15000 by default, maximum is 30000)js: Execute on-page JavaScript using a headless browser (true by default)js_timeout: Maximum JavaScript rendering time in ms (2000 by default)wait_for: CSS selector to wait for before returning the page contentproxy: Type of proxy, datacenter or residential (residential by default)country: Country of the proxy to use (US by default). Supported countries: us, gb, de, it, fr, ca, es, ru, jp, kr, incustom_proxy: Your own proxy URL in "http://user:password@host:port" formatdevice: Type of device emulation. Supported values: desktop, mobile, tableterror_on_404: Return error on 404 HTTP status on the target page (false by default)error_on_redirect: Return error on redirect on the target page (false by default)js_script: Custom JavaScript code to execute on the target page
The server provides robust error handling:
- Automatic retries for transient errors
 - Rate limit handling with backoff
 - Detailed error messages
 - Network resilience
 
Example error response:
{
  "content": [
    {
      "type": "text",
      "text": "API Error: 429 Too Many Requests"
    }
  ],
  "isError": true
}This server implements the Model Context Protocol, making it compatible with any MCP-enabled LLM platforms. You can configure your LLM to use these tools for web scraping tasks.
const { Claude } = require('@anthropic-ai/sdk');
const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
const { StdioClientTransport } = require('@modelcontextprotocol/sdk/client/stdio.js');
const claude = new Claude({
  apiKey: process.env.ANTHROPIC_API_KEY
});
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['-y', 'webscraping-ai-mcp'],
  env: {
    WEBSCRAPING_AI_API_KEY: 'your-api-key'
  }
});
const client = new Client({
  name: 'claude-client',
  version: '1.0.0'
});
await client.connect(transport);
// Now you can use Claude with WebScraping.AI tools
const tools = await client.listTools();
const response = await claude.complete({
  prompt: 'What is the main topic of example.com?',
  tools: tools
});# Clone the repository
git clone https://github.com/webscraping-ai/webscraping-ai-mcp-server.git
cd webscraping-ai-mcp-server
# Install dependencies
npm install
# Run tests
npm test
# Add your .env file
cp .env.example .env
# Start the inspector
npx @modelcontextprotocol/inspector node src/index.js- Fork the repository
 - Create your feature branch
 - Run tests: 
npm test - Submit a pull request
 
MIT License - see LICENSE file for details