Add --playwright-proxy option for /web using playwright with chromium #4184

Status: Open
Wants to merge 4 commits into base: main

Conversation

Zhaoyilunnn

Hi Team,

Thanks a lot for the great work.

It seems that Playwright with Chromium does not use a proxy by default, so this pull request adds support for specifying a proxy server when using Playwright for web scraping. It adds a new --playwright-proxy command-line argument that lets users route Playwright browser traffic through a proxy. This is useful for users who need to reach web content from behind a firewall or who want to anonymize their scraping (I expect it would be particularly useful for users in China).

Changes

• New Argument: Added --playwright-proxy to the CLI (in aider/args.py and aider/scrape.py) to allow users to specify a proxy URL for Playwright.
• Scraper Update: Updated the Scraper class (aider/scrape.py) to accept and use the proxy setting when launching Playwright browsers. If the argument is not provided, it falls back to the HTTP_PROXY/http_proxy environment variables.
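
The fallback order described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper names `resolve_proxy_url` and `proxy_settings` are hypothetical, and the only Playwright detail assumed is that its `proxy` option takes a dict with a `server` key.

```python
import os
from typing import Mapping, Optional


def resolve_proxy_url(
    cli_value: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the proxy URL: explicit value first, then HTTP_PROXY, then http_proxy.

    Hypothetical helper; returns None when no proxy is configured anywhere.
    """
    return cli_value or environ.get("HTTP_PROXY") or environ.get("http_proxy") or None


def proxy_settings(proxy_url: Optional[str]) -> dict:
    """Build keyword arguments for a Playwright call (hypothetical helper).

    Returns an empty dict when there is no proxy, so the caller can always
    write something like `browser.new_context(**proxy_settings(url))`.
    """
    return {"proxy": {"server": proxy_url}} if proxy_url else {}
```

Returning an empty dict rather than `proxy=None` keeps the no-proxy path identical to the behaviour before the option existed.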

The code changes were made with aider.

@Copilot Copilot AI review requested due to automatic review settings June 8, 2025 04:32
@CLAassistant
CLAassistant commented Jun 8, 2025

CLA assistant check
All committers have signed the CLA.

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for specifying a proxy server for Playwright Chromium by introducing a new --playwright-proxy argument. The changes update the CLI argument parsing in aider/args.py, propagate the proxy setting through aider/scrape.py, and wire it into the web command in aider/commands.py.
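
As a rough picture of that wiring, the sketch below parses the flag and hands it to a scraper object. It assumes the standard-library `argparse` rather than whatever parser the project actually uses, and `ScraperSketch`, `build_parser`, and `make_scraper` are hypothetical stand-ins, not names from the PR.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the new option."""
    parser = argparse.ArgumentParser(prog="scrape-sketch")
    parser.add_argument(
        "--playwright-proxy",
        metavar="URL",
        default=None,
        help="Proxy server for Playwright Chromium (e.g., http://127.0.0.1:7890)",
    )
    return parser


class ScraperSketch:
    """Stand-in for the real Scraper class; it only stores the setting."""

    def __init__(self, playwright_proxy: Optional[str] = None) -> None:
        self.playwright_proxy = playwright_proxy


def make_scraper(argv: List[str]) -> ScraperSketch:
    """Parse argv and pass the proxy value through to the scraper."""
    args = build_parser().parse_args(argv)
    # argparse maps --playwright-proxy to the attribute playwright_proxy.
    return ScraperSketch(playwright_proxy=args.playwright_proxy)
```

Calling `make_scraper(["--playwright-proxy", "http://127.0.0.1:7890"])` stores the URL on the scraper; with an empty argument list the attribute stays `None`.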

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| aider/scrape.py | Updated Scraper init to accept playwright_proxy; passed it to browser.new_context; and modified main() to support CLI args. |
| aider/commands.py | Passed the playwright_proxy argument from the command line to the Scraper. |
| aider/args.py | Added the --playwright-proxy argument to the parser with help text. |

Comments suppressed due to low confidence (2)

aider/scrape.py:163

  • [nitpick] Consider adding a brief comment or enhancing the docstring to clarify the expected format for proxy URLs. This can help users avoid misconfigurations when providing the proxy parameter.
proxy_url = self.playwright_proxy or os.environ.get("HTTP_PROXY") or os.environ.get("http_proxy")

aider/scrape.py:89

  • [nitpick] Consider expanding the documentation to briefly mention the supported proxy URL formats and any potential limitations, to guide users on correct usage.
`playwright_proxy` - proxy server for Playwright Chromium (e.g., http://127.0.0.1:7890)
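
One way to act on the two nitpicks is to check the URL format before it reaches Playwright. The sketch below is illustrative only: `normalize_proxy_url` and `SUPPORTED_SCHEMES` are hypothetical names, and the accepted scheme set is an assumption rather than something stated in the PR. It treats a bare `host:port` value as an HTTP proxy.

```python
from urllib.parse import urlsplit

# Assumed set of acceptable schemes; adjust to whatever the project decides to document.
SUPPORTED_SCHEMES = {"http", "https", "socks5"}


def normalize_proxy_url(value: str) -> str:
    """Validate a proxy URL and return it with an explicit scheme.

    Raises ValueError for an unsupported scheme, a missing host,
    or a port that is not a valid number.
    """
    candidate = value.strip()
    if "://" not in candidate:
        # Bare "host:port" is treated as an HTTP proxy.
        candidate = "http://" + candidate
    parts = urlsplit(candidate)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"proxy URL has no host: {value!r}")
    _ = parts.port  # accessing .port raises ValueError on a malformed port
    return candidate
```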

2 participants