PS Helper is a Python package developed by the Professional Services team.
It provides a set of helper libraries and command-line tools to speed up and standardize development workflows.
- Ready-to-use Python utilities for internal projects.
- CLI commands for common tasks (e.g., creating repository templates).
- Easy to install and extend.
You can install PS Helper in two ways:
Clone the repository and install it with pip:

```bash
git clone https://github.com/bitmakerla/ps-helper.git
cd ps-helper
pip install -e .
```
This installs the package in editable mode, so changes to the cloned source code take effect immediately without reinstalling.
Alternatively, install the package directly from GitHub without cloning:

```bash
pip install git+https://github.com/bitmakerla/ps-helper.git
```
Check available commands:

```bash
ps-helper --help
```

Create a new project from the template:

```bash
ps-helper create-repo-template MyProject
```

Generate HTML reports from Scrapy metrics JSON files:

```bash
ps-helper create-report scrapy_stats.json
```

This creates a report named `scrapy_stats-report.html` in the same directory as your metrics file.
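The naming rule is simple enough to reproduce when a script needs to locate the generated report after a crawl. The following is a minimal sketch, not part of PS Helper: the helper names `expected_report_path` and `build_report` are hypothetical, and `build_report` assumes the `ps-helper` command is installed and on your `PATH`.

```python
import subprocess
from pathlib import Path


def expected_report_path(stats_file: str) -> Path:
    """Hypothetical helper: mirror the documented rule <stem>-report.html next to the input."""
    src = Path(stats_file)
    return src.with_name(f"{src.stem}-report.html")


def build_report(stats_file: str) -> Path:
    """Hypothetical helper: run the CLI, then return where the report should appear."""
    # check=True raises CalledProcessError if the command exits with a non-zero status.
    subprocess.run(["ps-helper", "create-report", stats_file], check=True)
    return expected_report_path(stats_file)


# Usage: report = build_report("out/run1/scrapy_stats.json")
```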
Block unwanted URLs in your Scrapy projects with keyword-based filtering, using either partial or strict matching.
- Add the following to your Scrapy project's `settings.py`:

  ```python
  DOWNLOADER_MIDDLEWARES = {
      'ps_helper.url_blocker.URLBlockerMiddleware': 585,
  }

  # Configure words to block
  URL_BLOCKER_WORDS = ['admin', 'login', '.css', '.js', 'api/']
  URL_BLOCKER_MODE = 'partial'  # or 'strict'
  ```
- Run your spider. Requests whose URLs match a blocked word are filtered out automatically.
In `partial` mode, a URL is blocked if it contains any listed word as a substring:

```python
URL_BLOCKER_MODE = 'partial'
URL_BLOCKER_WORDS = ['auth']

# Results:
# ❌ BLOCKED: site.com/authentication (contains 'auth')
# ❌ BLOCKED: site.com/auth (contains 'auth')
```
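To make the partial rule concrete, here is a minimal sketch of substring matching that honors the case-insensitive default. It is an illustration, not the middleware's actual code: `is_blocked_partial` is a hypothetical name, and the real implementation may differ in details.

```python
from typing import Iterable


def is_blocked_partial(url: str, words: Iterable[str], case_sensitive: bool = False) -> bool:
    """Hypothetical sketch: True if any blocked word occurs anywhere in the URL."""
    haystack = url if case_sensitive else url.lower()
    for word in words:
        needle = word if case_sensitive else word.lower()
        if needle in haystack:  # plain substring test
            return True
    return False


words = ["auth"]
print(is_blocked_partial("site.com/authentication", words))  # True
print(is_blocked_partial("site.com/auth", words))            # True
print(is_blocked_partial("site.com/products", words))        # False
```

Because this is a plain substring test, short words can match more than intended (for example, `'auth'` also matches `author`), which is the case strict mode addresses.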
In `strict` mode, a URL is blocked only if a listed word exactly matches one of its components:

```python
URL_BLOCKER_MODE = 'strict'
URL_BLOCKER_WORDS = ['auth']

# Results:
# ✅ ALLOWED: site.com/authentication ('auth' ≠ 'authentication')
# ❌ BLOCKED: site.com/auth ('auth' = 'auth')
```
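The README does not define exactly what counts as a "URL component", so the following sketch assumes one plausible rule: the host, each path segment, and each query key or value are separate components. The function names `url_components` and `is_blocked_strict` are hypothetical, and PS Helper's own splitting rule may differ.

```python
import re
from typing import Iterable, List
from urllib.parse import urlsplit


def url_components(url: str) -> List[str]:
    """Assumed rule: host, path segments, and query keys/values are components."""
    parts = urlsplit(url)
    # Join the pieces of interest, then break on path and query separators.
    text = "/".join(p for p in (parts.netloc, parts.path, parts.query) if p)
    return [c for c in re.split(r"[/&=]", text) if c]


def is_blocked_strict(url: str, words: Iterable[str], case_sensitive: bool = False) -> bool:
    """Hypothetical sketch: True if any blocked word equals a whole URL component."""
    components = url_components(url)
    targets = set(words)
    if not case_sensitive:
        components = [c.lower() for c in components]
        targets = {w.lower() for w in targets}
    return any(c in targets for c in components)


words = ["auth"]
print(is_blocked_strict("site.com/authentication", words))  # False
print(is_blocked_strict("site.com/auth", words))            # True
```

Under this assumed rule, entries such as `'.css'` or `'api/'` would never equal a whole component, so filters like these only make sense in partial mode.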
All available settings:

```python
# Required
URL_BLOCKER_WORDS = ['admin', 'login', '.pdf', 'tracking']

# Optional (shown with their defaults)
URL_BLOCKER_MODE = 'partial'        # 'partial' or 'strict'
URL_BLOCKER_CASE_SENSITIVE = False  # Case sensitivity
URL_BLOCKER_LOG_BLOCKED = True      # Show blocked URLs in logs
```
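One way a middleware could read these settings and apply the documented defaults is sketched below. This is an assumption, not PS Helper's code: `URLBlockerConfig` is a hypothetical name, the sketch accepts any mapping with `.get()` (Scrapy's settings object also supports `.get()`), and the choice to raise `ValueError` for a missing word list or an unknown mode is illustrative.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping

VALID_MODES = ("partial", "strict")


@dataclass(frozen=True)
class URLBlockerConfig:
    """Hypothetical container for the URL blocker settings."""
    words: List[str]
    mode: str = "partial"
    case_sensitive: bool = False
    log_blocked: bool = True

    @classmethod
    def from_settings(cls, settings: Mapping[str, Any]) -> "URLBlockerConfig":
        # URL_BLOCKER_WORDS is documented as required; this sketch rejects an empty list.
        words = list(settings.get("URL_BLOCKER_WORDS") or [])
        if not words:
            raise ValueError("URL_BLOCKER_WORDS is required and must not be empty")
        mode = settings.get("URL_BLOCKER_MODE", "partial")
        if mode not in VALID_MODES:
            raise ValueError(f"URL_BLOCKER_MODE must be one of {VALID_MODES}, got {mode!r}")
        return cls(
            words=words,
            mode=mode,
            case_sensitive=bool(settings.get("URL_BLOCKER_CASE_SENSITIVE", False)),
            log_blocked=bool(settings.get("URL_BLOCKER_LOG_BLOCKED", True)),
        )


config = URLBlockerConfig.from_settings({"URL_BLOCKER_WORDS": ["admin", "login"]})
```

Failing fast on an invalid mode surfaces a typo in `settings.py` when the crawler starts, rather than silently falling back to a different matching behavior.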