Prompt API - Scraper API - Python Package

pa-scraper is a python wrapper for scraper api with few more extra cream and sugar.

Requirements

You need to signup for Prompt API
You need to subscribe scraper api, test drive is free!!!
You need to set PROMPTAPI_TOKEN environment variable after subscription.

then;

$ pip install pa-scraper

Example Usage

Examples can be found here.

# examples/fetch.py

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)
response = scraper.get()

if response.get('error', None):
    # response['error']  returns error message
    # response['status'] returns http status code

    # Example: {'error': 'Not Found', 'status': 404}
    print(response)  # noqa: T001
else:
    data = response['result']['data']
    headers = response['result']['headers']
    url = response['result']['url']
    status = response['status']

    # print(data) # print fetched html, will be long :)

    print(headers)  # noqa: T001
    # {'Content-Length': '321322', 'Content-Type': 'text/html; charset=UTF-8', ... }

    print(status)  # noqa: T001
    # 200

    save_result = scraper.save('/tmp/my-data.html')  # noqa: S108

    if save_result.get('error', None):
        # save error occured...
        # add you code here...
        pass

    print(save_result)  # noqa: T001
    # {'file': '/tmp/my-data.html', 'size': 321322}

You can add url parameters for extra operations. Valid parameters are:

auth_password: for HTTP Realm auth password
auth_username: for HTTP Realm auth username
cookie: URL Encoded cookie header.
country: 2 character country code. If you wish to scrape from an IP address of a specific country.
referer: HTTP referer header
selector: CSS style selector path such as a.btn div li. If selector is enabled, returning result will be collection of data and saved file will be in .json format.

Here is an example with using url parameters and selector:

# examples/fetch_with_params.py

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)

fetch_params = dict(country='EE', selector='ul li button[data-clipboard-text]')
response = scraper.get(params=fetch_params)

if response.get('error', None):
    # response['error']  returns error message
    # response['status'] returns http status code

    # Example: {'error': 'Not Found', 'status': 404}
    print(response)  # noqa: T001
else:
    data = response['result']['data']
    headers = response['result']['headers']
    url = response['result']['url']
    status = response['status']

    # print(data)  # noqa: T001
    # ['<button class="button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w" ...\n', ]

    print(len(data))  # noqa: T001
    # 734
    # we have an array...

    print(headers)  # noqa: T001
    # {'Content-Length': '321322', 'Content-Type': 'text/html; charset=UTF-8', ... }

    print(status)  # noqa: T001
    # 200

    save_result = scraper.save('/tmp/my-data.json')  # noqa: S108
    if save_result.get('error', None):
        # save error occured...
        # add you code here...
        pass
    print(save_result)  # noqa: T001
    # {'file': '/tmp/my-data.json', 'size': 174449}

Default timeout value is set to 10 seconds. You can change this while initializing the instance:

scraper = Scraper(url, timeout=50)  # 50 seconds timeout...

You can also add custom headers prefixed with X-. Example below shows how to add extra request headers and set default timeout:

# pylint: disable=C0103
from scraper import Scraper

if __name__ == '__main__':

    url = 'https://pypi.org/classifiers/'
    scraper = Scraper(url)

    fetch_params = dict(country='EE', selector='ul li button[data-clipboard-text]')
    custom_headers = {
        'X-Referer': 'https://www.google.com',
        'X-User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    }
    timeout = 30
    response = scraper.get(params=fetch_params, headers=custom_headers, timeout=timeout)

    if response.get('error', None):
        # response['error']  returns error message
        # response['status'] returns http status code

        # Example: {'error': 'Not Found', 'status': 404}
        print(response)  # noqa: T001
    else:
        data = response['result']['data']
        headers = response['result']['headers']
        url = response['result']['url']
        status = response['status']

        # print(data)  # noqa: T001
        # ['<button class="button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w" ...\n', ]

        print(len(data))  # noqa: T001
        # 734

        print(headers)  # noqa: T001
        # {'Content-Length': '321322', 'Content-Type': 'text/html; charset=UTF-8', ... }

        print(status)  # noqa: T001
        # 200

        save_result = scraper.save('/tmp/my-data.json')  # noqa: S108
        if save_result.get('error', None):
            # save error occured...
            # add you code here...
            pass
        print(save_result)  # noqa: T001
        # {'file': '/tmp/my-data.json', 'size': 174449}

License

This project is licensed under MIT

Contributer(s)

Prompt API - Creator, maintainer

Contribute

All PR’s are welcome!

fork (https://github.com/promptapi/scraper-py/fork)
Create your branch (git checkout -b my-feature)
commit yours (git commit -am 'Add awesome features...')
push your branch (git push origin my-feature)
Than create a new Pull Request!

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
examples		examples
src/scraper		src/scraper
tests		tests
.bumpversion.cfg		.bumpversion.cfg
.flake8		.flake8
.gitignore		.gitignore
.isort.cfg		.isort.cfg
.pylintrc		.pylintrc
.travis.yml		.travis.yml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENCE		LICENCE
README.md		README.md
Rakefile		Rakefile
pyproject.toml		pyproject.toml
requirements-lint.txt		requirements-lint.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Prompt API - Scraper API - Python Package

Requirements

Example Usage

License

Contributer(s)

Contribute

About

Uh oh!

Releases

Packages

Languages

License

promptapi/scraper-py

Folders and files

Latest commit

History

Repository files navigation

Prompt API - Scraper API - Python Package

Requirements

Example Usage

License

Contributer(s)

Contribute

About

Topics

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages