scraper_rb is a simple Ruby wrapper for scraper-api.
- You need to sign up for Prompt API.
- You need to subscribe to scraper-api; the test drive is free!
- You need to set the PROMPTAPI_TOKEN environment variable after subscription.
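Because the wrapper depends on PROMPTAPI_TOKEN being set, it can help to check for it before making any requests. The following is a minimal sketch, not part of scraper_rb; the helper name `promptapi_token` is hypothetical and only illustrates failing early with a clear message.

```ruby
# Hypothetical helper (not part of scraper_rb): return the token,
# or raise a descriptive error when it is missing or blank.
def promptapi_token(env = ENV)
  token = env['PROMPTAPI_TOKEN'].to_s.strip
  raise KeyError, 'PROMPTAPI_TOKEN is not set' if token.empty?

  token
end
```

Calling `promptapi_token` at the top of a script surfaces a missing token immediately instead of at the first request.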
Then install the gem:
$ gem install scraper_rb
or install from GitHub:
$ gem install scraper_rb --version "0.1.2" --source "https://rubygems.pkg.github.com/promptapi"
Basic scraper:
require "scraper_rb"
s = ScraperRb.new('https://pypi.org/classifiers/') # no params
s.get
s.response
# {
# :headers=>{:"Content-Length"=>...},
# :url=>"https://pypi.org/classifiers/",
# :data=>"<!DOCTYPE html>\n<html> ...",
# }
s.response[:headers] # => return response headers
s.response[:data] # => return scraped html
s.save('/tmp/data.html') # => {:file=>"/tmp/data.html", :size=>321322}
# or
save_result = s.save('/tmp/data.html')
puts save_result[:error] if save_result.key?(:error) # we have a file error
You can add URL parameters for extra operations. Valid parameters are:
- auth_password: for HTTP Realm auth password
- auth_username: for HTTP Realm auth username
- cookie: URL-encoded cookie header.
- country: 2-character country code, if you wish to scrape from an IP address of a specific country.
- referer: HTTP referer header
- selector: CSS-style selector path such as a.btn div li. If selector is enabled, the returned result will be a collection of data and the saved file will be in .json format.
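The cookie parameter above expects a URL-encoded value. As a rough sketch, the standard library's `ERB::Util.url_encode` can percent-encode a raw cookie string; the helper name `encode_cookie` and the sample cookie are hypothetical, and whether the API prefers `%20` or `+` for spaces is an assumption not confirmed by this README.

```ruby
require 'erb'

# Hypothetical helper (not part of scraper_rb): percent-encode a raw
# Cookie header value so it can be passed as the cookie parameter.
def encode_cookie(raw_cookie)
  ERB::Util.url_encode(raw_cookie)
end

# Sample cookie value, made up for illustration.
params = { cookie: encode_cookie('session=abc123; theme=dark') }
# params[:cookie] is now "session%3Dabc123%3B%20theme%3Ddark"
```

The resulting `params` Hash can then be passed as the second argument to `ScraperRb.new`.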
Here is an example using URL parameters and a selector:
require "scraper_rb"
params = {country: 'EE', selector: 'ul li button[data-clipboard-text]'}
s = ScraperRb.new('https://pypi.org/classifiers/', params)
s.get
s.response[:headers] # => return response headers
s.response[:data] # => return an array, collection of given selector
s.response[:data].length # => 734
s.save('/tmp/test.json') # => {:file=>"/tmp/test.json", :size=>174449}
# or
save_result = s.save('/tmp/test.json')
puts save_result[:error] if save_result.key?(:error) # we have a file error
The default timeout is 10 seconds. You can change this when initializing the instance:
s = ScraperRb.new('https://pypi.org/classifiers/', params={}, timeout=50)
# => 50 seconds timeout w/o params
s = ScraperRb.new('https://pypi.org/classifiers/', params={country: 'EE'}, timeout=50)
# => 50 seconds timeout
You can add extra X- headers:
s = ScraperRb.new('https://pypi.org/classifiers/', headers={'X-Referer': 'https://www.google.com'})
# or
s = ScraperRb.new('https://pypi.org/classifiers/', params={country: 'EE'}, headers={'X-Referer': 'https://www.google.com'}, timeout=50)
# => 50 seconds timeout
The headers param is a Hash; you can add key/value data. Header keys must start with the X- prefix. More details can be found on the Mozilla site.
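Since header keys must carry the X- prefix, a quick check before building the scraper can catch mistakes early. This is only a sketch: `invalid_header_keys` is a hypothetical helper, not part of scraper_rb, and it assumes the prefix check is case-sensitive.

```ruby
# Hypothetical helper (not part of scraper_rb): return the header keys
# that do not start with the required "X-" prefix.
def invalid_header_keys(headers)
  headers.keys.map(&:to_s).reject { |key| key.start_with?('X-') }
end

# 'Accept' is included here on purpose to show a rejected key.
headers = { 'X-Referer': 'https://www.google.com', 'Accept': 'text/html' }
bad = invalid_header_keys(headers)
warn "Headers without X- prefix: #{bad.join(', ')}" unless bad.empty?
```

Keys are converted with `to_s` because the `'X-Referer': value` literal syntax produces Symbol keys.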
After checking out the repo, run bin/setup to install dependencies. Then,
run rake test to run the tests. You can also run bin/console for an
interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install.
To release a new version, update the version number in version.rb, and then
run bundle exec rake release, which will create a git tag for the version,
push git commits and tags, and push the .gem file to
rubygems.org
$ rake -T
rake build # Build scraper_rb-X.X.X.gem into the pkg directory
rake clean # Remove any temporary products
rake clobber # Remove any generated files
rake install # Build and install scraper_rb-X.X.X.gem into system gems
rake install:local # Build and install scraper_rb-X.X.X.gem into system gems without network access
rake release[remote] # Create tag v0.0.0 and build and push scraper_rb-X.X.X.gem to rubygems.org
rake test # Run tests
- If you have PROMPTAPI_TOKEN set, real HTTP request based tests will be available.
- Set RUBY_DEVELOPMENT to 1 for more verbose test results.
This project is licensed under the MIT License.
- Prompt API - Creator, maintainer
Bug reports and pull requests are welcome on GitHub:
- fork (https://github.com/promptapi/scraper_rb/fork)
- Create your branch (git checkout -b my-feature)
- commit yours (git commit -am 'Add awesome features...')
- push your branch (git push origin my-feature)
- Then create a new Pull Request!
This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.