The PromptAPI struct is a simple Go wrapper for the Prompt API Scraper API,
with a little extra cream and sugar on top.
- You need to sign up for Prompt API.
- You need to subscribe to the Scraper API; the test drive is free!
- You need to set the PROMPTAPI_TOKEN environment variable after subscribing.
Then install the package:
$ go get -u github.com/promptapi/scraper-go
// main.go
package main
import (
	"fmt"
	"log"
	scraper "github.com/promptapi/scraper-go"
)
func main() {
	s := new(scraper.PromptAPI)
	params := &scraper.Params{
		URL:      "https://pypi.org/classifiers/",
		Country:  "EE",
	}
	extraHeaders := []*scraper.ExtraHeader{} // custom extra headers (none here)
	result := new(scraper.Result)
	err := s.Scrape(params, extraHeaders, result)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Length of incoming data: %d\n", len(result.Data))
	fmt.Printf("Response headers: %v\n", result.Headers)
	fmt.Printf("Content-Length: %v\n", result.Headers["Content-Length"])
	fileSize, err := s.Save("/tmp/test.html", result)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Size of /tmp/test.html -> %d bytes\n", fileSize)
}
Run:
$ go run main.go
Length of incoming data: 321322
Response headers: map[Accept-Ranges:bytes Content-Length:321322 Content-Security-Policy:base-uri 'self'; block-all-mixed-content; connect-src 'self' https://api.github.com/repos/ *.fastly-insights.com sentry.io https://api.pwnedpasswords.com https://2p66nmmycsj3.statuspage.io; default-src 'none'; font-src 'self' fonts.gstatic.com; form-action 'self'; frame-ancestors 'none'; frame-src 'none'; img-src 'self' https://warehouse-camo.ingress.cmh1.psfhosted.org/ www.google-analytics.com *.fastly-insights.com; script-src 'self' www.googletagmanager.com www.google-analytics.com *.fastly-insights.com https://cdn.ravenjs.com; style-src 'self' fonts.googleapis.com; worker-src *.fastly-insights.com Content-Type:text/html; charset=UTF-8 Date:Tue, 08 Sep 2020 19:10:24 GMT ETag:"1ea9p+Hscl37dEKelacPWw" Referrer-Policy:origin-when-cross-origin Strict-Transport-Security:max-age=31536000; includeSubDomains; preload Vary:Accept-Encoding, Cookie, Accept-Encoding X-Cache:MISS, HIT X-Cache-Hits:0, 1 X-Content-Type-Options:nosniff X-Frame-Options:deny X-Permitted-Cross-Domain-Policies:none X-Served-By:cache-bwi5127-BWI, cache-hhn4035-HHN X-Timer:S1599592224.395422,VS0,VE247 X-XSS-Protection:1; mode=block]
Content-Length: 321322
Size of /tmp/test.html -> 321322 bytes
You can add URL parameters for extra operations. Valid parameters are:
- AuthPassword: for HTTP Realm auth password
- AuthUsername: for HTTP Realm auth username
- Cookie: URL Encoded cookie header.
- Country: Two-character country code, if you wish to scrape from an IP address in a specific country.
- Referer: HTTP referer header
- Selector: CSS-style selector path such as a.btn div li. If Selector is set, the returned result will be a collection of matched elements and the saved file will be in .json format.
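Once a Selector result has been saved as .json, you will usually want to pull values out of the matched elements. The sketch below assumes the saved file is a JSON array of HTML fragment strings, and it uses a short, hypothetical inline sample instead of reading a file so that it runs on its own. extractClipboardTexts is an illustrative helper, not part of scraper-go, and the regular expression is a shortcut; a real HTML parser would be more robust.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"regexp"
)

// clipboardText captures the value of a data-clipboard-text attribute.
var clipboardText = regexp.MustCompile(`data-clipboard-text="([^"]*)"`)

// extractClipboardTexts is a hypothetical helper (not part of scraper-go).
// It decodes a JSON array of HTML fragments and returns the
// data-clipboard-text value of every fragment that carries one.
func extractClipboardTexts(raw []byte) ([]string, error) {
	var fragments []string
	if err := json.Unmarshal(raw, &fragments); err != nil {
		return nil, err
	}
	var texts []string
	for _, fragment := range fragments {
		if m := clipboardText.FindStringSubmatch(fragment); m != nil {
			texts = append(texts, m[1])
		}
	}
	return texts, nil
}

func main() {
	// Shortened, hypothetical stand-in for the content of a saved .json file.
	sample := []byte(`[
  "<button data-clipboard-text=\"Development Status :: 1 - Planning\" type=\"button\">Copy</button>",
  "<button data-clipboard-text=\"Development Status :: 2 - Pre-Alpha\" type=\"button\">Copy</button>"
]`)

	texts, err := extractClipboardTexts(sample)
	if err != nil {
		log.Fatal(err)
	}
	for _, text := range texts {
		fmt.Println(text)
	}
}
```

To process a real file, read it with os.ReadFile and pass the bytes to extractClipboardTexts.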
Example with Selector:
// main.go
package main
import (
	"fmt"
	"log"
	scraper "github.com/promptapi/scraper-go"
)
func main() {
	s := new(scraper.PromptAPI)
	params := &scraper.Params{
		URL:      "https://pypi.org/classifiers/",
		Country:  "EE",
		Selector: "ul li button[data-clipboard-text]",
	}
	// add extra request headers
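	extraHeaders := []*scraper.ExtraHeader{
		// Field names are kept as in the original example; they can only be
		// set from outside the package if the library exports them.
		&scraper.ExtraHeader{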
			name:  "X-Referer",
			value: "https://www.google.com",
		},
	}
	result := new(scraper.Result)
	err := s.Scrape(params, extraHeaders, result)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Length of incoming data: %d\n", len(result.Data))
	fmt.Printf("Length of extracted data: %d\n", len(result.DataSelector))
	fmt.Printf("Response headers: %v\n", result.Headers)
	fmt.Printf("Content-Length: %v\n", result.Headers["Content-Length"])
	fileSize, err := s.Save("/tmp/test.json", result)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Size of /tmp/test.json -> %d bytes\n", fileSize)
}
Run:
$ go run main.go
Length of incoming data: 0
Length of extracted data: 734
Response headers: map[Accept-Ranges:bytes Content-Length:321322 Content-Security-Policy:base-uri 'self'; block-all-mixed-content; connect-src 'self' https://api.github.com/repos/ *.fastly-insights.com sentry.io https://api.pwnedpasswords.com https://2p66nmmycsj3.statuspage.io; default-src 'none'; font-src 'self' fonts.gstatic.com; form-action 'self'; frame-ancestors 'none'; frame-src 'none'; img-src 'self' https://warehouse-camo.ingress.cmh1.psfhosted.org/ www.google-analytics.com *.fastly-insights.com; script-src 'self' www.googletagmanager.com www.google-analytics.com *.fastly-insights.com https://cdn.ravenjs.com; style-src 'self' fonts.googleapis.com; worker-src *.fastly-insights.com Content-Type:text/html; charset=UTF-8 Date:Tue, 08 Sep 2020 19:17:22 GMT ETag:"1ea9p+Hscl37dEKelacPWw" Referrer-Policy:origin-when-cross-origin Strict-Transport-Security:max-age=31536000; includeSubDomains; preload Vary:Accept-Encoding, Cookie, Accept-Encoding X-Cache:HIT, HIT X-Cache-Hits:1, 1 X-Content-Type-Options:nosniff X-Frame-Options:deny X-Permitted-Cross-Domain-Policies:none X-Served-By:cache-bwi5137-BWI, cache-bma1621-BMA X-Timer:S1599592641.178639,VS0,VE1512 X-XSS-Protection:1; mode=block]
Content-Length: 321322
Size of /tmp/test.json -> 173717 bytes
Let’s look at the /tmp/test.json file:
[
  "<button class=\"button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w\" data-clipboard-text=\"Development Status :: 1 - Planning\" data-tooltip-label=\"Copy to clipboard\" type=\"button\">\n Copy\n</button>\n",
  "<button class=\"button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w\" data-clipboard-text=\"Development Status :: 2 - Pre-Alpha\" data-tooltip-label=\"Copy to clipboard\" type=\"button\">\n Copy\n</button>\n",
  "<button class=\"button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w\" data-clipboard-text=\"Development Status :: 3 - Alpha\" data-tooltip-label=\"Copy to clipboard\" type=\"button\">\n Copy\n</button>\n",
  "<button class=\"button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w\" data-clipboard-text=\"Development Status :: 4 - Beta\" data-tooltip-label=\"Copy to clipboard\" type=\"button\">\n Copy\n</button>\n",
  "<button class=\"button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w\" data-clipboard-text=\"Development Status :: 5 - Production/Stable\" data-tooltip-label=\"Copy to clipboard\" type=\"button\">\n Copy\n</button>\n",
  "<button class=\"button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w\" data-clipboard-text=\"Development Status :: 6 - Mature\" data-tooltip-label=\"Copy to clipboard\" type=\"button\">\n Copy\n</button>\n",
  "<button class=\"button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w\" data-clipboard-text=\"Development Status :: 7 - Inactive\" data-tooltip-label=\"Copy to clipboard\" type=\"button\">\n Copy\n</button>\n",
  ...
]
Available rake tasks:
$ rake -T
rake default                    # Default task, show avaliable tasks
rake release:check              # Do release check
rake release:publish[revision]  # Publish project with revision: major,minor,patch, default: patch
rake serve_doc[port]            # Run doc server
rake test[verbose]              # Run tests
- Run tests: rake test or rake test[-v]
- Run doc server: rake serve_doc or rake serve_doc[9000]
Release package (if you have write access):
- Commit your changes
- Run rake release:check
- If all goes ok, run rake release:publish
This project is licensed under the MIT License.
- Prompt API - Creator, maintainer
All PRs are welcome!
- Fork the repository (https://github.com/promptapi/scraper-go/fork)
- Create your branch (git checkout -b my-feature)
- Commit your changes (git commit -am 'Add awesome features...')
- Push your branch (git push origin my-feature)
- Then create a new Pull Request!
This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.