diff --git a/README.md b/README.md
index 94305aa..0edfd13 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,90 @@
-*SYNOPSIS*
-1. This script reads urls from 'pagespeed.txt' file. Load this file with full URLS.
-2. Queries each url with the google pagespeed api.
-3. Filters JSON results to only include desired metrics.
-4. Metrics are saved to local .csv spreadsheet for analysis.
+# Google Pagespeed API Bulk Query
+
+This Python3 script queries Google's PageSpeed Insights for a list of URLs, then prints selected results and saves to CSV.
+
+You can specify whether to test for Desktop or Mobile (it defaults to mobile). It is set to select only the performance Score, First Contentful Paint, and First Interactive values. You can easily change that.
+
+## Install
+
+This program requires Python 3. Assuming you have it, simply git clone or download this project and then run it from the command line.
+
+## Use
+
+### Setup
+
+List the URLs, one per line, in a txt file named `pagespeed.txt`. Assuming you're analyzing a single large website, your `sitemap.xml` is a good place to get each URL you want the search engines to care about.
+
+To avoid running afoul of Google's API rate limits, get an [API key from Google](https://console.developers.google.com/apis/credentials).
+
+Best practice is to add the key to your bash profile if you're on Mac or Linux. For example:
+
+```bash
+ $ nano ~/.bash_profile
+ ```
+ and then add the following line:
+ ```
+ export SPD_API_KEY=YOUR_API_KEY
+ ```
+Restart your terminal after you save it.
+
+If you're not a naturally paranoid person, you're not sharing this program, and you're not committing it to any repositories, you can just put the key directly into `pagespeed-api.py` as `SPD_API_KEY`. This is a bad practice and I don't recommend it.
+
+### Running it
+
+From the project root directory, to get Mobile results:
+```
+ $ python3 pagespeed-api.py
+```
+```
+ $ python3 pagespeed-api.py mobile
+```
+To get Desktop results:
+```
+ $ python3 pagespeed-api.py desktop
+```
+
+You will have something like the following printed to your screen:
+```
+Requesting https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://www.example.com&strategy=mobile&key=YOUR_API_KEY...
+Requesting https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://www.example.com&strategy=mobile&key=YOUR_API_KEY...
+URL ~ https://www.example.com/
+Score ~ 1.0
+First Contentful Paint ~ 0.8 s
+First Interactive ~ 0.8 s
+URL ~ https://www.example.com/
+Score ~ 1.0
+First Contentful Paint ~ 0.8 s
+First Interactive ~ 0.8 s
+```
+And you should have a file named `pagespeed-results-mobile-2019-08-21_23:33:59.csv` saved to the "results" directory. It will look like:
+
+```
+URL, Score, First Contentful Paint, First Interactive
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+```
+
+## Credit / References
+
+This is a fork of [ibebeebz pagespeed project](https://github.com/ibebeebz/google-pagespeed-api-script). Many thanks to ibebeebz!
+
+### References
+
+These were helpful to me today:
+
+- [Guide to concurrency and parallelism](https://toptal.com/python/beginners-guide-to-concurrency-and-parallelism-in-python) from Toptal that really helped me.
+- Google's [PageSpeed API docs](https://developers.google.com/speed/docs/insights/v5/get-started)
+
+### Fork differences
+
+The main reason I forked this project was because it was taking quite a while to query hundreds of pages, and I wanted to do it several times a day for mobile and desktop.
+
+So I added multithreading (most of the time spent is just waiting on Google's response), the ability to specify device, and stamping the csv output so it's unique.
\ No newline at end of file
diff --git a/pagespeed-api.py b/pagespeed-api.py
index f8efefe..b68ae86 100644
--- a/pagespeed-api.py
+++ b/pagespeed-api.py
@@ -1,54 +1,73 @@
 import requests
+import sys
+import os
+from time import localtime, strftime
+from concurrent.futures import ThreadPoolExecutor
+from pathlib import Path
 
-# Documentation: https://developers.google.com/speed/docs/insights/v5/get-started
-
-# JSON paths: https://developers.google.com/speed/docs/insights/v4/reference/pagespeedapi/runpagespeed
+# Set in your bash profile, get from Google: https://console.developers.google.com/apis/credentials
+SPD_API_KEY = os.environ.get('SPD_API_KEY')
 
-# Populate 'pagespeed.txt' file with URLs to query against API.
-with open('pagespeed.txt') as pagespeedurls:
-    download_dir = 'pagespeed-results.csv'
-    file = open(download_dir, 'w')
-    content = pagespeedurls.readlines()
-    content = [line.rstrip('\n') for line in content]
-
-    columnTitleRow = "URL, First Contentful Paint, First Interactive\n"
-    file.write(columnTitleRow)
+# Documentation: https://developers.google.com/speed/docs/insights/v5/get-started
+def main(strategy="mobile"):
+    try:
+        strategy = sys.argv[1]
+    except IndexError:
+        print("You can pass 'mobile' or 'desktop' as parameter. Running mobile by default.")
+    # Pull URLS from 'pagespeed.txt' to query against API.
+    with open('pagespeed.txt') as pagespeedurls:
+        stamp = strftime("%Y-%m-%d_at_%H.%M.%S", localtime())
+        csv_out = Path("results/")
+        download_dir = csv_out / f'{strategy}-{stamp}.csv'
+        file = open(download_dir, 'w')
+        content = pagespeedurls.readlines()
+        content = [line.rstrip('\n') for line in content]
+        columnTitleRow = "URL, Score, First Contentful Paint, First Interactive\n"  # CSV header
+        file.write(columnTitleRow)
 
-    # This is the google pagespeed api url structure, using for loop to insert each url in .txt file
-    for line in content:
-        # If no "strategy" parameter is included, the query by default returns desktop data.
-        x = f'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={line}&strategy=mobile'
-        print(f'Requesting {x}...')
-        r = requests.get(x)
-        final = r.json()
-
-        try:
-            urlid = final['id']
-            split = urlid.split('?')  # This splits the absolute url from the api key parameter
-            urlid = split[0]  # This reassigns urlid to the absolute url
-            ID = f'URL ~ {urlid}'
-            ID2 = str(urlid)
-            urlfcp = final['lighthouseResult']['audits']['first-contentful-paint']['displayValue']
-            FCP = f'First Contentful Paint ~ {str(urlfcp)}'
-            FCP2 = str(urlfcp)
-            urlfi = final['lighthouseResult']['audits']['interactive']['displayValue']
-            FI = f'First Interactive ~ {str(urlfi)}'
-            FI2 = str(urlfi)
-        except KeyError:
-            print(f' One or more keys not found {line}.')
-
-        try:
-            row = f'{ID2},{FCP2},{FI2}\n'
-            file.write(row)
-        except NameError:
-            print(f' Failing because of KeyError {line}.')
-            file.write(f' & Failing because of nonexistant Key ~ {line}.' + '\n')
-
-        try:
-            print(ID)
-            print(FCP)
-            print(FI)
-        except NameError:
-            print(f' Failing because of KeyError {line}.')
+        def get_speed(line):
+            # Query API.
+            x = f'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={line}&strategy={strategy}&key={SPD_API_KEY}'
+            print(f'Requesting {x}...')
+            r = requests.get(x)
+            final = r.json()
+
+            try:
+                urlid = final['id']
+                split = urlid.split('?')  # This splits the absolute url from the api key parameter
+                urlid = split[0]  # This reassigns urlid to the absolute url
+                ID = f'URL ~ {urlid}'
+                ID2 = str(urlid)
+                # JSON paths: https://developers.google.com/speed/docs/insights/v4/reference/pagespeedapi/runpagespeed
+                urlfcp = final['lighthouseResult']['audits']['first-contentful-paint']['displayValue']
+                FCP = f'First Contentful Paint ~ {str(urlfcp)}'
+                FCP2 = str(urlfcp[:-2])
+                urlfi = final['lighthouseResult']['audits']['interactive']['displayValue']
+                FI = f'First Interactive ~ {str(urlfi)}'
+                FI2 = str(urlfi[:-2])
+                urlscore = final['lighthouseResult']['categories']['performance']['score']
+                SC = f'Score ~ {str(urlscore)}'
+                SC2 = str(urlscore)
+            except KeyError:
+                print(f' One or more keys not found {line}.')
+
+            try:
+                row = f'{ID2},{SC2},{FCP2},{FI2}\n'
+                file.write(row)
+            except NameError:
+                print(f' Failing because of KeyError {line}.')
+                file.write(f' & Failing because of nonexistant Key ~ {line}.' + '\n')
+
+            try:
+                print(ID)
+                print(SC)
+                print(FCP)
+                print(FI)
+            except NameError:
+                print(f' Failing because of KeyError {line}.')
+        with ThreadPoolExecutor() as executor:  # Make multithreaded, 5x your processors by default
+            executor.map(get_speed, content)
 
-    file.close()
+        file.close()
+if __name__ == '__main__':
+    main()
diff --git a/pagespeed.txt b/pagespeed.txt
index 101dd77..07f917a 100644
--- a/pagespeed.txt
+++ b/pagespeed.txt
@@ -1 +1,10 @@
-https://stores.uscellular.com
\ No newline at end of file
+https://www.example.com
+https://www.example.com
+https://www.example.com
+https://www.example.com
+https://www.example.com
+https://www.example.com
+https://www.example.com
+https://www.example.com
+https://www.example.com
+https://www.example.com
\ No newline at end of file
diff --git a/results/pagespeed-results-desktop-2019-08-22_00:20:38.csv b/results/pagespeed-results-desktop-2019-08-22_00:20:38.csv
new file mode 100644
index 0000000..7d494a3
--- /dev/null
+++ b/results/pagespeed-results-desktop-2019-08-22_00:20:38.csv
@@ -0,0 +1,11 @@
+URL, Score, First Contentful Paint, First Interactive
+https://www.example.com/,1.0,0.2 s,0.2 s
+https://www.example.com/,1.0,0.2 s,0.2 s
+https://www.example.com/,1.0,0.2 s,0.2 s
+https://www.example.com/,1.0,0.2 s,0.2 s
+https://www.example.com/,1.0,0.2 s,0.2 s
+https://www.example.com/,1.0,0.2 s,0.2 s
+https://www.example.com/,1.0,0.2 s,0.2 s
+https://www.example.com/,1.0,0.2 s,0.2 s
+https://www.example.com/,1.0,0.2 s,0.2 s
+https://www.example.com/,1.0,0.2 s,0.2 s
diff --git a/results/pagespeed-results-mobile-2019-08-22_00:18:16.csv b/results/pagespeed-results-mobile-2019-08-22_00:18:16.csv
new file mode 100644
index 0000000..27da00e
--- /dev/null
+++ b/results/pagespeed-results-mobile-2019-08-22_00:18:16.csv
@@ -0,0 +1,11 @@
+URL, Score, First Contentful Paint, First Interactive
+ & Failing because of nonexistant Key ~ https://www.example.com.
+ & Failing because of nonexistant Key ~ https://www.example.com.
+ & Failing because of nonexistant Key ~ https://www.example.com.
+ & Failing because of nonexistant Key ~ https://www.example.com.
+ & Failing because of nonexistant Key ~ https://www.example.com.
+ & Failing because of nonexistant Key ~ https://www.example.com.
+ & Failing because of nonexistant Key ~ https://www.example.com.
+ & Failing because of nonexistant Key ~ https://www.example.com.
+ & Failing because of nonexistant Key ~ https://www.example.com.
+ & Failing because of nonexistant Key ~ https://www.example.com.
diff --git a/results/pagespeed-results-mobile-2019-08-22_00:19:56.csv b/results/pagespeed-results-mobile-2019-08-22_00:19:56.csv
new file mode 100644
index 0000000..3370668
--- /dev/null
+++ b/results/pagespeed-results-mobile-2019-08-22_00:19:56.csv
@@ -0,0 +1,11 @@
+URL, Score, First Contentful Paint, First Interactive
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
+https://www.example.com/,1.0,0.8 s,0.8 s
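
The script in this change builds the request URL by string interpolation and signals missing JSON keys through a `KeyError` followed by a `NameError`. A possible follow-up, sketched here and not part of the diff, would URL-encode the query parameters and look up each metric with `dict.get`, so that a failed response produces a row with empty cells instead of a free-text line in the CSV. The function names `build_request_url`, `extract_row`, and `rows_to_csv` are hypothetical; the endpoint and JSON paths are the ones the script already uses. The sample response is made up for illustration and no network call is made.

```python
import csv
import io
from urllib.parse import urlencode

# Endpoint taken from pagespeed-api.py in this change.
API_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_request_url(page_url, strategy="mobile", api_key=None):
    """Return the API request URL with every query parameter percent-encoded."""
    params = {"url": page_url, "strategy": strategy}
    if api_key:  # Leave the key out entirely when none is configured.
        params["key"] = api_key
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_row(page_url, result):
    """Pick the metrics the script reports; missing keys become None."""
    lighthouse = result.get("lighthouseResult", {})
    audits = lighthouse.get("audits", {})
    score = lighthouse.get("categories", {}).get("performance", {}).get("score")
    fcp = audits.get("first-contentful-paint", {}).get("displayValue")
    interactive = audits.get("interactive", {}).get("displayValue")
    # Fall back to the requested URL when the response carries no 'id'.
    return [result.get("id", page_url), score, fcp, interactive]


def rows_to_csv(rows):
    """Serialize rows with the csv module so values are quoted when needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["URL", "Score", "First Contentful Paint", "First Interactive"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up responses: one successful, one error payload without metrics.
    ok = {
        "id": "https://www.example.com/",
        "lighthouseResult": {
            "categories": {"performance": {"score": 1.0}},
            "audits": {
                "first-contentful-paint": {"displayValue": "0.8 s"},
                "interactive": {"displayValue": "0.8 s"},
            },
        },
    }
    failed = {"error": {"code": 400}}
    rows = [
        extract_row("https://www.example.com", ok),
        extract_row("https://www.example.com", failed),
    ]
    print(build_request_url("https://www.example.com", "desktop"))
    print(rows_to_csv(rows), end="")
```

With this shape, `get_speed` could return a row instead of writing to the shared file from worker threads, and the caller could write everything yielded by `executor.map` in one place. That is a design suggestion only, not something the diff does.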