From d98bc3b2249fc942418ef1e08d663d26b07c36f1 Mon Sep 17 00:00:00 2001
From: Damian
Date: Wed, 15 Oct 2025 14:39:54 +0200
Subject: [PATCH 01/22] polish cdx_toolkit example

---
 Makefile | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 8d6d7a4..3e58cb7 100644
--- a/Makefile
+++ b/Makefile
@@ -36,12 +36,12 @@ extract:
 	@echo "hint: python -m json.tool extraction.json"
 
 cdx_toolkit:
-	@echo look up this capture in the comoncrawl cdx index
-	#cdxt --cc --from 20240518015810 --to 20240518015810 iter an.wikipedia.org/wiki/Escopete
-	cdxt --limit 1 --crawl CC-MAIN-2024-22 --from 20240518015810 --to 20240518015810 iter an.wikipedia.org/wiki/Escopete
+	@echo look up this capture in the commoncrawl cdx index for CC-MAIN-2024-22, returning only the first match
+	cdxt --limit 1 --crawl CC-MAIN-2024-22 iter an.wikipedia.org/wiki/Escopete
 	@echo
-	@echo extract the content from the commoncrawl s3 bucket
+	@echo clean up previous work
 	rm -f TEST-000000.extracted.warc.gz
+	@echo extract the content from the commoncrawl s3 bucket, using the timestamp from above
 	cdxt --limit 1 --crawl CC-MAIN-2024-22 --from 20240518015810 --to 20240518015810 warc an.wikipedia.org/wiki/Escopete
 	@echo
 	@echo index this new warc

From 5621551035c38d16869a7eade1fba02f3acbdd95 Mon Sep 17 00:00:00 2001
From: Damian
Date: Wed, 15 Oct 2025 14:51:32 +0200
Subject: [PATCH 02/22] wip edits

---
 README.md | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index d6d1d70..d3dd959 100644
--- a/README.md
+++ b/README.md
@@ -87,15 +87,15 @@ You'll see four records total, with the start of each record marked with the hea
 ### WET
 
-WET (WARC Encapsulated Text) files only contain the body text of web pages extracted from the HTML and exclude any HTML code, images, or other media. This makes them useful for text analysis and natural language processing (NLP) tasks.
+WET (WARC Encapsulated Text) files contain only the body text of web pages parsed from the HTML and exclude any HTML code, images, or other media. This makes them useful for text analysis and natural language processing (NLP) tasks.
 
 Open `whirlwind.warc.wet`: this is the WET derived from our original WARC. We can see that it's still in WARC format with two records:
 1) a `warcinfo` record.
-2) a `conversion` record: the extracted text with the HTTP headers removed.
+2) a `conversion` record: the parsed text with HTTP headers removed.
 
 ### WAT
 
-WAT (Web ARChive Timestamp) files contain metadata associated with the crawled web pages (e.g. parsed data from the HTTP response headers, links extracted from HTML pages, server response codes etc.). They are useful for analysis that requires understanding the structure of the web.
+WAT (Web ARChive Timestamp) files contain metadata associated with the crawled web pages (e.g. parsed data from the HTTP response headers, links recovered from HTML pages, server response codes, etc.). They are useful for analysis that requires understanding the structure of the web.
 
 Open `whirlwind.warc.wat`: this is the WAT derived from our original WARC. Like the WET file, it's also in WARC format. It contains two records:
 1) a `warcinfo` record.
@@ -217,9 +217,9 @@ For each of these records, there's one text line in the index - yes, it's a flat
 
 What is the purpose of this funky format? It's done this way because these flat files (300 gigabytes total per crawl) can be sorted on the primary key using any out-of-core sort utility, e.g. the standard Linux `sort`, or one of the Hadoop-based out-of-core sort functions.
 
-The JSON blob has enough information to extract individual records: it says which warc file the record is in, and the offset and length of the record. We'll use that in the next section.
+The JSON blob has enough information to cleanly isolate the raw data of a single record: it records which WARC file the record is in, and the byte offset and length of the record within that file. We'll use that in the next section.
 
-## Task 4: Use the CDXJ index to extract raw content from the local WARC, WET, and WAT
+## Task 4: Use the CDXJ index to extract a subset of raw content from the local WARC, WET, and WAT
 
 Normally, compressed files aren't random access. However, the WARC files use a trick to make this possible, which is that every record needs to be separately compressed. The `gzip` compression utility supports this, but it's rarely used.
 
@@ -350,21 +350,23 @@ The output looks like this:
 Click to view output
 ```
-look up this capture in the comoncrawl cdx index for CC-MAIN-2024-22, returning only the first match:
-cdxt --limit 1 --crawl CC-MAIN-2024-22 --from 20240518015810 --to 20240518015810 iter an.wikipedia.org/wiki/Escopete
+look up this capture in the commoncrawl cdx index for CC-MAIN-2024-22, returning only the first match
+$ cdxt --limit 1 --crawl CC-MAIN-2024-22 iter an.wikipedia.org/wiki/Escopete
 status 200, timestamp 20240518015810, url https://an.wikipedia.org/wiki/Escopete
 
-extract the content from the commoncrawl s3 bucket
-rm -f TEST-000000.extracted.warc.gz
-cdxt --cc --from 20240518015810 --to 20240518015810 warc an.wikipedia.org/wiki/Escopete
+clean up previous work, if any
+$ rm -f TEST-000000.extracted.warc.gz
+retrieve the content from the commoncrawl s3 bucket, restricting to the timestamp we were given above
+$ cdxt --cc --from 20240518015810 --to 20240518015810 warc an.wikipedia.org/wiki/Escopete
+data is written to TEST-<n>.extracted.warc.gz, where <n> starts at 000000 and counts upward if a file already exists at 000000
 
 index this new warc
-cdxj-indexer TEST-000000.extracted.warc.gz > TEST-000000.extracted.warc.cdxj
-cat TEST-000000.extracted.warc.cdxj
+$ cdxj-indexer TEST-000000.extracted.warc.gz > TEST-000000.extracted.warc.cdxj
+$ cat TEST-000000.extracted.warc.cdxj
 org,wikipedia,an)/wiki/escopete 20240518015810 {"url": "https://an.wikipedia.org/wiki/Escopete", "mime": "text/html", "status": "200", "digest": "sha1:RY7PLBUFQNI2FFV5FTUQK72W6SNPXLQU", "length": "17455", "offset": "379", "filename": "TEST-000000.extracted.warc.gz"}
 
 iterate this new warc
-python ./warcio-iterator.py TEST-000000.extracted.warc.gz
+$ python ./warcio-iterator.py TEST-000000.extracted.warc.gz
 WARC-Type: warcinfo
 WARC-Type: response
 WARC-Target-URI https://an.wikipedia.org/wiki/Escopete
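The sample output in this patch shows everything Task 4 needs from a CDXJ line: the `filename`, `offset`, and `length` fields are enough to pull a single record out of a WARC without reading the rest of the file. Below is a minimal sketch of that lookup in Python (not part of the patches themselves); it assumes the `TEST-000000.extracted.warc.cdxj` and `TEST-000000.extracted.warc.gz` files produced by the `cdx_toolkit` target are in the current directory, and that `warcio` is installed:

```python
import io
import json

from warcio.archiveiterator import ArchiveIterator


def read_record_at(cdxj_line):
    # A CDXJ line is "<SURT key> <timestamp> <JSON blob>"; only the JSON
    # blob is needed to locate the raw record bytes.
    _surt, _timestamp, blob = cdxj_line.split(" ", 2)
    fields = json.loads(blob)
    offset = int(fields["offset"])
    length = int(fields["length"])
    # Every record is gzipped separately, so this byte slice is a valid
    # stand-alone .warc.gz stream that warcio can parse on its own.
    with open(fields["filename"], "rb") as f:
        f.seek(offset)
        data = f.read(length)
    for record in ArchiveIterator(io.BytesIO(data)):
        return record


with open("TEST-000000.extracted.warc.cdxj") as f:
    record = read_record_at(f.readline())

print(record.rec_headers.get_header("WARC-Target-URI"))
```

The same `offset`/`length` pair also works against the Common Crawl bucket directly, as an HTTP `Range: bytes=<offset>-<offset+length-1>` request for `https://data.commoncrawl.org/<filename>`, which is roughly what `cdxt ... warc` is doing on our behalf.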
From 1354bddc534d601e1231103fd7a2c46a84bd51b Mon Sep 17 00:00:00 2001
From: Damian Stewart
Date: Wed, 15 Oct 2025 14:51:37 +0200
Subject: [PATCH 03/22] wip edits 2

---
 CC-MAIN-2024-22.warc.paths.gz      | Bin 817 -> 844 bytes
 README.md                          |  11 +-
 notebooks/warcio_experiments.ipynb | 923 +++++++++++++++++++++++++++++
 3 files changed, 929 insertions(+), 5 deletions(-)
 create mode 100644 notebooks/warcio_experiments.ipynb

diff --git a/CC-MAIN-2024-22.warc.paths.gz b/CC-MAIN-2024-22.warc.paths.gz
index 0ff536d75299e54bb5edb342fa040d3a0743fadb..4099c937498315ba83082f995ada7387e218c4db 100644
GIT binary patch
delta 46
zcmdnUc7{z=zMF&NMgH>)24-hxU0+8}KV2gOBNJUCBfav(qGY{-#FC6+hK*e6%m7J#
B4SoOs

delta 19
YcmX@ZwvmlXzMF#q1elmNs;V;s04W*+O8@`>

diff --git a/README.md b/README.md
index d6d1d70..d64c668 100644
--- a/README.md
+++ b/README.md
@@ -350,18 +350,19 @@ The output looks like this:
 Click to view output
 ```
-look up this capture in the comoncrawl cdx index for CC-MAIN-2024-22, returning only the first match:
-cdxt --limit 1 --crawl CC-MAIN-2024-22 --from 20240518015810 --to 20240518015810 iter an.wikipedia.org/wiki/Escopete
+look up this capture in the commoncrawl cdx index for CC-MAIN-2024-22, returning only the first match
+cdxt --limit 1 --crawl CC-MAIN-2024-22 iter an.wikipedia.org/wiki/Escopete
 status 200, timestamp 20240518015810, url https://an.wikipedia.org/wiki/Escopete
 
-extract the content from the commoncrawl s3 bucket
+clean up previous work
 rm -f TEST-000000.extracted.warc.gz
-cdxt --cc --from 20240518015810 --to 20240518015810 warc an.wikipedia.org/wiki/Escopete
+extract the content from the commoncrawl s3 bucket, using the timestamp from above
+cdxt --limit 1 --crawl CC-MAIN-2024-22 --from 20240518015810 --to 20240518015810 warc an.wikipedia.org/wiki/Escopete
 
 index this new warc
 cdxj-indexer TEST-000000.extracted.warc.gz > TEST-000000.extracted.warc.cdxj
 cat TEST-000000.extracted.warc.cdxj
-org,wikipedia,an)/wiki/escopete 20240518015810 {"url": "https://an.wikipedia.org/wiki/Escopete", "mime": "text/html", "status": "200", "digest": "sha1:RY7PLBUFQNI2FFV5FTUQK72W6SNPXLQU", "length": "17455", "offset": "379", "filename": "TEST-000000.extracted.warc.gz"}
+org,wikipedia,an)/wiki/escopete 20240518015810 {"url": "https://an.wikipedia.org/wiki/Escopete", "mime": "text/html", "status": "200", "digest": "sha1:RY7PLBUFQNI2FFV5FTUQK72W6SNPXLQU", "length": "17455", "offset": "406", "filename": "TEST-000000.extracted.warc.gz"}
 
 iterate this new warc
 python ./warcio-iterator.py TEST-000000.extracted.warc.gz

diff --git a/notebooks/warcio_experiments.ipynb b/notebooks/warcio_experiments.ipynb
new file mode 100644
index 0000000..c51d87a
--- /dev/null
+++ b/notebooks/warcio_experiments.ipynb
@@ -0,0 +1,923 @@
+{
+ "cells": [
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-10-09T13:33:27.910213Z",
+     "start_time": "2025-10-09T13:33:27.895153Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "%load_ext autoreload\n",
+    "%autoreload 2"
+   ],
+   "id": "f142ae2305e8e09d",
+   "outputs": [],
+   "execution_count": 2
+  },
+  {
+   "cell_type": "code",
+   "id": "initial_id",
+   "metadata": {
+    "collapsed": true,
+    "ExecuteTime": {
+     "end_time": "2025-10-09T13:33:28.691992Z",
+     "start_time": "2025-10-09T13:33:28.678002Z"
+    }
+   },
+   "source": "from warcio.archiveiterator import ArchiveIterator\n",
+   "outputs": [],
+   "execution_count": 3
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-10-09T13:10:59.883851Z",
+     "start_time": "2025-10-09T13:10:59.857364Z"
+    }
+   },
+   "cell_type": "code",
+   "source": "warc_path = \"/home/cc-pds/commoncrawl/crawl-data/CC-MAIN-2024-10/segments/1707947473347.0/warc/CC-MAIN-20240220211055-20240221001055-00101.warc.gz\"",
+   "id": "88a4052768f17978",
+   "outputs": [],
+   "execution_count": 5
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-10-09T13:33:31.045128Z",
+     "start_time": "2025-10-09T13:33:31.022226Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "\n",
+    "def dump_all_records(warc_path, limit: int=5):\n",
+    "    count = 0\n",
+    "    with open(warc_path, \"rb\") as f:\n",
+    "        for record in ArchiveIterator(f):\n",
+    "            if record.rec_type == \"response\":\n",
+    "                #print(record.rec_headers)\n",
+    "                print(\"url:\", record.rec_headers.get_header(\"WARC-Target-URI\"))\n",
+    "                print(\"content-type:\", record.http_headers.get_header(\"Content-Type\"))\n",
+    "                content = record.content_stream().read()\n",
+    "                print(\"content:\", content[:200])\n",
+    "                count += 1\n",
+    "                if count >= limit:\n",
+    "                    break\n",
+    "\n",
+    "def get_first_record(warc_path):\n",
+    "    with open(warc_path, \"rb\") as f:\n",
+    "        for record in ArchiveIterator(f):\n",
+    "            if record.rec_type == \"response\":\n",
+    "                return record"
+   ],
+   "id": "72d21cc15eb4b1c0",
+   "outputs": [],
+   "execution_count": 4
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-10-09T13:19:25.393165Z",
+     "start_time": "2025-10-09T13:19:24.977645Z"
+    }
+   },
+   "cell_type": "code",
+   "source": "dump_all_records(warc_path, limit=200)",
+   "id": "16d1afcec0c6de96",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "url: http://020zxdq.baiwanx.com.cn/?user=020zxdq\n",
+      "content-type: text/html\n",
+      "content: b''\n",
+      "url: http://04.ma/2017/05/05/%D8%A7%D9%84%D9%81%D8%B1%D8%A7%D8%B4%D8%A9-%D8%AF%D9%8A%D8%A7%D9%84-%D9%82%D9%8A%D8%B3%D8%A7%D8%B1%D9%8A%D8%A9-%D8%B3%D8%A8%D8%A7%D8%AA%D8%A9-%D8%AF%D8%A7%D8%B1%D9%88-%D9%88%D9%82%D9%81%D8%A9-%D9%82/\n",
+      "content-type: text/html; charset=UTF-8\n",
+      "content: b'\\n\\n\\n\\n\\n\\n\\n\\n