diff --git a/lessons/02_web_scraping.ipynb b/lessons/02_web_scraping.ipynb index 696e0b0..525cc44 100644 --- a/lessons/02_web_scraping.ipynb +++ b/lessons/02_web_scraping.ipynb @@ -4,21 +4,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Web Scraping con Beautiful Soup\n", + "# Web Scraping with Beautiful Soup\n", "\n", - "* * *\n", + "* * * \n", "\n", - "### Íconos usados ​​en este cuaderno\n", - "🔔 **Pregunta**: Una pregunta rápida para ayudarte a entender qué está pasando.
\n", - "🥊 **Desafío**: Ejercicio interactivo. ¡Lo resolveremos en el taller!
\n", - "⚠️ **Advertencia**: Atención sobre aspectos complicados o errores comunes.
\n", - "💡 **Consejo**: Cómo hacer algo de forma más eficiente o efectiva.
\n", - "🎬 **Demostración**: ¡Mostrando algo más avanzado para que sepas para qué se puede usar Python!
\n", + "### Icons used in this notebook\n", + "🔔 **Question**: A quick question to help you understand what's going on.
\n", + "🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!
\n", + "⚠️ **Warning**: Heads-up about tricky stuff or common mistakes.
\n", + "💡 **Tip**: How to do something a bit more efficiently or effectively.
\n", + "🎬 **Demo**: Showing off something more advanced – so you know what Python can be used for!
\n", "\n", - "### Objetivos de aprendizaje\n", - "1. [Reflexión: Escapar o no raspar](#when)\n", - "2. [Extracción y análisis de HTML](#extract)\n", - "3. [Desmantelando la Asamblea General de Illinois](#scrape)" + "### Learning Objectives\n", + "1. [Reflection: To Scrape Or Not To Scrape](#when)\n", + "2. [Extracting and Parsing HTML](#extract)\n", + "3. [Scraping the Illinois General Assembly](#scrape)" ] }, { @@ -27,27 +27,27 @@ "source": [ "\n", "\n", - "# Scraping o no scraping\n", + "# To Scrape Or Not To Scrape\n", "\n", - "Para acceder a datos de la web, primero debemos asegurarnos de que el sitio web que nos interesa ofrezca una API web. Plataformas como Twitter, Reddit y el New York Times ofrecen API. **Consulta el taller de D-Lab sobre [API web de Python](https://github.com/dlab-berkeley/Python-Web-APIs) si quieres aprender a usar las API.**\n", + "When we'd like to access data from the web, we first have to check whether the website we are interested in offers a Web API. Platforms like Twitter, Reddit, and the New York Times offer APIs. **Check out D-Lab's [Python Web APIs](https://github.com/dlab-berkeley/Python-Web-APIs) workshop if you want to learn how to use APIs.**\n", "\n", - "Sin embargo, a menudo no existe una API web. En estos casos, podemos recurrir al web scraping, donde extraemos el HTML subyacente de una página web y obtenemos directamente la información que buscamos. Existen varios paquetes en Python que podemos usar para realizar estas tareas. Nos centraremos en dos paquetes: Requests y Beautiful Soup.\n", + "However, there are often cases when a Web API does not exist. In these cases, we may have to resort to web scraping, where we extract the underlying HTML from a web page, and directly obtain the information we want. There are several packages in Python we can use to accomplish these tasks. 
We'll focus on two packages: Requests and Beautiful Soup.\n", "\n", - "Nuestro estudio de caso recopilará información sobre los senadores estatales de Illinois (http://www.ilga.gov/senate), así como la lista de proyectos de ley patrocinados por cada senador. Antes de comenzar, revise estos sitios web para conocer su estructura." + "Our case study will be scraping information on the [state senators of Illinois](http://www.ilga.gov/senate), as well as the [list of bills](http://www.ilga.gov/senate/SenatorBills.asp?MemberID=1911&GA=98&Primary=True) each senator has sponsored. Before we get started, peruse these websites to take a look at their structure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Instalación\n", + "## Installation\n", "\n", - "Usaremos dos paquetes principales: [Requests](http://docs.python-requests.org/en/latest/user/quickstart/) y [Beautiful Soup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/). Continúe instalando estos paquetes, si aún no lo ha hecho:" + "We will use two main packages: [Requests](http://docs.python-requests.org/en/latest/user/quickstart/) and [Beautiful Soup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/). 
Go ahead and install these packages, if you haven't already:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 140, "metadata": {}, "outputs": [ { @@ -71,31 +71,16 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 141, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Collecting beautifulsoup4\n", - " Downloading beautifulsoup4-4.13.4-py3-none-any.whl.metadata (3.8 kB)\n", - "Collecting soupsieve>1.2 (from beautifulsoup4)\n", - " Downloading soupsieve-2.7-py3-none-any.whl.metadata (4.6 kB)\n", - "Collecting typing-extensions>=4.0.0 (from beautifulsoup4)\n", - " Downloading typing_extensions-4.14.1-py3-none-any.whl.metadata (3.0 kB)\n", - "Downloading beautifulsoup4-4.13.4-py3-none-any.whl (187 kB)\n", - "Downloading soupsieve-2.7-py3-none-any.whl (36 kB)\n", - "Downloading typing_extensions-4.14.1-py3-none-any.whl (43 kB)\n", - "Installing collected packages: typing-extensions, soupsieve, beautifulsoup4\n", - "\n", - " ------------- -------------------------- 1/3 [soupsieve]\n", - " -------------------------- ------------- 2/3 [beautifulsoup4]\n", - " -------------------------- ------------- 2/3 [beautifulsoup4]\n", - " -------------------------- ------------- 2/3 [beautifulsoup4]\n", - " ---------------------------------------- 3/3 [beautifulsoup4]\n", - "\n", - "Successfully installed beautifulsoup4-4.13.4 soupsieve-2.7 typing-extensions-4.14.1\n", + "Requirement already satisfied: beautifulsoup4 in c:\\users\\jjala\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (4.13.4)\n", + "Requirement already satisfied: soupsieve>1.2 in c:\\users\\jjala\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from beautifulsoup4) (2.7)\n", + "Requirement already satisfied: typing-extensions>=4.0.0 in c:\\users\\jjala\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from beautifulsoup4) (4.14.1)\n", "Note: you may need to restart the kernel to 
use updated packages.\n" ] } @@ -110,19 +95,19 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "También instalaremos el paquete `lxml`, que ayuda a soportar parte del análisis que realiza Beautiful Soup:" + "We'll also install the `lxml` package, which helps support some of the parsing that Beautiful Soup performs:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 142, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Requirement already satisfied: lxml in c:\\users\\fmerino\\documents\\github\\python-web-scraping\\.venv312\\lib\\site-packages (6.0.1)\n", + "Requirement already satisfied: lxml in c:\\users\\jjala\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (6.0.1)\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } @@ -135,7 +120,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 143, "metadata": { "tags": [] }, @@ -180,7 +165,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 144, "metadata": { "tags": [] }, @@ -224,16 +209,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Paso 2: Analizar la página con Beautiful Soup\n", + "## Step 2: Parse the Page with Beautiful Soup\n", "\n", - "Ahora, usamos la función `BeautifulSoup` para analizar la respuesta en un árbol HTML. Esto devuelve un objeto (llamado **objeto soup**) que contiene todo el HTML del documento original.\n", + "Now, we use the `BeautifulSoup` function to parse the response into an HTML tree. This returns an object (called a **soup object**) that contains all of the HTML in the original document.\n", "\n", - "Si se produce un error relacionado con una biblioteca de análisis, asegúrese de haber instalado el paquete `lxml` para proporcionar a Beautiful Soup las herramientas de análisis necesarias." 
+ "If you run into an error about a parser library, make sure you've installed the `lxml` package so that Beautiful Soup has the tools it needs to parse the content." ] }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 145, "metadata": {}, "outputs": [ { @@ -263,9 +248,9 @@ } ], "source": [ - "# Parse the response into an HTML tree\n", + "# Parse the response and convert it into an HTML tree.\n", "soup = BeautifulSoup(src, 'lxml')\n", - "# Take a look\n", + "# Take a look.\n", "print(soup.prettify()[:1000])" ] }, @@ -273,31 +258,31 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "La salida se ve bastante similar a la anterior, pero ahora está organizada en un objeto 'soup' que nos permite recorrer la página más fácilmente." + "The output looks pretty similar to the above, but now it's organized in a `soup` object, which lets us traverse the page more easily." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Paso 3: Buscar elementos HTML\n", + "## Step 3: Find HTML Elements\n", "\n", - "Beautiful Soup cuenta con varias funciones para encontrar componentes útiles en una página. Beautiful Soup permite encontrar elementos por:\n", + "Beautiful Soup has a number of functions to find useful components on a page. Beautiful Soup lets you find elements by their:\n", "\n", "1. HTML tags\n", "2. HTML attributes\n", "3. CSS selectors\n", "\n", - "Primero, busquemos **etiquetas HTML**.\n", + "First, let's search for **HTML tags**.\n", "\n", - "La función `find_all` busca en el árbol `soup` todos los elementos con una etiqueta HTML específica y los devuelve.\n", + "The `find_all` function searches the `soup` tree for all elements with a particular HTML tag and returns all of those elements.\n", "\n", "What does the example below do?" 
] }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 146, "metadata": {}, "outputs": [ { @@ -329,7 +314,7 @@ } ], "source": [ - "# Find all elements with a certain tag\n", + "# Find every element that has a given tag\n", "a_tags = soup.find_all(\"a\")\n", "print(a_tags[:10])" ] @@ -338,14 +323,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Dado que `find_all()` es el método más popular en la API de búsqueda de Beautiful Soup, puedes usar un atajo. Si tratas el objeto BeautifulSoup como si fuera una función, es lo mismo que llamar a `find_all()` en ese objeto.\n", + "Because `find_all()` is the most popular method in the Beautiful Soup search API, you can use a shortcut to call it. If you treat the `BeautifulSoup` object as though it were a function, it is the same as calling `find_all()` on that object.\n", "\n", "These two lines of code are equivalent:" ] }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 147, "metadata": { "tags": [] }, @@ -374,12 +359,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "¿Cuantos enlaces obtuvimos?" + "How many links did we obtain?" ] }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 148, "metadata": {}, "outputs": [ { @@ -398,16 +383,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "¡Eso es muchísimo! Muchos elementos de una página tendrán la misma etiqueta HTML. Por ejemplo, si buscas todo con la etiqueta `a`, probablemente obtendrás más resultados, muchos de los cuales quizás no quieras. Recuerda que la etiqueta `a` define un hipervínculo, por lo que normalmente encontrarás muchos en cualquier página.\n", + "That's quite a lot! Many elements on a page will have the same HTML tag. For instance, if you search for everything with the `a` tag, you're likely to get many hits, many of which you might not want. 
Remember, the `a` tag defines a hyperlink, so you'll usually find many on any page.\n", "\n", - "¿Qué sucedería si quisiéramos buscar etiquetas HTML con ciertos atributos, como clases CSS específicas?\n", + "What if we want to search for HTML tags with certain attributes, such as particular CSS classes?\n", "\n", - "Podemos hacerlo añadiendo un argumento adicional a `find_all`. En el siguiente ejemplo, buscamos todas las etiquetas `a` y luego las filtramos con `class_=\"sidemenu\"`." + "We can do this by adding an additional argument to `find_all`. In the example below, we find all the `a` tags and then filter for those with `class_=\"sidemenu\"`." ] }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 149, "metadata": { "tags": [] }, @@ -418,13 +403,13 @@ "[]" ] }, - "execution_count": 29, + "execution_count": 149, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Get only the 'a' tags in 'sidemenu' class\n", + "# Get only the 'a' tags that have the 'sidemenu' class\n", "side_menus = soup(\"a\", class_=\"sidemenu\")\n", "side_menus[:5]" ] @@ -433,14 +418,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Una forma más eficiente de buscar elementos en un sitio web es mediante un selector CSS. Para ello, debemos usar un método diferente llamado `select()`. Simplemente pase una cadena a `.select()` para obtener todos los elementos con esa cadena como un selector CSS válido.\n", + "A more efficient way to search for elements on a website is via a CSS selector. For this, we have to use a different method called `select()`. Just pass a string into `.select()` to get all elements that match that CSS selector.\n", "\n", - "En el ejemplo anterior, podemos usar `\"a.sidemenu\"` como selector CSS, que devuelve todas las etiquetas `a` con la clase `sidemenu`." 
+ "In the example above, we can use `a.sidemenu` as a CSS selector, which returns all `a` tags with the class `sidemenu`." ] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 150, "metadata": { "tags": [] }, @@ -451,13 +436,13 @@ "[]" ] }, - "execution_count": 30, + "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Get elements with \"a.sidemenu\" CSS Selector.\n", + "# Get the elements that match the \"a.sidemenu\" CSS selector.\n", "selected = soup.select(\"a.sidemenu\")\n", "selected[:5]" ] @@ -466,79 +451,1417 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 🥊 Desafío: Encontrar todo\n", + "## 🥊 Challenge: Find All\n", "\n", - "Usa BeautifulSoup para encontrar todos los elementos `a` con la clase `mainmenu`." + "Use BeautifulSoup to find all the `a` elements with class `mainmenu`. I switched it to `dropdown-item` so that the results are visible." ] }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 151, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " English\n", + " , \n", + " Afrikaans\n", + " , \n", + " Albanian\n", + " , \n", + " Arabic\n", + " , \n", + " Armenian\n", + " ]\n" + ] + } + ], "source": [ - "# YOUR CODE HERE\n" + "### PRACTICAL EXERCISE\n", + "enlaces = soup.find_all(\"a\", class_=\"dropdown-item\")\n", + "print(enlaces[:5])  # capped at 5 so the output doesn't get too long" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Paso 4: Obtener los atributos y el texto de los elementos\n", + "## Step 4: Get Attributes and Text of Elements\n", "\n", - "Una vez identificados los elementos, necesitamos la información de acceso de cada uno. Normalmente, esto implica dos cosas:\n", + "Once we identify elements, we want to access the information in that element. Usually, this means two things:\n", "\n", "1. 
Text\n", "2. Attributes\n", "\n", - "Obtener el texto dentro de un elemento es sencillo. Solo tenemos que usar el miembro `text` de un objeto `tag`:" + "Getting the text inside an element is easy. All we have to do is use the `text` member of a `tag` object:" ] }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 152, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Texto: \n", + " English\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Afrikaans\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Albanian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Arabic\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Armenian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Azerbaijani\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Basque\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Bengali\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Bosnian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Catalan\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Croatian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Czech\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Danish\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Dutch\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Esperanto\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: 
\n", + " Estonian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Filipino\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Finnish\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " French\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Galician\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Georgian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " German\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Greek\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Gujarati\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Haitian Creole\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Hausa\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Hawaiian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Hebrew\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Hindi\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Hungarian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Icelandic\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Indonesian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Interlingua\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Interlingue\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Inuktitut\n", + " \n", + "href: 
#\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Irish\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Italian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Japanese\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Javanese\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Kannada\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Khmer\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Korean\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Latin\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Latvian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Lithuanian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Luxembourgish\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Macedonian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Malagasy\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Malayalam\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Maltese\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Maori\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Marathi\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Myanmar\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Nepali\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + 
"Texto: \n", + " Norwegian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Odia\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Pashto\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Punjabi\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Romanian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Russian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Samoan\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Sango\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Sanskrit\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Sardinian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Sindhi\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Sinhala\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Slovak\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Slovenian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Somali\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Southern Sotho\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Spanish\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Sundanese\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Swahili\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Swedish\n", + " \n", + "href: #\n", + 
"clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Tamil\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Telugu\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Thai\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Tigrinya\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Tonga\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Turkish\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Ukrainian\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Urdu\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Vietnamese\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Welsh\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Xhosa\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Yiddish\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Yoruba\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: \n", + " Zulu\n", + " \n", + "href: #\n", + "clases: ['dropdown-item']\n", + "------\n", + "Texto: Translate\n", + "href: https://translate.google.com\n", + "clases: ['goog-logo-link']\n", + "------\n", + "Texto: ILGA.GOV\n", + "href: /\n", + "clases: None\n", + "------\n", + "Texto: \n", + "LEGISLATION & LAWS \n", + "\n", + "href: /Legislation\n", + "clases: None\n", + "------\n", + "Texto: Bills & Resolutions\n", + "href: /Legislation\n", + "clases: None\n", + "------\n", + "Texto: Public Acts\n", + "href: /Legislation/PublicActs\n", + "clases: None\n", + "------\n", + "Texto: Illinois Compiled Statutes\n", + "href: 
/Legislation/ILCS/Chapters\n", + "clases: None\n", + "------\n", + "Texto: Illinois Constitution\n", + "href: /documents/commission/lrb/conmain.htm\n", + "clases: None\n", + "------\n", + "Texto: Search Legislation\n", + "href: /Search?q=&base=Legis\n", + "clases: None\n", + "------\n", + "Texto: Glossary\n", + "href: /Legislation/Glossary\n", + "clases: None\n", + "------\n", + "Texto: Guide\n", + "href: /Guide\n", + "clases: None\n", + "------\n", + "Texto: \n", + "Reports & Inquiry \n", + "\n", + "href: /Reports\n", + "clases: None\n", + "------\n", + "Texto: Legislative Reports\n", + "href: /Reports\n", + "clases: None\n", + "------\n", + "Texto: Special Reports\n", + "href: /Reports/SpecialReports\n", + "clases: None\n", + "------\n", + "Texto: FTP Site\n", + "href: /ftp/\n", + "clases: None\n", + "------\n", + "Texto: Legislator Lookup\n", + "href: /members/FindMyLegislator\n", + "clases: None\n", + "------\n", + "Texto: Capitol Complex Phone Numbers\n", + "href: /directory\n", + "clases: None\n", + "------\n", + "Texto: \n", + "Rules & Regulations \n", + "\n", + "href: #\n", + "clases: None\n", + "------\n", + "Texto: Illinois Register\n", + "href: http://www.cyberdriveillinois.com/departments/index/register/home.html\n", + "clases: None\n", + "------\n", + "Texto: Administrative Rules\n", + "href: /agencies/JCAR/AdminCode\n", + "clases: None\n", + "------\n", + "Texto: \n", + "Senate \n", + "\n", + "href: /Senate/Members\n", + "clases: None\n", + "------\n", + "Texto: Members\n", + "href: /Senate/Members\n", + "clases: None\n", + "------\n", + "Texto: Schedules\n", + "href: /Senate/Schedules\n", + "clases: None\n", + "------\n", + "Texto: Committees\n", + "href: /Senate/Committees\n", + "clases: None\n", + "------\n", + "Texto:  Request for Remote Testimony\n", + "href: /Uploads/Testimony/Senate/Remote Legislative Hearing Process 104th GA.pdf\n", + "clases: None\n", + "------\n", + "Texto: Journals\n", + "href: /Senate/Journals\n", + "clases: None\n", + 
"------\n", + "Texto: Transcripts\n", + "href: /Senate/Transcripts\n", + "clases: None\n", + "------\n", + "Texto: Rules\n", + "href: /Senate/Rules\n", + "clases: None\n", + "------\n", + "Texto: Audio/Video\n", + "href: /Senate/AudioVideo\n", + "clases: None\n", + "------\n", + "Texto: FOIA Information\n", + "href: /Documents/senate/FOIA.pdf\n", + "clases: None\n", + "------\n", + "Texto: Senate Employment Opportunities\n", + "href: /EmploymentOpportunities\n", + "clases: None\n", + "------\n", + "Texto: Media Guidelines\n", + "href: /Documents/senate/SenateMediaGuidelines.pdf\n", + "clases: None\n", + "------\n", + "Texto: \n", + "House \n", + "\n", + "href: /House/Members\n", + "clases: None\n", + "------\n", + "Texto: Members\n", + "href: /House/Members\n", + "clases: None\n", + "------\n", + "Texto: Schedules\n", + "href: /House/Schedules\n", + "clases: None\n", + "------\n", + "Texto: Committees\n", + "href: /House/Committees\n", + "clases: None\n", + "------\n", + "Texto:  Submit testimony for House Committees\n", + "href: /Uploads/Testimony/House/Remote_Committee_Hearing_Process_March2023.pdf\n", + "clases: None\n", + "------\n", + "Texto: Journals\n", + "href: /House/Journals\n", + "clases: None\n", + "------\n", + "Texto: Transcripts\n", + "href: /House/Transcripts\n", + "clases: None\n", + "------\n", + "Texto: Rules\n", + "href: /House/Rules\n", + "clases: None\n", + "------\n", + "Texto: Audio/Video\n", + "href: /House/AudioVideo\n", + "clases: None\n", + "------\n", + "Texto: FOIA Information\n", + "href: /Documents/house/FOIA.pdf\n", + "clases: None\n", + "------\n", + "Texto: House Employment Opportunities\n", + "href: /EmploymentOpportunities\n", + "clases: None\n", + "------\n", + "Texto:  Log In\n", + "href: /Account/Login\n", + "clases: ['nav-link']\n", + "------\n", + "Texto: Home\n", + "href: /\n", + "clases: ['active']\n", + "------\n", + "Texto: View List\n", + "href: /Senate/Members/List\n", + "clases: ['btn', 'btn-primary']\n", + 
"------\n", + "Texto: Officers\n", + "href: /Documents/Senate/104th_Senate_Officers.pdf\n", + "clases: ['btn', 'btn-primary', 'm-1']\n", + "------\n", + "Texto: Leadership\n", + "href: /Documents/Senate/104th_Senate_Leadership.pdf\n", + "clases: ['btn', 'btn-primary', 'm-1']\n", + "------\n", + "Texto: Seating Chart\n", + "href: https://www.ilga.gov/Documents/Senate/104th_Senate_Seating_Chart.pdf\n", + "clases: ['btn', 'btn-primary', 'm-1']\n", + "------\n", + "Texto: Report List\n", + "href: Members/rptMemberList\n", + "clases: ['btn', 'btn-primary', 'm-1']\n", + "------\n", + "Texto: Neil Anderson\n", + "href: /Senate/Members/Details/3312\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Neil Anderson\n", + "href: /Senate/Members/Details/3312\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Omar Aquino\n", + "href: /Senate/Members/Details/3316\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Omar Aquino\n", + "href: /Senate/Members/Details/3316\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Li Arellano, Jr.\n", + "href: /Senate/Members/Details/3383\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Li Arellano, Jr.\n", + "href: /Senate/Members/Details/3383\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Chris Balkema\n", + "href: /Senate/Members/Details/3413\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Chris Balkema\n", + "href: /Senate/Members/Details/3413\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Christopher Belt\n", + "href: /Senate/Members/Details/3337\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Christopher Belt\n", + "href: /Senate/Members/Details/3337\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Terri Bryant\n", + "href: /Senate/Members/Details/3386\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Terri Bryant\n", + "href: /Senate/Members/Details/3386\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Cristina Castro\n", 
+ "href: /Senate/Members/Details/3317\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Cristina Castro\n", + "href: /Senate/Members/Details/3317\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Javier L. Cervantes\n", + "href: /Senate/Members/Details/3403\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Javier L. Cervantes\n", + "href: /Senate/Members/Details/3403\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Andrew S. Chesney\n", + "href: /Senate/Members/Details/3410\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Andrew S. Chesney\n", + "href: /Senate/Members/Details/3410\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Lakesia Collins\n", + "href: /Senate/Members/Details/3443\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Lakesia Collins\n", + "href: /Senate/Members/Details/3443\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Bill Cunningham\n", + "href: /Senate/Members/Details/3291\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Bill Cunningham\n", + "href: /Senate/Members/Details/3291\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: John F. Curran\n", + "href: /Senate/Members/Details/3329\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: John F. Curran\n", + "href: /Senate/Members/Details/3329\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Donald P. DeWitte\n", + "href: /Senate/Members/Details/3334\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Donald P. 
DeWitte\n", + "href: /Senate/Members/Details/3334\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Mary Edly-Allen\n", + "href: /Senate/Members/Details/3407\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Mary Edly-Allen\n", + "href: /Senate/Members/Details/3407\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Laura Ellman\n", + "href: /Senate/Members/Details/3339\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Laura Ellman\n", + "href: /Senate/Members/Details/3339\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Paul Faraci\n", + "href: /Senate/Members/Details/3412\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Paul Faraci\n", + "href: /Senate/Members/Details/3412\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Sara Feigenholtz\n", + "href: /Senate/Members/Details/3376\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Sara Feigenholtz\n", + "href: /Senate/Members/Details/3376\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Laura Fine\n", + "href: /Senate/Members/Details/3338\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Laura Fine\n", + "href: /Senate/Members/Details/3338\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Dale Fowler\n", + "href: /Senate/Members/Details/3318\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Dale Fowler\n", + "href: /Senate/Members/Details/3318\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Suzy Glowiak Hilton\n", + "href: /Senate/Members/Details/3341\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Suzy Glowiak Hilton\n", + "href: /Senate/Members/Details/3341\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Graciela Guzmán\n", + "href: /Senate/Members/Details/3442\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Graciela Guzmán\n", + "href: /Senate/Members/Details/3442\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Michael W. 
Halpin\n", + "href: /Senate/Members/Details/3408\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Michael W. Halpin\n", + "href: /Senate/Members/Details/3408\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Don Harmon\n", + "href: /Senate/Members/Details/3268\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Don Harmon\n", + "href: /Senate/Members/Details/3268\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Napoleon Harris, III\n", + "href: /Senate/Members/Details/3292\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Napoleon Harris, III\n", + "href: /Senate/Members/Details/3292\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Erica Harriss\n", + "href: /Senate/Members/Details/3411\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Erica Harriss\n", + "href: /Senate/Members/Details/3411\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Michael E. Hastings\n", + "href: /Senate/Members/Details/3293\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Michael E. Hastings\n", + "href: /Senate/Members/Details/3293\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Darby A. Hills\n", + "href: /Senate/Members/Details/3460\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Darby A. 
Hills\n", + "href: /Senate/Members/Details/3460\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Linda Holmes\n", + "href: /Senate/Members/Details/3270\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Linda Holmes\n", + "href: /Senate/Members/Details/3270\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Mattie Hunter\n", + "href: /Senate/Members/Details/3269\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Mattie Hunter\n", + "href: /Senate/Members/Details/3269\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Adriane Johnson\n", + "href: /Senate/Members/Details/3378\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Adriane Johnson\n", + "href: /Senate/Members/Details/3378\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Emil Jones, III\n", + "href: /Senate/Members/Details/3276\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Emil Jones, III\n", + "href: /Senate/Members/Details/3276\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Patrick J. Joyce\n", + "href: /Senate/Members/Details/3372\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Patrick J. Joyce\n", + "href: /Senate/Members/Details/3372\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: David Koehler\n", + "href: /Senate/Members/Details/3271\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: David Koehler\n", + "href: /Senate/Members/Details/3271\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Seth Lewis\n", + "href: /Senate/Members/Details/3406\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Seth Lewis\n", + "href: /Senate/Members/Details/3406\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Kimberly A. Lightford\n", + "href: /Senate/Members/Details/3264\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Kimberly A. 
Lightford\n", + "href: /Senate/Members/Details/3264\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Meg Loughran Cappel\n", + "href: /Senate/Members/Details/3380\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Meg Loughran Cappel\n", + "href: /Senate/Members/Details/3380\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Robert F. Martwick\n", + "href: /Senate/Members/Details/3369\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Robert F. Martwick\n", + "href: /Senate/Members/Details/3369\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Steve McClure\n", + "href: /Senate/Members/Details/3342\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Steve McClure\n", + "href: /Senate/Members/Details/3342\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Julie A. Morrison\n", + "href: /Senate/Members/Details/3294\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Julie A. Morrison\n", + "href: /Senate/Members/Details/3294\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Laura M. Murphy\n", + "href: /Senate/Members/Details/3313\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Laura M. 
Murphy\n", + "href: /Senate/Members/Details/3313\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Robert Peters\n", + "href: /Senate/Members/Details/3343\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Robert Peters\n", + "href: /Senate/Members/Details/3343\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Jason Plummer\n", + "href: /Senate/Members/Details/3344\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Jason Plummer\n", + "href: /Senate/Members/Details/3344\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Mike Porfirio\n", + "href: /Senate/Members/Details/3404\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Mike Porfirio\n", + "href: /Senate/Members/Details/3404\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Willie Preston\n", + "href: /Senate/Members/Details/3405\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Willie Preston\n", + "href: /Senate/Members/Details/3405\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Sue Rezin\n", + "href: /Senate/Members/Details/3281\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Sue Rezin\n", + "href: /Senate/Members/Details/3281\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Chapin Rose\n", + "href: /Senate/Members/Details/3295\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Chapin Rose\n", + "href: /Senate/Members/Details/3295\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Mike Simmons\n", + "href: /Senate/Members/Details/3398\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Mike Simmons\n", + "href: /Senate/Members/Details/3398\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Elgie R. Sims, Jr.\n", + "href: /Senate/Members/Details/3331\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Elgie R. 
Sims, Jr.\n", + "href: /Senate/Members/Details/3331\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Steve Stadelman\n", + "href: /Senate/Members/Details/3296\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Steve Stadelman\n", + "href: /Senate/Members/Details/3296\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Dave Syverson\n", + "href: /Senate/Members/Details/3265\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Dave Syverson\n", + "href: /Senate/Members/Details/3265\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Jil Tracy\n", + "href: /Senate/Members/Details/3319\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Jil Tracy\n", + "href: /Senate/Members/Details/3319\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Doris Turner\n", + "href: /Senate/Members/Details/3399\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Doris Turner\n", + "href: /Senate/Members/Details/3399\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Sally J. Turner\n", + "href: /Senate/Members/Details/3397\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Sally J. 
Turner\n", + "href: /Senate/Members/Details/3397\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Rachel Ventura\n", + "href: /Senate/Members/Details/3409\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Rachel Ventura\n", + "href: /Senate/Members/Details/3409\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Karina Villa\n", + "href: /Senate/Members/Details/3385\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Karina Villa\n", + "href: /Senate/Members/Details/3385\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Celina Villanueva\n", + "href: /Senate/Members/Details/3375\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Celina Villanueva\n", + "href: /Senate/Members/Details/3375\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Ram Villivalam\n", + "href: /Senate/Members/Details/3345\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Ram Villivalam\n", + "href: /Senate/Members/Details/3345\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Mark L. Walker\n", + "href: /Senate/Members/Details/3449\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Mark L. 
Walker\n", + "href: /Senate/Members/Details/3449\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Craig Wilcox\n", + "href: /Senate/Members/Details/3336\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Craig Wilcox\n", + "href: /Senate/Members/Details/3336\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Dan McConchie\n", + "href: /Senate/Members/Details/3315\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: Dan McConchie\n", + "href: /Senate/Members/Details/3315\n", + "clases: ['notranslate']\n", + "------\n", + "Texto: \n", + " Contact ILGA Webmaster\n", + " \n", + "href: mailto:webmaster@ilga.gov?subject=Email from ILGA Web\n", + "clases: None\n", + "------\n", + "Texto: \n", + "\n", + "\n", + "href: http://www.amberillinois.com\n", + "clases: None\n", + "------\n", + "Texto: \n", + "\n", + "\n", + "href: https://www.missingkids.org\n", + "clases: None\n", + "------\n", + "Texto: ILGA.GOV\n", + "href: /\n", + "clases: None\n", + "------\n", + "Texto: Disclaimers\n", + "href: /Disclaimers\n", + "clases: None\n", + "------\n", + "Texto: \n", + " ADA\n", + " \n", + "href: /Accessibility\n", + "clases: None\n", + "------\n", + "Texto: \n", + " Contact ILGA Webmaster\n", + " \n", + "href: mailto:webmaster@ilga.gov?subject=Email from ILGA Web\n", + "clases: None\n", + "------\n", + "Texto: \n", + "\n", + "\n", + "href: http://www.amberillinois.com\n", + "clases: None\n", + "------\n", + "Texto: \n", + "\n", + "\n", + "href: https://www.missingkids.org\n", + "clases: None\n", + "------\n", + "Texto: ILGA.GOV\n", + "href: /\n", + "clases: None\n", + "------\n", + "Texto: Disclaimers\n", + "href: /Disclaimers\n", + "clases: None\n", + "------\n", + "Texto: \n", + " ADA\n", + " \n", + "href: /Accessibility\n", + "clases: None\n", + "------\n", + "Texto: \n", + "href: #\n", + "clases: ['back-to-top', 'd-flex', 'align-items-center', 'justify-content-center']\n", + "------\n" + ] + } + ], + "source": [ + "# Encuentra todos los enlaces 
del HTML\n", + "links = soup.find_all(\"a\")\n", + "\n", + "# Imprime el texto y el href de cada enlace\n", + "for link in links:\n", + " print(\"Texto:\", link.text)\n", + " print(\"href:\", link.get('href'))\n", + " print(\"clases:\", link.get('class'))\n", + " print(\"------\")\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 153, "metadata": { "tags": [] }, "outputs": [ { - "ename": "IndexError", - "evalue": "list index out of range", - "output_type": "error", - "traceback": [ - "\u001b[31m---------------------------------------------------------------------------\u001b[39m", - "\u001b[31mIndexError\u001b[39m Traceback (most recent call last)", - "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[32]\u001b[39m\u001b[32m, line 5\u001b[39m\n\u001b[32m 2\u001b[39m side_menu_links = soup.select(\u001b[33m\"\u001b[39m\u001b[33ma.sidemenu\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m 4\u001b[39m \u001b[38;5;66;03m# Examine the first link\u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m5\u001b[39m first_link = \u001b[43mside_menu_links\u001b[49m\u001b[43m[\u001b[49m\u001b[32;43m0\u001b[39;49m\u001b[43m]\u001b[49m\n\u001b[32m 6\u001b[39m \u001b[38;5;28mprint\u001b[39m(first_link)\n\u001b[32m 8\u001b[39m \u001b[38;5;66;03m# What class is this variable?\u001b[39;00m\n", - "\u001b[31mIndexError\u001b[39m: list index out of range" + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " English\n", + " \n", + "Class: \n" ] } ], "source": [ - "# Get all sidemenu links as a list\n", - "side_menu_links = soup.select(\"a.sidemenu\")\n", + "# obtener elementos con el selector CSS \"a.dropdown-item\"\n", + "side_menu_links = soup.select(\"a.dropdown-item\")\n", "\n", - "# Examine the first link\n", + "# examinar el primer elemento\n", "first_link = side_menu_links[0]\n", - "print(first_link)\n", "\n", - "# What class is this variable?\n", - "print('Class: ', type(first_link))" + "# obtener el texto del enlace\n", + 
"print(first_link.text)\n", + "\n", + "# cuál clase de objeto es?\n", + "print('Class: ', type(first_link))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "¡Es una etiqueta de Beautiful Soup! Esto significa que tiene un miembro \"texto\":" + "It's a Beautiful Soup tag! This means it has a `text` member:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 154, "metadata": { "tags": [] }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " English\n", + " \n" + ] + } + ], "source": [ "print(first_link.text)" ] @@ -547,18 +1870,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "A veces necesitamos el valor de ciertos atributos. Esto es especialmente relevante para las etiquetas «a» o enlaces, donde el atributo «href» nos indica adónde lleva el enlace.\n", + "Sometimes we want the value of certain attributes. This is particularly relevant for `a` tags, or links, where the `href` attribute tells us where the link goes.\n", "\n", - "💡 **Consejo**: Puedes acceder a los atributos de una etiqueta tratándola como un diccionario:" + "💡 **Tip**: You can access a tag’s attributes by treating the tag like a dictionary:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 155, "metadata": { "tags": [] }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "#\n" + ] + } + ], "source": [ "print(first_link['href'])" ] @@ -569,16 +1900,21 @@ "source": [ "## 🥊 Desafío: Extraer atributos específicos\n", "\n", - "Extraer todos los atributos `href` de cada URL `mainmenu`." + "Extrae todos los atributos `href` de cada URL `mainmenu`." 
] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 156, "metadata": {}, "outputs": [], "source": [ - "# YOUR CODE HERE\n" + "# YOUR CODE HERE\n", + "# Extraer todos los atributos href de cada enlace con la clase 'mainmenu'\n", + "mainmenu_links = soup.select(\"a.mainmenu\")\n", + "\n", + "for link in mainmenu_links:\n", + " print(link.get('href'))\n" ] }, { @@ -589,7 +1925,7 @@ "\n", "# Análisis de la Asamblea General de Illinois\n", "\n", - "Aunque parezca increíble, estas son las herramientas fundamentales para analizar un sitio web. Una vez que dediques más tiempo a familiarizarte con HTML y CSS, simplemente será cuestión de comprender la estructura de un sitio web específico y aplicar inteligentemente las herramientas de Beautiful Soup y Python.\n", + "Aunque parezca increíble, estas son las herramientas fundamentales para analizar un sitio web. Una vez que dediques más tiempo a familiarizarte con HTML y CSS, solo tendrás que comprender la estructura de un sitio web específico y aplicar con inteligencia las herramientas de Beautiful Soup y Python.\n", "\n", "Apliquemos estas habilidades para analizar la [98.ª Asamblea General de Illinois](http://www.ilga.gov/senate/default.asp?GA=98).\n", "\n", @@ -600,24 +1936,24 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Rastrear y analizar la página web\n", + "## Analizar la página web\n", "\n", - "Rastreemos y analicemos la página web con las herramientas que aprendimos en la sección anterior." + "Analicemos la página web usando las herramientas que aprendimos en la sección anterior." 
] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 157, "metadata": { "tags": [] }, "outputs": [], "source": [ - "# Make a GET request\n", - "req = requests.get('http://www.ilga.gov/senate/default.asp?GA=98')\n", - "# Read the content of the server’s response\n", + "# Hacemos una nueva solicitud a otra página\n", + "req = requests.get('https://www.ilga.gov/Senate/Members/rptMemberList')\n", + "# leer el contenido de la respuesta del servidor\n", "src = req.text\n", - "# Soup it\n", + "# analiza la respuesta y conviértela en un árbol HTML.\n", "soup = BeautifulSoup(src, \"lxml\")" ] }, @@ -632,11 +1968,22 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 158, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "60" + ] + }, + "execution_count": 158, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "# Get all table row elements\n", + "# obtener todas las filas de la tabla\n", "rows = soup.find_all(\"tr\")\n", "len(rows)" ] @@ -650,15 +1997,15 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 159, "metadata": {}, "outputs": [], "source": [ - "# Returns every ‘tr tr tr’ css selector in the page\n", - "rows = soup.select('tr tr tr')\n", + "# retornar solo las filas que están dentro de otra fila\n", + "rows = soup.select('a.dropdown-item')\n", "\n", - "for row in rows[:5]:\n", - " print(row, '\\n')" + "for row in rows[:20]:\n", + " print(row, '\\n')\n" ] }, { @@ -670,12 +2017,42 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 160, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0\n", + "[]\n" + ] + } + ], "source": [ - "example_row = rows[2]\n", - "print(example_row.prettify())" + "print(len(rows))\n", + "print(rows)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 161, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "No rows found.\n" + ] + } + ], + "source": [ + "if rows:\n", + "\texample_row = rows[0]\n", + "\tprint(example_row.prettify())\n", + "else:\n", + "\tprint(\"No rows found.\")" ] }, { @@ -691,21 +2068,93 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 162, "metadata": {}, "outputs": [], "source": [ - "for cell in example_row.select('td'):\n", - " print(cell)\n", - "print()\n", + "# Parse the HTML to get the rows\n", + "soup = BeautifulSoup(src, \"lxml\")\n", + "rows = soup.select(\"tbody tr\")\n", "\n", - "for cell in example_row.select('.detail'):\n", - " print(cell)\n", - "print()\n", + "if rows:\n", + "\texample_row = rows[0]\n", + "else:\n", + "\texample_row = None\n", + "\tprint(\"No rows found.\")\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": 163, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " \n", + " \n", + " Neil Anderson\n", + " \n", + " (R)\n", + "
\n", + " 47th District\n", + " \n", + " \n", + " 208 A Capitol Building\n", + "
\n", + "
\n", + " Springfield, IL 62706\n", + "
\n", + " (217) 782-5957\n", + " \n", + " \n", + " 103 North College Avenue\n", + "
\n", + " #201\n", + "
\n", + " Aledo IL 61231\n", + "
\n", + " (309) 230-7584\n", + " \n", + "\n", + "\n", + "\n", + "Neil Anderson (R)\n", + "
\n", + " 47th District\n", + " \n", + "\n", + " 208 A Capitol Building
\n", + "
\n", + " Springfield, IL 62706
\n", + " (217) 782-5957\n", + " \n", + " \n", + "103 North College Avenue
\n", + " #201
\n", + " Aledo IL 61231
\n", + " (309) 230-7584\n", + " \n" + ] + } + ], + "source": [ + "if rows:\n", + " example_row = rows[0]\n", + " print(example_row.prettify())\n", "\n", - "for cell in example_row.select('td.detail'):\n", - " print(cell)\n", - "print()" + " # Aquí procesa example_row solo si existe\n", + " for cell in example_row.select('td'):\n", + " print(cell)\n", + " for cell in example_row.select('.detail'):\n", + " print(cell)\n", + " for cell in example_row.select('td.detail'):\n", + " print(cell)\n", + "else:\n", + " print(\"No rows found.\")\n" ] }, { @@ -717,20 +2166,40 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 167, "metadata": { "tags": [] }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tds: 3\n", + "details: 0\n", + "td.details: 0\n" + ] + } + ], "source": [ - "assert example_row.select('td') == example_row.select('.detail') == example_row.select('td.detail')" + "# revisamos cuántos elementos encuentra cada método\n", + "tds = example_row.select('td')\n", + "details = example_row.select('.detail')\n", + "td_details = example_row.select('td.detail')\n", + "\n", + "print(\"tds:\", len(tds))\n", + "print(\"details:\", len(details))\n", + "print(\"td.details:\", len(td_details))\n", + "\n", + "# solo para verificar que todos los .detail están en \n", + "assert td_details == details # estos deben de ser iguales" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Utilicemos el selector `td.detail` para ser lo más específicos posible." + "Let's use the selector `td.detail` to be as specific as possible." ] }, { @@ -748,7 +2217,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "La mayoría de las veces, nos interesa el **texto** real de un sitio web, no sus etiquetas. Recordemos que para obtener el texto de un elemento HTML, usamos el miembro `text`:" + "Most of the time, we're interested in the actual **text** of a website, not its tags. 
Recall that to get the text of an HTML element, we use the `text` member:" ] }, { @@ -767,7 +2236,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "¡Se ve bien! Ahora solo necesitamos usar nuestros conocimientos básicos de Python para obtener los elementos de esta lista que necesitamos. Recuerda: queremos el nombre del senador, su distrito y su partido." + "Looks good! Now we just use our basic Python knowledge to get the elements of this list that we want. Remember, we want the senator's name, their district, and their party." ] }, { @@ -785,9 +2254,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Eliminando filas basura\n", + "## Getting Rid of Junk Rows\n", "\n", - "Vimos al principio que no todas las filas que obtuvimos corresponden a un senador. Tendremos que hacer limpieza antes de continuar. Vean algunos ejemplos:" + "We saw at the beginning that not all of the rows we got actually correspond to a senator. We'll need to do some cleaning before we can proceed. Take a look at some examples:" ] }, { @@ -805,9 +2274,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Al escribir nuestro bucle for, queremos que solo se aplique a las filas relevantes. Por lo tanto, debemos filtrar las filas irrelevantes. Para ello, comparamos algunas de estas filas con las que necesitamos, observamos sus diferencias y luego formulamos esto en una condición.\n", + "When we write our for loop, we only want it to apply to the relevant rows. So we'll need to filter out the irrelevant rows. The way to do this is to compare some of these to the rows we do want, see how they differ, and then formulate that in a conditional.\n", "\n", - "Como puedes imaginar, hay muchas maneras de hacerlo, y dependerá del sitio web. Aquí te mostraremos algunas para que te hagas una idea de cómo hacerlo." + "As you can imagine, there are a lot of possible ways to do this, and it'll depend on the website. We'll show some here to give you an idea of how to do this." 
] }, { @@ -829,7 +2298,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Quizás las buenas filas tengan una longitud de 5. Comprobémoslo:" + "Perhaps good rows have a length of 5. Let's check:" ] }, { @@ -850,7 +2319,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Encontramos una fila de pie de página en nuestra lista que queremos evitar. Probemos algo diferente:" + "We found a footer row in our list that we'd like to avoid. Let's try something else:" ] }, { @@ -886,16 +2355,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "¡Parece que encontramos algo que funcionó!" + "Looks like we found something that worked!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Unir todo en un bucle\n", + "## Loop it All Together\n", "\n", - "Ahora que hemos visto cómo obtener los datos que queremos de una fila y filtrar las filas que no necesitamos, vamos a unirlo todo en un bucle." + "Now that we've seen how to get the data we want from one row, as well as filter out the rows we don't want, let's put it all together into a loop." ] }, { @@ -942,7 +2411,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Echemos un vistazo a lo que tenemos en \"miembros\"." + "Let's take a look at what we have in `members`." ] }, { @@ -958,37 +2427,37 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 🥊 Desafío: Obtener elementos `href` que apunten a los proyectos de ley de los miembros\n", + "## 🥊 Challenge: Get `href` elements pointing to members' bills \n", "\n", - "El código anterior recupera información sobre:\n", + "The code above retrieves information on: \n", "\n", - "- el nombre del senador,\n", - "- su número de distrito,\n", - "- y su partido.\n", + "- the senator's name,\n", + "- their district number,\n", + "- and their party.\n", "\n", - "Ahora queremos recuperar la URL de la lista de proyectos de ley de cada senador. 
Cada URL seguirá un formato específico.\n", + "We now want to retrieve the URL for each senator's list of bills. Each URL will follow a specific format. \n", "\n", - "El formato de la lista de proyectos de ley de un senador determinado es:\n", + "The format for the list of bills for a given senator is:\n", "\n", "`http://www.ilga.gov/senate/SenatorBills.asp?GA=98&MemberID=[MEMBER_ID]&Primary=True`\n", "\n", - "para obtener algo como:\n", + "to get something like:\n", "\n", "`http://www.ilga.gov/senate/SenatorBills.asp?MemberID=1911&GA=98&Primary=True`\n", "\n", - "en el cual `MEMBER_ID=1911`. \n", + "in which `MEMBER_ID=1911`. \n", "\n", - "Deberías poder ver que, lamentablemente, `MEMBER_ID` no se extrae actualmente en nuestro código de extracción.\n", + "You should be able to see that, unfortunately, `MEMBER_ID` is not currently something pulled out in our scraping code.\n", "\n", - "Tu tarea inicial es modificar el código anterior para que también **recuperemos la URL completa que apunta a la página correspondiente de los proyectos de ley patrocinados por las primarias**, para cada miembro, y la devolvamos junto con su nombre, distrito y partido.\n", + "Your initial task is to modify the code above so that we also **retrieve the full URL which points to the corresponding page of primary-sponsored bills**, for each member, and return it along with their name, district, and party.\n", "\n", - "Consejos:\n", + "Tips: \n", "\n", - "* Para ello, deberás obtener el elemento de anclaje apropiado (``) en la fila de la tabla de cada legislador. Puedes usar el método `.select()` en el objeto `row` del bucle para hacerlo, similar al comando que encuentra todas las celdas `td.detail` de la fila. Recuerda que solo queremos el enlace a los proyectos de ley del legislador, no a los comités ni a su página de perfil.\n", - "* El HTML de los elementos de anclaje se verá como `Proyectos de ley`. La cadena del atributo `href` contiene el enlace **relativo** que buscamos. 
Puedes acceder a un atributo de un objeto `Tag` de BeatifulSoup de la misma manera que accedes a un diccionario de Python: `anchor['attributeName']`. Consulta la documentación para más detalles.\n", - "* Hay muchas maneras diferentes de usar BeautifulSoup. Puedes hacer lo que necesites para extraer el `href`.\n", + "* To do this, you will want to get the appropriate anchor element (`<a>`) in each legislator's row of the table. You can again use the `.select()` method on the `row` object in the loop to do this, similar to the command that finds all of the `td.detail` cells in the row. Remember that we only want the link to the legislator's bills, not the committees or the legislator's profile page.\n", + "* The anchor elements' HTML will look like `Bills`. The string in the `href` attribute contains the **relative** link we are after. You can access an attribute of a BeautifulSoup `Tag` object the same way you access a Python dictionary: `anchor['attributeName']`. See the documentation for more details.\n", + "* There are a _lot_ of different ways to use BeautifulSoup to get things done. Whatever you need to do to pull the `href` out is fine.\n", "\n", - "El código se ha completado parcialmente. Complétalo donde dice `#TU CÓDIGO AQUÍ`. Guarda la ruta en un objeto llamado `full_path`." + "The code has been partially filled out for you. Fill it in where it says `#YOUR CODE HERE`. Save the path into an object called `full_path`." ] }, { @@ -1049,9 +2518,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 🥊 Desafío: Modulariza tu código\n", + "## 🥊 Challenge: Modularize Your Code\n", "\n", - "Convierte el código anterior en una función que acepte una URL, rastree la URL para encontrar sus senadores y devuelva una lista de tuplas con información sobre cada senador." + "Turn the code above into a function that accepts a URL, scrapes the URL for its senators, and returns a list of tuples containing information about each senator. 
" ] }, { @@ -1085,21 +2554,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 🥊Desafío práctico: Escribir una función de scraping\n", - "\n", - "Queremos scraping las páginas web correspondientes a los proyectos de ley patrocinados por cada proyecto de ley.\n", + "## 🥊 Take-home Challenge: Writing a Scraper Function\n", "\n", - "Escribir una función llamada `get_bills(url)` para analizar la URL de un proyecto de ley. Esto implica:\n", + "We want to scrape the webpages corresponding to bills sponsored by each senator.\n", "\n", - "- Solicitar la URL mediante la biblioteca `requests`\n", - "- Usar las funciones de la biblioteca `BeautifulSoup` para encontrar todos los elementos `` con la clase `billlist`\n", - "- Devolver una _lista_ de tuplas, cada una con:\n", - "- Descripción (2.ª columna)\n", - "- Cámara (S o H) (3.ª columna)\n", - "- La última acción (4.ª columna)\n", - "- La fecha de la última acción (5.ª columna)\n", + "Write a function called `get_bills(url)` to parse a given bills URL. This will involve:\n", "\n", - "Esta función se ha completado parcialmente. Complete el resto." + " - requesting the URL using the `requests` library\n", + " - using the features of the `BeautifulSoup` library to find all of the `` elements with the class `billlist`\n", + " - returning a _list_ of tuples, each with:\n", + " - description (2nd column)\n", + " - chamber (S or H) (3rd column)\n", + " - the last action (4th column)\n", + " - the last action date (5th column)\n", + " \n", + "This function has been partially completed. Fill in the rest." 
] }, { @@ -1117,11 +2586,11 @@ " bills = []\n", " for row in rows:\n", " # YOUR CODE HERE\n", - " # bill_id =\n", - " #description =\n", - " #chamber =\n", - " #last_action =\n", - " #last_action_date =\n", + " bill_id = None\n", + " description = None\n", + " chamber = None\n", + " last_action = None\n", + " last_action_date = None\n", " bill = (bill_id, description, chamber, last_action, last_action_date)\n", " bills.append(bill)\n", " return bills" @@ -1144,11 +2613,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Extraer todos los proyectos de ley\n", + "### Scrape All Bills\n", "\n", - "Finalmente, cree un diccionario `bills_dict` que asigne un número de distrito (la clave) a una lista de proyectos de ley (el valor) provenientes de ese distrito. Puede hacerlo recorriendo en bucle todos los miembros del senado en `members_dict` y llamando a `get_bills()` para cada una de las URL de sus proyectos de ley asociados.\n", + "Finally, create a dictionary `bills_dict` which maps a district number (the key) onto a list of bills (the value) coming from that district. You can do this by looping over all of the senate members in `members_dict` and calling `get_bills()` for each of their associated bill URLs.\n", "\n", - "**NOTA:** Por favor, llame a la función `time.sleep(1)` en cada iteración del bucle para no destruir el sitio web del estado." + "**NOTE:** Please call the function `time.sleep(1)` for each iteration of the loop, so that we don't overload the state's web site." ] }, { @@ -1178,7 +2647,7 @@ "metadata": { "anaconda-cloud": {}, "kernelspec": { - "display_name": ".venv312", + "display_name": "Python 3", "language": "python", "name": "python3" }, @@ -1192,7 +2661,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.10" + "version": "3.13.6" } }, "nbformat": 4,