A Python-based web scraping tool that extracts course details (headline, summary, video link) from a specified educational platform and saves the results to CSV.
.
├── scrape.py # Main scraping script
├── requirements.txt # Python dependencies
├── LICENSE
├── cms_scrape.csv # Example output of scraped course data
└── README.md # This file
- Python 3.7 or higher
- Install required packages:
  pip install -r requirements.txt
- Configure target URL
  - Open scrape.py and set BASE_URL or TARGET_PAGE to the page you wish to scrape.
- Run the scraper
  python scrape.py
  - The script fetches the page content, parses the HTML, and extracts the headline, summary, and video link for each course.
  - Results are written to cms_scrape.csv in the project root.
- Inspect output
  - Open cms_scrape.csv in a spreadsheet application, or load it with pandas or R to analyze the data.
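For a quick check of the output without a spreadsheet, the file can also be read with Python's standard csv module (swapped in here for pandas to avoid an extra dependency). This is only a sketch: the column names headline, summary, and video_link are assumed from the fields listed above and may not match the actual header row, and load_rows and missing_video are hypothetical helper names.

```python
import csv


def load_rows(path):
    """Read the scraper output into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def missing_video(rows):
    """Return the headlines of rows whose video_link column is empty.

    Assumes the CSV has 'headline' and 'video_link' columns.
    """
    return [row["headline"] for row in rows if not row.get("video_link")]


# Example usage (run from the project root after scraping):
#   rows = load_rows("cms_scrape.csv")
#   print(len(rows), "courses scraped")
#   print("No video link:", missing_video(rows))
```

If the real header row uses different names, adjust the keys in missing_video accordingly.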
- Uses requests to retrieve HTML content.
- Parses HTML with BeautifulSoup (using the lxml parser).
- Finds course elements by CSS selectors or HTML tags.
- Extracts and cleans text fields.
- Writes structured data to CSV via the csv module.
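The steps above can be sketched roughly as follows. This is not the contents of scrape.py, only a minimal illustration under stated assumptions: the selectors (div.course, h2, p, a[href]), the column names, the sample HTML, and the function names are all hypothetical, and the real markup of the target site will differ.

```python
import csv
import io

import requests
from bs4 import BeautifulSoup

# Hypothetical selector and column names; adjust to the target site.
COURSE_SELECTOR = "div.course"
FIELDS = ["headline", "summary", "video_link"]

# Invented sample markup so the parsing step can be tried offline.
SAMPLE_HTML = """
<div class="course">
  <h2>  Intro to   Python </h2>
  <p>Learn the basics.</p>
  <a href="https://example.com/video/1">Watch</a>
</div>
"""


def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def clean(text):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def parse_courses(html, parser="lxml"):
    """Return one dict per course element matched by COURSE_SELECTOR."""
    soup = BeautifulSoup(html, parser)
    rows = []
    for card in soup.select(COURSE_SELECTOR):
        headline = card.select_one("h2")
        summary = card.select_one("p")
        video = card.select_one("a[href]")
        rows.append({
            "headline": clean(headline.get_text()) if headline else "",
            "summary": clean(summary.get_text()) if summary else "",
            "video_link": video["href"] if video else "",
        })
    return rows


def write_csv(rows, handle):
    """Write the rows to an open text handle with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Offline demo: parse the sample markup and render it as CSV text.
# "html.parser" is the built-in parser, used here so the demo runs without lxml.
demo_rows = parse_courses(SAMPLE_HTML, parser="html.parser")
demo_buffer = io.StringIO()
write_csv(demo_rows, demo_buffer)
```

To scrape a live page instead, pass the result of fetch_html(url) to parse_courses and hand write_csv a file opened with open("cms_scrape.csv", "w", newline="", encoding="utf-8").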
Contributions are welcome! Feel free to file issues or submit pull requests for enhancements, such as:
- Support for pagination or multiple pages.
- Export to other formats (JSON, Excel).
- Integration with scheduling or automation tools.
This project is licensed under the MIT License. See LICENSE for details.