⚠️ Alpha Version
This library is currently in alpha; APIs and behavior may change without notice. Use at your own risk.
A TypeScript/JavaScript library for scraping recipe data from various cooking websites. This is a JavaScript port inspired by the Python recipe-scrapers library.
- 🍳 Extract structured recipe data from cooking websites
- 🔍 Support for multiple popular recipe sites
- 🚀 Built with TypeScript for better developer experience
- ⚡ Fast and lightweight using Bun runtime for development and testing
- 🧪 Comprehensive test coverage
Add the `recipe-scrapers-js` package and its peer dependencies:

```sh
npm install recipe-scrapers-js cheerio zod
# or
yarn add recipe-scrapers-js cheerio zod
# or
pnpm add recipe-scrapers-js cheerio zod
# or
bun add recipe-scrapers-js cheerio zod
```
```ts
import { getScraper } from 'recipe-scrapers-js'

const html = `<html>The html to scrape...</html>`
const url = 'https://allrecipes.com/recipe/example'

// Get a scraper for a specific URL.
// This function will throw if a scraper does not exist.
const MyScraper = getScraper(url)
const scraper = new MyScraper(html, url, /* { ...options } */)

const recipe = await scraper.toObject()
console.log(recipe)
```
```ts
interface ScraperOptions {
  /**
   * Additional extractors to be used by the scraper.
   * These extractors will be added to the default set of extractors.
   * Extractors are applied according to their priority:
   * higher-priority extractors run first.
   * @default []
   */
  extraExtractors?: ExtractorPlugin[]
  /**
   * Additional post-processors to be used by the scraper.
   * These post-processors will be added to the default set of post-processors.
   * Post-processors are applied after all extractors have run,
   * also in priority order: higher-priority post-processors run first.
   * @default []
   */
  extraPostProcessors?: PostProcessorPlugin[]
  /**
   * Whether link scraping is enabled.
   * @default false
   */
  linksEnabled?: boolean
  /**
   * Logging level for the scraper.
   * This controls the verbosity of logs produced by the scraper.
   * @default LogLevel.Warn
   */
  logLevel?: LogLevel
}
```
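The priority ordering described above can be pictured with a small sketch. This is illustrative only: `priority` and `name` are assumed field names for the example, not the library's actual plugin shape.

```typescript
// Illustrative sketch: plugins with a numeric priority, applied highest first.
interface PrioritizedPlugin {
  priority: number
  name: string
}

function orderByPriority<T extends PrioritizedPlugin>(plugins: T[]): T[] {
  // Higher priority runs first, so sort in descending order.
  return [...plugins].sort((a, b) => b.priority - a.priority)
}

const ordered = orderByPriority([
  { name: 'jsonLd', priority: 10 },
  { name: 'microdata', priority: 5 },
  { name: 'custom', priority: 20 },
])
console.log(ordered.map((p) => p.name)) // → [ 'custom', 'jsonLd', 'microdata' ]
```

Extra extractors supplied via `extraExtractors` are merged into the default set before this ordering is applied.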
This library supports recipe extraction from various popular cooking websites. The scraper automatically detects the appropriate parser based on the URL.
- Bun (latest version)
```sh
# Clone the repository
git clone https://github.com/nerdstep/recipe-scrapers-js.git
cd recipe-scrapers-js

# Install dependencies
bun install

# Run tests
bun test

# Build the project
bun run build
```
- `bun run build` - Build the library for distribution
- `bun test` - Run the test suite
- `bun test:coverage` - Run tests with coverage report
- `bun fetch-test-data` - Fetch test data from the original Python repository
- `bun lint` - Run linting and type checking
- `bun lint:fix` - Fix linting issues automatically
1. Fetch test data from the original Python repository: `bun fetch-test-data`
2. Convert the data into the expected JSON format (i.e. the `RecipeObject` interface): `bun process-test-data <host>`
1. Create a new scraper class extending `AbstractScraper`
2. Implement the required methods for data extraction
3. Add the scraper to the scrapers registry
4. Run tests to ensure the extraction works as expected
5. Update documentation as needed
```ts
import { AbstractScraper } from './abstract-scraper'
import type { RecipeFields } from '@/types/recipe.interface'

export class NewSiteScraper extends AbstractScraper {
  static host() {
    return 'www.newsite.com'
  }

  extractors = {
    ingredients: this.extractIngredients.bind(this),
  }

  protected extractIngredients(): RecipeFields['ingredients'] {
    const items = this.$('.ingredient')
      .map((_, el) => this.$(el).text().trim())
      .get()
    return new Set(items)
  }

  // ... implement other extraction methods
}
```
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
The project uses test data from the original Python recipe-scrapers repository to ensure compatibility and accuracy. Tests are written using Bun's built-in test runner.
```sh
# Run all tests
bun test

# Run tests with coverage
bun test:coverage
```
This project is licensed under the MIT License - see the LICENSE file for details.
- Original recipe-scrapers Python library by hhursev
- Schema.org Recipe specification
- Cheerio for HTML parsing
- Zod for schema validation
This library is for educational and personal use. Please respect the robots.txt files and terms of service of the websites you scrape.