Recipe Scrapers JS
⚠️ Alpha Version
This library is currently in alpha; APIs and behavior may change without notice. Use at your own risk.
A TypeScript/JavaScript library for scraping recipe data from various cooking websites. This is a JavaScript port inspired by the Python recipe-scrapers library.
Features
- 🍳 Extract structured recipe data from cooking websites
- 🔍 Support for multiple popular recipe sites
- 🚀 Built with TypeScript for better developer experience
- ⚡ Fast and lightweight, using the Bun runtime for development and testing
- 🧪 Comprehensive test coverage
Installation
npm install recipe-scrapers-js
# or
yarn add recipe-scrapers-js
# or
pnpm add recipe-scrapers-js
# or
bun add recipe-scrapers-js
Usage
Basic Usage
import { getScraper } from 'recipe-scrapers-js'
const html = `<html>The html to scrape...</html>`
const url = 'https://allrecipes.com/recipe/example'
// Get a scraper for a specific URL
// This function will throw if a scraper does not exist.
const MyScraper = getScraper(url)
const scraper = new MyScraper(html, url, /* { ...options } */)
const recipe = await scraper.toObject()
console.log(recipe)
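The library does not fetch pages itself; you supply the HTML along with its URL. Below is a minimal sketch of pairing it with the standard fetch API (the URL is only an example, and error handling is omitted):

import { getScraper } from 'recipe-scrapers-js'

const url = 'https://allrecipes.com/recipe/example'

// Download the page HTML yourself, then hand it to the scraper.
const response = await fetch(url)
const html = await response.text()

const ScraperClass = getScraper(url)
const scraper = new ScraperClass(html, url)
const recipe = await scraper.toObject()
console.log(recipe)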
Options
interface ScraperOptions {
  /**
   * Additional extractors to be used by the scraper.
   * These extractors will be added to the default set of extractors.
   * Extractors are applied according to their priority.
   * Higher priority extractors will run first.
   * @default []
   */
  extraExtractors?: ExtractorPlugin[]
  /**
   * Additional post-processors to be used by the scraper.
   * These post-processors will be added to the default set of post-processors.
   * Post-processors are applied after all extractors have run.
   * Post-processors are also applied according to their priority.
   * Higher priority post-processors will run first.
   * @default []
   */
  extraPostProcessors?: PostProcessorPlugin[]
  /**
   * Whether link scraping is enabled.
   * @default false
   */
  linksEnabled?: boolean
  /**
   * Logging level for the scraper.
   * This controls the verbosity of logs produced by the scraper.
   * @default LogLevel.Warn
   */
  logLevel?: LogLevel
}
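For example, here is a sketch of passing options when constructing a scraper. It assumes LogLevel is exported from the package (the README does not state this explicitly); the option names come from the interface above.

import { getScraper, LogLevel } from 'recipe-scrapers-js'

const url = 'https://allrecipes.com/recipe/example'
const html = '<html>The html to scrape...</html>'

const ScraperClass = getScraper(url)
const scraper = new ScraperClass(html, url, {
  // Enable link scraping (off by default).
  linksEnabled: true,
  // Warn is already the default; set explicitly here for illustration.
  logLevel: LogLevel.Warn,
})
const recipe = await scraper.toObject()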
Supported Sites
This library supports recipe extraction from various popular cooking websites. The scraper automatically detects the appropriate parser based on the URL.
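Because getScraper throws when no scraper is registered for a URL (see Basic Usage above), support for a site can be probed with a try/catch. The isSupported helper below is a hypothetical sketch, not part of the library:

import { getScraper } from 'recipe-scrapers-js'

// Hypothetical helper: returns true if a scraper exists for the URL.
function isSupported(url: string): boolean {
  try {
    getScraper(url)
    return true
  } catch {
    return false
  }
}

console.log(isSupported('https://allrecipes.com/recipe/example'))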
Development
Prerequisites
- Bun (latest version)
Setup
# Clone the repository
git clone https://github.com/nerdstep/recipe-scrapers-js.git
cd recipe-scrapers-js
# Install dependencies
bun install
# Run tests
bun test
# Build the project
bun run build
Scripts
- bun run build - Build the library for distribution
- bun test - Run the test suite
- bun test:coverage - Run tests with coverage report
- bun fetch-test-data - Fetch test data from the original Python repository
- bun lint - Run linting and type checking
- bun lint:fix - Fix linting issues automatically
Adding New Scrapers
- Fetch test data from the original Python repository: bun fetch-test-data
- Convert the data into the expected JSON format (i.e. the RecipeObject interface): bun process-test-data <host>
- Create a new scraper class extending AbstractScraper
- Implement the required methods for data extraction
- Add the scraper to the scrapers registry
- Run tests to ensure the extraction works as expected
- Update documentation as needed
import { AbstractScraper } from './abstract-scraper'
import type { RecipeFields } from '@/types/recipe.interface'

export class NewSiteScraper extends AbstractScraper {
  static host() {
    return 'www.newsite.com'
  }

  extractors = {
    ingredients: this.extractIngredients.bind(this),
  }

  protected extractIngredients(): RecipeFields['ingredients'] {
    const items = this.$('.ingredient')
      .map((_, el) => this.$(el).text().trim())
      .get()
    return new Set(items)
  }

  // ... implement other extraction methods
}
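Once added to the registry, the new class behaves like any other scraper. As a quick sketch it can also be exercised directly; the HTML below is a stand-in, and the constructor signature follows the Basic Usage example above:

const html = '<ul><li class="ingredient">1 cup flour</li><li class="ingredient">2 eggs</li></ul>'
const url = 'https://www.newsite.com/recipes/example'

const scraper = new NewSiteScraper(html, url)
const recipe = await scraper.toObject()
console.log(recipe) // includes the ingredients extracted above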
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Testing
The project uses test data from the original Python recipe-scrapers repository to ensure compatibility and accuracy. Tests are written using Bun's built-in test runner.
# Run all tests
bun test
# Run tests with coverage
bun test:coverage
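A test typically loads a saved HTML fixture and checks the scraped output. Below is a minimal sketch using Bun's built-in test runner; the fixture path and the asserted field are assumptions, and real tests use the data fetched via bun fetch-test-data:

import { describe, expect, it } from 'bun:test'
import { getScraper } from 'recipe-scrapers-js'

describe('allrecipes', () => {
  it('scrapes the recipe', async () => {
    const url = 'https://allrecipes.com/recipe/example'
    // Hypothetical fixture path; adjust to wherever the fetched test data lives.
    const html = await Bun.file('tests/data/allrecipes/example.html').text()

    const ScraperClass = getScraper(url)
    const scraper = new ScraperClass(html, url)
    const recipe = await scraper.toObject()

    // Field name assumed from the RecipeObject interface.
    expect(recipe.title).toBeDefined()
  })
})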
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Original recipe-scrapers Python library by hhursev
- Schema.org Recipe specification
- Cheerio for HTML parsing
Copyright and Usage
This library is for educational and personal use. Please respect the robots.txt files and terms of service of the websites you scrape.