This scraper creates offline versions, in ZIM format, of PhET science and math simulations.
It requires Node.js version 16 or higher.
npm i && phet2zim
The above will eventually output a ZIM file to output/
See phet2zim --help for details.
The --output option generates the ZIM files in a specific folder:
phet2zim --output myFolder
The --withoutLanguageVariants option excludes languages that have a country variant. For example, en_CA will not be present in the ZIM file when this argument is passed.
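As a rough illustration of what excluding country variants means, the sketch below filters a list of locale codes. It assumes a variant is any code with an underscore-separated country suffix, as in en_CA; the helper name dropLanguageVariants is hypothetical and is not taken from the phet2zim source.

```javascript
// Hypothetical helper: treats any locale code with an underscore-separated
// country suffix (e.g. "en_CA", "pt_BR") as a language variant and drops it.
function dropLanguageVariants(locales) {
  return locales.filter((locale) => !locale.includes("_"));
}

const kept = dropLanguageVariants(["en", "en_CA", "fr", "pt_BR", "zh_CN"]);
console.log(kept); // [ 'en', 'fr' ]
```

Note that under this assumption a language available only as a variant (zh_CN in the sample input) disappears entirely.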
The --subjects option restricts the download to specific subjects. Pass the values as a comma-separated list. Sample of valid subjects:
physics, biology, earth-science, motion, sound-and-waves, work-energy-and-power, heat-and-thermodynamics, quantum-phenomena
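A comma-separated value like the one above could be turned into a list along these lines. This is only a sketch: the function name parseSubjects is hypothetical, and the trimming, lowercasing and de-duplication are assumptions about sensible handling, not documented behaviour of the tool.

```javascript
// Hypothetical parser for a comma-separated --subjects value.
// Trimming, lowercasing and de-duplication are assumptions, not documented behaviour.
function parseSubjects(csv) {
  const subjects = csv
    .split(",")
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0); // ignore empty entries such as "a,,b"
  return [...new Set(subjects)]; // drop duplicates, keep first-seen order
}

const subjects = parseSubjects("physics, biology,,earth-science,physics");
console.log(subjects); // [ 'physics', 'biology', 'earth-science' ]
```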
Available only on GET step:
--withoutLanguageVariants ...
Available on GET and EXPORT steps only:
--includeLanguages 'lang_1,lang_2,lang_3' ...
--excludeLanguages 'lang_1,lang_2,lang_3' ...
--subjects 'math,physics' ...
Available on EXPORT step only:
# Skip ZIM files for individual languages
--mulOnly
# Create a ZIM file with all languages
--createMul
Example:
phet2zim --includeLanguages en ru fr
Another way to configure behaviour is through environment variables. Sample .env file (with default values):
# requests per second, affects GET step only
PHET_RPS=8
# async workers on TRANSFORM step (keep it equal to the number of CPU cores)
PHET_WORKERS=10
# number of retries on GET step (delay grows with exponential backoff)
PHET_RETRIES=5
# display verbose errors
PHET_VERBOSE_ERRORS=false
This project achieves multiple things:
- Download PhET content
- Generate an Index for said content
- Generate ZIM file(s) containing content and index
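For readers wondering how the environment variables from the sample .env file might be consumed, here is a minimal sketch. The default values are copied from that sample; the function name loadConfig, the returned field names, and the rule of falling back to the default for missing or non-positive values are all assumptions rather than the project's actual code.

```javascript
// Defaults copied from the sample .env file.
const DEFAULTS = { PHET_RPS: 8, PHET_WORKERS: 10, PHET_RETRIES: 5, PHET_VERBOSE_ERRORS: false };

// Hypothetical loader: reads the four variables from an env-like object
// (process.env by default). Missing or non-positive numeric values fall back
// to the defaults; this fallback rule is an assumption.
function loadConfig(env = process.env) {
  const toInt = (name) => {
    const parsed = Number.parseInt(env[name] ?? "", 10);
    return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULTS[name];
  };
  return {
    rps: toInt("PHET_RPS"),
    workers: toInt("PHET_WORKERS"),
    retries: toInt("PHET_RETRIES"),
    verboseErrors:
      (env.PHET_VERBOSE_ERRORS ?? String(DEFAULTS.PHET_VERBOSE_ERRORS)) === "true",
  };
}

const config = loadConfig({ PHET_RPS: "4", PHET_VERBOSE_ERRORS: "true" });
console.log(config);
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real process environment.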
Things this project does not yet do, but should:
- Generate Android APK
The functionality is split into 5 npm scripts:
- npm run setup - deletes state from previous runs
- npm run get - downloads PhET simulations in the specified languages
- npm run transform - prepares the content and media files
- npm run export - generates ZIM file(s)
- npm start - runs all of the above in sequence
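The get step is described as limited by PHET_RPS and as retrying with exponential backoff up to PHET_RETRIES times. The sketch below shows what those two settings imply numerically. The function names are hypothetical, and the 500 ms base delay and the doubling factor are assumptions; the documentation states only that the delay grows exponentially.

```javascript
// Minimum spacing between requests for a given requests-per-second budget.
function minIntervalMs(rps) {
  return 1000 / rps;
}

// Delay before each retry, doubling on every attempt.
// baseMs = 500 is a hypothetical starting delay, not a documented value.
function backoffDelays(retries, baseMs = 500) {
  return Array.from({ length: retries }, (_, attempt) => baseMs * 2 ** attempt);
}

console.log(minIntervalMs(8)); // 125
console.log(backoffDelays(5)); // [ 500, 1000, 2000, 4000, 8000 ]
```

With the default values, requests would be spaced at least 125 ms apart, and under these assumptions a request that keeps failing would wait 15.5 seconds in total before giving up.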
The steps get, transform and export have their own output directories:
- get outputs HTML and PNG files to state/get
- transform outputs intermediate files to state/transform
- export outputs HTML and PNG files to state/export and ZIM file(s) to output/ (by default, unless customized with --output)
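The directory layout described above can be summarised as a small lookup. This is an illustrative sketch only: the names STATE_DIRS and outputDirs are hypothetical, and the directory strings are taken directly from the description.

```javascript
// Where each step writes its files, per the description above.
const STATE_DIRS = new Map([
  ["get", "state/get"],
  ["transform", "state/transform"],
  ["export", "state/export"],
]);

// Hypothetical lookup: returns every directory a step writes to.
// Only the export step also writes ZIM file(s), to output/ by default
// or to the folder given with --output.
function outputDirs(step, zimDir = "output") {
  const stateDir = STATE_DIRS.get(step);
  if (stateDir === undefined) throw new Error(`Unknown step: ${step}`);
  return step === "export" ? [stateDir, zimDir] : [stateDir];
}

console.log(outputDirs("get")); // [ 'state/get' ]
console.log(outputDirs("export", "myFolder")); // [ 'state/export', 'myFolder' ]
```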
