Generate Chinese flashcards from raw learning text using Wiktionary and OpenAI.
- Python 3.9+
- OpenAI API key
- Install dependencies:
python3 -m pip install -r requirements.txt
- Create a `.env` file in the project root:
OPENAI_API_KEY=sk-...
# Optional model override
OPENAI_MODEL=gpt-4o-mini
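
As a rough illustration of how these two settings might be read, the sketch below parses `.env`-style text using only the standard library. The names `parse_env` and `resolve_model` and the `fallback` argument are hypothetical; the generator itself may rely on a library such as python-dotenv, and its default model is not stated here.

```python
from typing import Dict, Mapping


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def resolve_model(env: Mapping[str, str], fallback: str) -> str:
    """Return OPENAI_MODEL when set and non-empty, else the caller's fallback."""
    return env.get("OPENAI_MODEL") or fallback
```

Calling `resolve_model(parse_env(text), fallback)` keeps the override optional: leaving `OPENAI_MODEL` out of `.env` simply selects whatever fallback the caller passes in.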
- Create instance directories under `output/`, each with an `input.txt`:
mkdir -p output/book
cp your_input.txt output/book/input.txt
- Run the generator (streams progress, skips existing `.md` files, halts on first error):
.venv/bin/python generate.py --verbose
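
To make the directory layout concrete, here is a minimal sketch of how instance directories could be discovered. The function name `find_instances` is hypothetical, and sorting by name is an assumption; the only rule taken from this README is that an instance is a directory under `output/` containing an `input.txt`.

```python
from pathlib import Path
from typing import List


def find_instances(output_root: Path) -> List[Path]:
    """Return subdirectories of output_root that contain an input.txt, sorted by name."""
    if not output_root.is_dir():
        return []
    return sorted(
        child
        for child in output_root.iterdir()
        if child.is_dir() and (child / "input.txt").is_file()
    )
```

With the example above, `find_instances(Path("output"))` would include `output/book` once `input.txt` has been copied into it; directories without an `input.txt` are ignored.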
- Setup venv + deps:
make setup
- Run generator via Makefile:
make generate
- Per-word files: `<HEADWORD>.md` in each instance directory under `output/`.
- Automatically creates/uses `extracted.txt` in each instance to list vocab; edit it to change the processing set/order.
- Skips any vocab that already has `<HEADWORD>.md` in the instance directory.
- Recurses for multi‑character words: writes the parent card, subword character cards, and component cards until no named components remain.
- Halts on the first BLOCKED with a detailed reason; re‑run to resume (completed files are skipped).
- Pronunciation is Pinyin‑only (tone marks), multiple readings separated by `/`.
- Examples are formatted as `ZH (pinyin) - EN`.
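
The skip-and-resume behavior described above can be sketched roughly as follows. This is not the tool's actual code: `Blocked`, `read_vocab`, `process_instance`, and the `make_card` callback are hypothetical names, and the one-headword-per-line layout of `extracted.txt` is an assumption.

```python
from pathlib import Path
from typing import Callable, List


class Blocked(Exception):
    """Hypothetical error type standing in for a BLOCKED result."""


def read_vocab(instance_dir: Path) -> List[str]:
    """Read headwords from extracted.txt (assumed one per line), skipping blanks."""
    text = (instance_dir / "extracted.txt").read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def process_instance(instance_dir: Path, make_card: Callable[[str], str]) -> List[str]:
    """Write <HEADWORD>.md for each pending headword; return the headwords written.

    make_card returns Markdown for a headword or raises Blocked. The exception
    is not caught here, so the run halts on the first blocked headword.
    """
    written: List[str] = []
    for headword in read_vocab(instance_dir):
        target = instance_dir / f"{headword}.md"
        if target.exists():
            continue  # Completed on an earlier run; skip it.
        # make_card runs before the file is opened, so a Blocked headword
        # leaves no partial file behind.
        target.write_text(make_card(headword), encoding="utf-8")
        written.append(headword)
    return written
```

Because completed files are detected by their presence on disk, re-running after fixing the cause of a block picks up at the first headword that still lacks a file.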
- `.env` is auto-loaded on startup (no manual export needed).
- The tool calls OpenAI for judgment steps (headword extraction and single-headword field extraction), then validates and writes Markdown following a fixed schema.
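
The full schema is not spelled out in this README, so the sketch below only illustrates what a validation step might check for the two formatting rules that are stated: Pinyin readings separated by `/`, and examples shaped as `ZH (pinyin) - EN`. The function names and regular expressions are assumptions, not the tool's real validator, and the letter set omits rare forms such as `ê`.

```python
import re
import unicodedata
from typing import List

# Lowercase Latin letters plus the tone-marked vowels of Hanyu Pinyin.
_PINYIN_READING = re.compile(r"[a-zāáǎàēéěèīíǐìōóǒòūúǔùüǖǘǚǜ' ]+")
_EXAMPLE = re.compile(r"(?P<zh>.+?) \((?P<pinyin>[^()]+)\) - (?P<en>.+)")
_HAN = re.compile(r"[\u4e00-\u9fff]")


def pronunciation_errors(value: str) -> List[str]:
    """Check each /-separated reading; tone digits such as 'hao3' are rejected."""
    errors: List[str] = []
    for reading in value.split("/"):
        # Normalize so that combining tone marks become precomposed letters.
        reading = unicodedata.normalize("NFC", reading.strip())
        if not reading:
            errors.append("empty reading")
        elif not _PINYIN_READING.fullmatch(reading.lower()):
            errors.append(f"not tone-marked pinyin: {reading!r}")
    return errors


def example_errors(line: str) -> List[str]:
    """Check that a line has the shape 'ZH (pinyin) - EN' with Chinese text first."""
    match = _EXAMPLE.fullmatch(line.strip())
    if match is None:
        return ["expected 'ZH (pinyin) - EN'"]
    if not _HAN.search(match["zh"]):
        return ["no Chinese characters before the parentheses"]
    return []
```

A check like this cannot tell a neutral-tone syllable from one whose tone mark was forgotten, since both are plain letters; it only catches digits, stray symbols, and malformed example lines.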