
Conversation

@ochafik (Collaborator) commented Aug 13, 2023

I've played around with Python bindings generation and thought it was worth adding as an example. This PR features:

  • Access to the full C API (incl. CUDA, MPI, OpenCL, Metal, alloc... and any local API changes)
  • Instant regeneration with `python regenerate.py` (a cffi wrapper; uses llama.cpp headers by default, see README.md for options)
  • Lightweight utils to copy between tensors (ggml & numpy.ndarray alike) with automatic (de/re)quantization, or to view a tensor as a numpy ndarray
  • Full stubs preserving the original C signatures as docstrings (for neat autocomplete in IDEs)

You can play with it directly in this Colab notebook.
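To give some intuition for the automatic (de/re)quantization mentioned above, here is a minimal, pure-NumPy sketch of Q8_0-style block quantization. This is an illustrative approximation only, not ggml's actual kernels or this example's API:

```python
import numpy as np

def quantize_q8_0(x: np.ndarray, block: int = 32):
    """Quantize float32 values to int8 blocks, each block with its own scale."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(x / scale).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_q8_0(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct float32 values from int8 blocks and per-block scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

# Round-trip demo: the reconstruction error stays below half a quantization step.
x = np.linspace(-1.0, 1.0, 64, dtype=np.float32)
q, s = quantize_q8_0(x)
x2 = dequantize_q8_0(q, s)
assert np.abs(x - x2).max() < 0.01
```

The real formats in ggml (Q4_0, Q8_0, ...) follow the same block-plus-scale idea but with packed layouts and additional variants.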

(Screenshots: the bindings in use in a Colab notebook, 2023-08-13.)

Some notes:

  • I've committed the generated bindings (ggml/cffi.py) and stubs (ggml/__init__.pyi) mostly for show (to ease the review), but we probably won't want to keep them in the repo (let me know what you think; happy to add a cmake command to generate them during the build)
  • There are already bindings with high-quality docs (https://github.com/abetlen/ggml-python), but I wanted something that's easy to autogenerate and keep up to date with local changes, and I thought having it in the repo's examples made sense.
  • I've tried cffi's native extension generation (see the complex branch) and decided not to go with it, as it's quite a bit more fiddly than this "little" example, with unclear performance benefits.
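As a rough illustration of how stubs that preserve the original C signatures as docstrings might be produced, here is a hypothetical, stdlib-only sketch. The actual regenerate.py uses cffi and the real ggml headers; `stub_for` and its naive regex are inventions for this example:

```python
import re

def stub_for(c_decl: str) -> str:
    """Turn a simple C function declaration into a Python stub
    whose docstring preserves the original C signature."""
    m = re.match(r"\s*[\w\s\*]+?(\w+)\s*\(([^)]*)\)\s*;?\s*$", c_decl)
    if not m:
        raise ValueError(f"unsupported declaration: {c_decl!r}")
    name, params = m.groups()
    args = []
    for p in params.split(","):
        p = p.strip()
        if p and p != "void":
            # keep only the parameter name (last token, pointer stars stripped)
            args.append(p.split()[-1].lstrip("*"))
    return (f"def {name}({', '.join(args)}):\n"
            f'    """{c_decl.strip()}"""\n'
            f"    ...\n")

stub = stub_for(
    "struct ggml_tensor * ggml_new_tensor_1d("
    "struct ggml_context * ctx, enum ggml_type type, int64_t ne0);")
assert stub.splitlines()[0] == "def ggml_new_tensor_1d(ctx, type, ne0):"
```

A real generator would lean on cffi's parsed declarations rather than regexes, but the output shape (a `def` per C function, with the C signature in the docstring) is the same idea.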

Apologies if my Python style is questionable, I'm a bit new 😅

ochafik added 2 commits August 13, 2023 20:05
Add python example w/ cffi-generated bindings

Features:

- Seamless copies between tensors (ggml & numpy alike) with automatic (de/re)quantization
- Access to full C API (incl. CUDA, MPI, OpenCL, Metal, alloc... and any local API changes)
- Trivial regeneration with `python regenerate.py` (uses llama.cpp headers by default; see README.md for options)
@ggerganov (Member) left a comment

Thanks for the contribution!

Looks interesting. I'm not a big Python user, so I cannot tell how useful this is, but it looks well done. As long as it does not interfere with the C library, we can add it to the repo.

Feel free to merge it

@ochafik (Collaborator, Author) commented Aug 22, 2023

@ggerganov thanks a lot, will merge now (pushed some cosmetic changes and regenerated the bindings; so much new gguf stuff 🤗)

Re: usefulness, I'm working on a minimalist Python hybrid of llama2.c + llama.cpp, which I hope could be an easy platform to experiment with various modifications (e.g. on-the-fly matrix decomposition / pruning at loading time w/ numpy).

Debugging GGML_ASSERT failures in Python can be fiddly, but I'll clean up a util that catches SIGABRT and enters an interpreter for easy debugging.

@ochafik ochafik merged commit ffab9c3 into ggml-org:master Aug 22, 2023