⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Large-scale LLM inference engine
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
Scalable and robust tree-based speculative decoding algorithm
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
REST: Retrieval-Based Speculative Decoding, NAACL 2024
LLM Inference on consumer devices
[ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
Clean, easy-to-use implementation of the speculative decoding algorithm EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency) 🦅
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
Minimal C implementation of speculative decoding based on llama2.c
Official Implementation of LANTERN (ICLR'25) and LANTERN++ (ICLRW-SCOPE'25)