Blazing-fast implementation of the FP-Growth algorithm for frequent pattern mining, written in Rust with Python bindings.
- 🔥 Fast: 10-30x faster than Python implementations (mlxtend, efficient-apriori)
- ⚡ Parallel: Multi-threaded mining using Rayon
- 💾 Memory-Efficient: Flat array storage, minimal allocations
- 🐍 Python Integration: Seamless numpy integration via PyO3
- 🧪 Well-Tested: Comprehensive unit and integration tests
- 📊 Benchmarked: Detailed performance comparisons available
FP-Growth (Frequent Pattern Growth) is an algorithm for mining frequent itemsets without candidate generation. It's faster than Apriori for dense datasets because it:
- Builds a compact FP-Tree (prefix tree) to compress transactions
- Uses divide-and-conquer to mine patterns recursively
- Avoids generating candidate itemsets explicitly
Use Cases:
- Market basket analysis (shopping patterns)
- Web usage mining (click patterns)
- Bioinformatics (gene expression patterns)
- Text mining (word co-occurrence)
# Clone the repository
git clone <your-repo-url>
cd priors
# Build and install Python package
cd priors
maturin develop --releaseimport numpy as np
import priors
# Create transaction matrix (rows=transactions, cols=items)
transactions = np.array([
[1, 1, 0, 0, 1], # Transaction 1: items A, B, E
[1, 1, 1, 0, 0], # Transaction 2: items A, B, C
[1, 0, 1, 1, 0], # Transaction 3: items A, C, D
[0, 1, 1, 0, 0], # Transaction 4: items B, C
], dtype=np.int32)
# Mine frequent itemsets with 50% minimum support
result = priors.fp_growth(transactions, min_support=0.5)
# result[0]: All frequent 1-itemsets
# result[1]: All frequent 2-itemsets
# result[2]: All frequent 3-itemsets, etc.
for level_idx, itemsets in enumerate(result):
print(f"Frequent {level_idx + 1}-itemsets: {len(itemsets)}")
print(itemsets)Benchmark on 1000 transactions, 50 items, 10 avg items/transaction:
| Implementation | Time | Speedup |
|---|---|---|
| priors (Rust) | 8ms | 23x |
| mlxtend (Python) | 185ms | 1x |
| efficient-apriori | 220ms | 0.84x |
See BENCHMARKING.md for detailed benchmarks and how to run them.
priors/
├── src/
│ ├── lib.rs # Python bindings
│ └── fp/ # FP-Growth module
│ ├── mod.rs # Module exports
│ ├── storage.rs # Memory-efficient storage
│ ├── tree/ # FP-Tree implementation
│ │ ├── mod.rs
│ │ ├── tree.rs # Tree structures
│ │ └── tree_ops.rs # Tree operations
│ ├── builder.rs # Tree construction
│ ├── mining.rs # Main algorithm
│ ├── combinations.rs # Single-path optimization
│ └── tests.rs # Unit tests
├── benches/ # Benchmarks
├── Cargo.toml # Rust dependencies
└── pyproject.toml # Python package config
# Run Rust tests
cargo test
# Run with output
cargo test -- --nocapture
# Run benchmarks
python benchmark_comparison.py- Start with
storage.rs: Learn about flat array storage - Then
tree/tree.rs: Understand FP-Tree structure - Then
tree/tree_ops.rs: See how the tree is manipulated - Then
builder.rs: Learn tree construction - Finally
mining.rs: The main algorithm
FP-Tree Structure:
Transactions: [[A,B,C], [A,B,D], [A,C]]
FP-Tree: root → A:3 → B:2 → C:1
└→ D:1
└→ C:1
Common prefixes are shared, saving memory!
Conditional Pattern Base: For item C, get all paths leading to C:
- Path 1:
[A, B]with count 1 - Path 2:
[A]with count 1
Build conditional tree for C and recurse!
- Rust 1.70+ (
rustup install stable) - Python 3.7+
- maturin (
pip install maturin)
# Development build (fast compile, slow runtime)
maturin develop
# Release build (slow compile, fast runtime)
maturin develop --release
# Just compile Rust (no Python)
cargo build --release- New storage format: Edit
storage.rs - Tree optimizations: Edit
tree/tree_ops.rs - Algorithm variants: Edit
mining.rs - Python API: Edit
lib.rs
- Use
with_capacity()for pre-allocation - Profile with
cargo flamegraph - Tune Rayon threads:
RAYON_NUM_THREADS=4
- Use
np.int32(notint64or Python int) - Ensure contiguous arrays:
np.ascontiguousarray() - Batch process multiple datasets
Contributions welcome! Areas to improve:
- More algorithm variants (Apriori, Eclat)
- Streaming FP-Growth for large datasets
- GPU acceleration (CUDA)
- More Python examples
- Documentation improvements
[Add your license here]
- Original FP-Growth paper: Han et al., "Mining Frequent Patterns without Candidate Generation" (2000)
- Inspired by mlxtend and efficient-apriori implementations
- Built with PyO3, ndarray, and Rayon
Made with ❤️ and Rust 🦀