ScalarLM is a fully open source, integrated LLM inference and training platform built on top of vLLM, Hugging Face, and Megatron-LM.
ScalarLM builds on these core components:
- vLLM - High-performance LLM inference engine
- Megatron-LM - Training harness, distribution strategy
- PyTorch - Deep learning framework
- Transformers - Model implementations and utilities
- FastAPI - API server framework
- Python 3.8+
- PyTorch 2.0+
- vLLM (installed during the quick start below)
- CUDA 11.8+ (optional but recommended for GPU acceleration)
```bash
# Clone the repository
git clone https://github.com/scalarlm/scalarlm.git
cd scalarlm

# Start it
./scalarlm up
```
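With the server running, a quick request verifies everything works end to end. A minimal sketch, assuming the OpenAI-compatible endpoint is served at `localhost:8000` with the default model from the configuration section below; adjust the port, path, and model name to your deployment:

```python
# Minimal sanity check against the running server. The port (8000), the
# /v1/completions path, and the model name are assumptions; adjust as needed.
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-2-7b-hf",  # default model from cray-config.yaml
        "prompt": "Hello, ScalarLM!",
        "max_tokens": 32,
    },
    timeout=60,
)
print(response.json())
```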
ScalarLM has been completely redesigned with a clean architecture that solves dependency management issues (see the sketch after this list):
- Zero coupling - vLLM has no knowledge of ScalarLM
- External enhancement - ScalarLM adapters enhance vLLM models from the outside
- Version independence - use any vLLM version
- Clean separation - both systems evolve independently
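To make this concrete, here is a minimal illustrative sketch of the external-enhancement pattern, assuming only vLLM's public `LLM`/`SamplingParams` API; the `TokenformerAdapter` name and its hook points are hypothetical, not ScalarLM's actual classes:

```python
from vllm import LLM, SamplingParams

class TokenformerAdapter:
    """Wraps a vLLM engine by composition; vLLM itself is never modified."""

    def __init__(self, llm: LLM):
        self._llm = llm  # plain reference: no subclassing, no monkey-patching

    def generate(self, prompts, temperature=0.0, max_tokens=64):
        # Enhancement logic (e.g. Tokenformer projections) would hook in here,
        # before and after delegating to the unmodified vLLM engine.
        params = SamplingParams(temperature=temperature, max_tokens=max_tokens)
        return self._llm.generate(prompts, params)

# Works with any vLLM version that keeps this public API, since the
# dependency points one way: the adapter knows vLLM, never the reverse.
adapter = TokenformerAdapter(LLM(model="meta-llama/Llama-2-7b-hf"))
print(adapter.generate(["Hello, ScalarLM!"])[0].outputs[0].text)
```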
```bash
# Start ScalarLM server (simplest way)
./scalarlm up

# View available commands
./scalarlm --help

./scalarlm up           # Start ScalarLM server
./scalarlm benchmark    # Run performance benchmarks
./scalarlm llm-logs     # View LLM logs
./scalarlm llm-ls       # List available models
./scalarlm llm-plot     # Plot training metrics
./scalarlm llm-squeue   # View training queue status
./scalarlm test         # Run tests
./scalarlm build-image  # Build Docker image
```
| Target | Container | Latest Release |
| --- | --- | --- |
| NVIDIA BLACKWELL | gdiamos/scalarlm-nvidia-12.0:latest | gdiamos/scalarlm-nvidia-12.0:v0.99 |
| NVIDIA AMPERE | gdiamos/scalarlm-nvidia-8.0:latest | gdiamos/scalarlm-nvidia-8.0:v0.99 |
| NVIDIA AMPERE | gdiamos/scalarlm-nvidia-8.6:latest | gdiamos/scalarlm-nvidia-8.6:v0.99 |
| NVIDIA TURING | gdiamos/scalarlm-nvidia-7.5:latest | gdiamos/scalarlm-nvidia-7.5:v0.99 |
| ARM | gdiamos/scalarlm-arm:latest | gdiamos/scalarlm-arm:v0.99 |
| AMD | gdiamos/scalarlm-amd:latest | gdiamos/scalarlm-amd:v0.99 |
| x86 | gdiamos/scalarlm-cpu:latest | gdiamos/scalarlm-cpu:v0.99 |
```bash
# Or use the ./scalarlm up command
./scalarlm up cpu     # CPU version
./scalarlm up nvidia  # NVIDIA GPU version
./scalarlm up amd     # AMD GPU version
```
```bash
# Core Settings
export SCALARLM_MODEL="meta-llama/Llama-2-7b-hf"  # Default model

# Performance Settings
export SCALARLM_GPU_MEMORY_UTILIZATION="0.9"      # GPU memory usage
export SCALARLM_MAX_MODEL_LENGTH="2048"           # Maximum model length
```
ScalarLM looks for configuration in these locations (in order):

- `/app/cray/cray-config.yaml` - local project config (in the container)

Example `cray-config.yaml`:
```yaml
model: meta-llama/Llama-2-7b-hf
max_model_length: 2048
gpu_memory_utilization: 0.9
```
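As an illustration of how such a lookup might resolve, the sketch below assumes environment variables take precedence over the config file; both that precedence and the `load_setting` helper are assumptions for illustration, not ScalarLM's actual loader:

```python
# Illustrative config resolution: SCALARLM_* environment variables win,
# then the container config file, then a caller-supplied default.
import os
import yaml

def load_setting(key: str, default=None, path="/app/cray/cray-config.yaml"):
    env_value = os.environ.get(f"SCALARLM_{key.upper()}")
    if env_value is not None:
        return env_value  # e.g. SCALARLM_MODEL overrides the file
    try:
        with open(path) as f:
            config = yaml.safe_load(f) or {}
    except FileNotFoundError:
        return default
    return config.get(key, default)

print(load_setting("model", default="meta-llama/Llama-2-7b-hf"))
```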
```
scalarlm/
├── tests/       # Unit and integration tests
├── infra/       # ScalarLM infrastructure
├── ml/          # Training and ML components
├── deployment/  # Deployment configurations
└── README.md    # This file
```
- High-performance inference via vLLM
- Advanced training with Megatron-LM integration
- OpenAI-compatible API for easy integration (see the client example after this list)
- Distributed training capabilities
- Tokenformer adapters for enhanced performance
- Zero coupling between vLLM and ScalarLM
- Version independence - use any vLLM version
- Robust dependency management
- Easy maintenance and updates
- Modern packaging with pyproject.toml
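Because the API is OpenAI-compatible, the official `openai` Python client can be pointed at a ScalarLM server as a drop-in replacement. A sketch, assuming the default `localhost:8000` address (deployment-specific) and the default model:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # point at ScalarLM instead of api.openai.com
    api_key="not-needed-locally",         # the client requires a key even if the server ignores it
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-hf",
    messages=[{"role": "user", "content": "In one sentence, what is ScalarLM?"}],
)
print(completion.choices[0].message.content)
```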
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run tests (`make test integration-test`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow the clean architecture principles
- Maintain zero coupling between vLLM and ScalarLM
- Add tests for new features
- Update documentation as needed
- Use the provided Makefile for development tasks
ScalarLM is licensed under the CC-0 License. See LICENSE for details.
ScalarLM is inspired by the work of Seymour Roger Cray (1925-1996), "the father of supercomputing", who created the supercomputer industry and designed the fastest computers in the world for decades.
Built with:
- vLLM - High-performance LLM inference
- Megatron-LM - Large-scale training
- HuggingFace - Model hub and transformers
- PyTorch - Deep learning framework
Ready to get started? Run `./scalarlm up` to set up your development environment!