> [!IMPORTANT]
> NeMo Cookbook is under active development.
NeMo Cookbook is an example template for Generative AI with NVIDIA NeMo 2.0.
NVIDIA NeMo is an accelerated, end-to-end platform that is flexible and production-ready. NeMo comprises several component frameworks that enable teams to build, customize, and deploy Generative AI solutions for:
- large language models
- vision language models
- video models
- speech models
NeMo Cookbook is inspired by the NeMo tutorials and focuses on using NeMo to tune generative models. Supporting topics include:
- Code profiling
- Logging training and tuning runs with Weights & Biases (see the sketch after this list)
- Model alignment with NeMo Aligner
- Model output control with NeMo Guardrails
- Containerization with Docker
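As a taste of the logging workflow, here is a minimal sketch of sending a run to Weights & Biases via a Lightning-style logger; it assumes NeMo 2.0's `Trainer` accepts standard Lightning loggers, and the project and run names are placeholders:

```python
# Minimal sketch: attach a Weights & Biases logger to a NeMo 2.0 trainer.
# Assumes nl.Trainer accepts standard Lightning loggers; the project and
# run names below are placeholders.
from lightning.pytorch.loggers import WandbLogger
from nemo import lightning as nl

logger = WandbLogger(project="nemo-cookbook", name="demo-run")
trainer = nl.Trainer(devices=1, accelerator="gpu", logger=logger)
```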
We will use NVIDIA and Meta models including, but not limited to:
- NVIDIA Llama variants, Mistral variants, Megatron distillations, and Minitron
- NVIDIA embedding, reranking, and retrieval models
- NVIDIA Cosmos tokenizers
- NeMo compatible Meta Llama variants
System requirements (a quick check follows the list):
- a CUDA-compatible OS and device (GPU) with at least 48 GB of VRAM (e.g. an L40S)
- CUDA 12.1
- Python 3.10.10
- PyTorch 2.2.1
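Before installing anything, the device and available VRAM can be sanity-checked with a short PyTorch snippet; the 48 GB threshold below simply mirrors the requirement above:

```python
# Quick environment check: CUDA availability, device name, and VRAM.
import torch

assert torch.cuda.is_available(), "no CUDA device found"
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"{props.name}: {vram_gb:.1f} GB VRAM (PyTorch {torch.__version__})")
assert vram_gb >= 48, "at least 48 GB of VRAM is required for these demos"
```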
> [!TIP]
> See https://nemo.theosis.ai/cookbook/hardware for more on the VRAM requirements of particular models.
Accounts are required for the following services:
- NVIDIA Developer Program
- NVIDIA NGC for NeMo and TensorRT-LLM containers
- build.nvidia.com for API calls to NVIDIA hosted endpoints
- Hugging Face Hub for model weights and datasets
To prepare a development environment, please run the following in terminal:
```bash
bash install_requirements.sh
```

Doing so will install `nemo_lab` along with `nemo_run`, `megatron_core 0.10.0rc0`, and the `nvidia/apex` PyTorch extension.
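To confirm the installation succeeded, the installed distributions can be checked as follows; the distribution names here are assumptions based on the packages listed above:

```python
# Sanity-check the installed dependencies; the distribution names are
# assumptions based on the packages named above.
from importlib.metadata import version, PackageNotFoundError

for dist in ("nemo_run", "megatron-core", "apex"):
    try:
        print(dist, version(dist))
    except PackageNotFoundError:
        print(dist, "not found (it may be published under a different name)")
```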
> [!NOTE]
> `megatron_core 0.10.0rc0` is required for compatibility with NeMo 2.0.
> [!NOTE]
> NVIDIA Apex is required for RoPE scaling in NeMo 2.0. Apex is built with CUDA and C++ extensions for performance and full functionality. Please be aware that the build process may take several minutes.
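Should you need to build Apex manually, the following mirrors the build command from the upstream Apex README at the time of writing; treat it as a sketch and defer to the upstream instructions for your CUDA version:

```bash
# Build Apex with its C++ and CUDA extensions from a local clone
# (mirrors the upstream Apex README; adjust for your CUDA version).
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
  --config-settings "--build-option=--cpp_ext" \
  --config-settings "--build-option=--cuda_ext" ./
```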
> [!IMPORTANT]
> Running the images requires that the host machine has access to NVIDIA GPUs.
Two Docker images have been created for the quickstart tutorials: one for pretraining and one for finetuning.
To run pretraining, do the following in terminal:
```bash
docker pull jxtngx/nemo-lab:pretrain
docker run --rm --gpus 1 -it jxtngx/nemo-lab:pretrain
python pretrain_nemotron3_4b.py
```
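For reference, NeMo 2.0 pretraining scripts typically follow the recipe pattern shown below; this is a hypothetical sketch of what `pretrain_nemotron3_4b.py` might contain, not its actual contents, and the recipe entry point and arguments are assumptions:

```python
# Hypothetical sketch of a NeMo 2.0 pretraining script; the actual
# pretrain_nemotron3_4b.py in the image may differ.
import nemo_run as run
from nemo.collections import llm

recipe = llm.nemotron3_4b.pretrain_recipe(
    dir="/workspace/checkpoints",  # assumed output path
    name="nemotron3_4b_pretrain",
    num_nodes=1,
    num_gpus_per_node=1,
)
run.run(recipe, executor=run.LocalExecutor())
```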
To run finetuning, do the following in terminal:
```bash
docker pull jxtngx/nemo-lab:finetune
docker run --rm --gpus 1 -it jxtngx/nemo-lab:finetune
# WAIT FOR CONTAINER TO START
huggingface-cli login
# ENTER HF KEY WHEN PROMPTED
python finetune_llama3_8b.py
```
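If you prefer not to log in interactively, the token can instead be supplied through the `HF_TOKEN` environment variable, which `huggingface_hub` reads automatically; for example:

```bash
# Non-interactive alternative: pass the Hugging Face token via HF_TOKEN
# (read automatically by huggingface_hub inside the container).
docker run --rm --gpus 1 -e HF_TOKEN=<your-token> -it jxtngx/nemo-lab:finetune
python finetune_llama3_8b.py
```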
> [!IMPORTANT]
> Finetuning requires a Hugging Face key and access to Llama 3 8B.
> For keys, see: https://huggingface.co/docs/hub/en/security-tokens
> For Llama 3 8B access, see: https://huggingface.co/meta-llama/Meta-Llama-3-8B
| Quickstart | Docker | NVIDIA |
| --- | --- | --- |
| Pretrain | `jxtngx/nemo-lab:pretrain` | |
| Finetune | `jxtngx/nemo-lab:finetune` | |
> [!IMPORTANT]
> Regarding the NVIDIA Launchable, use the following commands in terminal to run the demos:
> - tuning: `python /workspace/finetune_llama3_8b.py`
> - training: `python /workspace/pretrain_nemotron3_4b.py`
> [!IMPORTANT]
> Regarding the NVIDIA Launchable: to avoid data storage costs, be certain to delete the demo instance once the demo is complete.