A high-throughput and memory-efficient inference and serving engine for LLMs
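As a quick illustration of that engine in use, here is a minimal offline-generation sketch, assuming the `vllm` Python package is installed; the model name and sampling settings are placeholders, not recommendations.

```python
from vllm import LLM, SamplingParams

# Illustrative model and sampling settings -- swap in whatever you actually serve.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batched offline generation; the engine handles scheduling and KV-cache paging.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```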
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). First released on June 23, 2007, CUDA lets developers dramatically speed up computing applications by harnessing the power of GPUs.
SGLang is a fast serving framework for large language models and vision language models.
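For context, a minimal sketch of SGLang's frontend DSL, assuming a local server has already been started; the port, model, prompt, and token budget below are placeholders.

```python
import sglang as sgl

# Assumes an SGLang server is already running locally, e.g. started with:
#   python -m sglang.launch_server --model-path <your-model> --port 30000
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def qa(s, question):
    # Build a chat turn and ask the backend to generate the answer.
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

state = qa.run(question="What does a serving engine do?")
print(state["answer"])
```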
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
A flexible framework of neural networks for deep learning
A Python framework for accelerated simulation, data generation and spatial computing.
A PyTorch Library for Accelerating 3D Deep Learning Research
Simple, scalable AI model deployment on GPU clusters
Jittor is a high-performance deep learning framework based on just-in-time (JIT) compilation and meta-operators.
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, providing better performance with lower memory utilization in both training and inference.
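A rough sketch of how that FP8 path is typically used, assuming the `transformer_engine.pytorch` module and an FP8-capable GPU; layer sizes and the recipe choice are arbitrary.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Arbitrary sizes for illustration; requires an FP8-capable GPU (Hopper/Ada/Blackwell).
layer = te.Linear(768, 3072, bias=True).cuda()
inp = torch.randn(16, 768, device="cuda")

fp8_recipe = recipe.DelayedScaling()  # default delayed-scaling FP8 recipe
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)

out.float().sum().backward()  # backward pass also runs through the FP8-aware layer
```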
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
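A hedged sketch of the compile workflow, assuming the `torch_tensorrt` package plus torchvision for a stand-in model; the input shape and precision choice are illustrative only.

```python
import torch
import torch_tensorrt
import torchvision.models as models

# Stand-in model and input shape; any traceable torch.nn.Module works similarly.
model = models.resnet18(weights=None).eval().cuda()
example = torch.randn(1, 3, 224, 224, device="cuda")

trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input(example.shape, dtype=torch.float32)],
    enabled_precisions={torch.float16},  # allow FP16 TensorRT kernels
)
print(trt_model(example).shape)
```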
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
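For a feel of the sparse-tensor API, a minimal sketch; the coordinates, feature sizes, and channel counts are made up for illustration.

```python
import torch
import MinkowskiEngine as ME

# Two points on a 3D grid; the first column is the batch index.
coords = torch.IntTensor([[0, 0, 0, 0],
                          [0, 1, 1, 1]])
feats = torch.rand(2, 3)  # 3 input feature channels per point

x = ME.SparseTensor(features=feats, coordinates=coords)
conv = ME.MinkowskiConvolution(in_channels=3, out_channels=8,
                               kernel_size=3, dimension=3)
y = conv(x)
print(y.F.shape)  # feature matrix of the sparse output
```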
PyTorch native quantization and sparsity for training and inference
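A sketch of how weight-only quantization is commonly applied with this library, assuming a recent torchao release that exposes `quantize_` and `int8_weight_only`; the toy model is arbitrary.

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Toy model for illustration; in practice this would be a full network.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().eval()

# Swap eligible linear weights to int8 in place; activations stay in higher precision.
quantize_(model, int8_weight_only())

x = torch.randn(4, 1024, device="cuda")
print(model(x).shape)
```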
PyTorch domain library for recommendation systems
Self-hosted, local-only NVR and AI computer vision software. With features such as object detection, motion detection, face recognition, and more, it gives you the power to keep an eye on your home, office, or any other place you want to monitor.
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Loss function implementations: label smoothing, AM-Softmax, partial FC, focal loss, triplet loss, and Lovász-Softmax. Maybe useful.
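Not this repository's code, but as a reminder of what one of these losses does, plain PyTorch already ships label smoothing in its cross-entropy loss:

```python
import torch
import torch.nn as nn

# Label smoothing spreads a little probability mass from the true class
# to the other classes, which regularizes overconfident logits.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 10)           # batch of 8, 10 classes
targets = torch.randint(0, 10, (8,))  # integer class labels
print(criterion(logits, targets))
```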
CUDA integration for Python, plus shiny features
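A small sketch of what that integration looks like in practice, compiling a trivial CUDA kernel from Python; the kernel and array size are toy examples.

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on the default device
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Compile a toy kernel that doubles each element in place.
mod = SourceModule("""
__global__ void double_them(float *a)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    a[idx] *= 2.0f;
}
""")
double_them = mod.get_function("double_them")

a = np.random.randn(256).astype(np.float32)
expected = 2 * a
double_them(drv.InOut(a), block=(256, 1, 1), grid=(1, 1))
assert np.allclose(a, expected)
```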