A high-throughput and memory-efficient inference and serving engine for LLMs
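As a quick illustration of that engine in use, here is a minimal offline-generation sketch, assuming the `vllm` Python package is installed; the model name and sampling settings are placeholders, not recommendations.

```python
from vllm import LLM, SamplingParams

# Illustrative model and sampling settings -- swap in whatever you actually serve.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batched offline generation; the engine handles scheduling and KV-cache paging.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```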
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). First released on June 23, 2007, CUDA lets developers dramatically speed up computing applications by harnessing the power of GPUs.
SGLang is a fast serving framework for large language models and vision language models.
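For context, a minimal sketch of SGLang's frontend DSL, assuming a local server has already been started; the port, model, prompt, and token budget below are placeholders.

```python
import sglang as sgl

# Assumes an SGLang server is already running locally, e.g. started with:
#   python -m sglang.launch_server --model-path <your-model> --port 30000
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def qa(s, question):
    # Build a chat turn and ask the backend to generate the answer.
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

state = qa.run(question="What does a serving engine do?")
print(state["answer"])
```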
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
A flexible framework of neural networks for deep learning
A Python framework for accelerated simulation, data generation and spatial computing.
A PyTorch Library for Accelerating 3D Deep Learning Research
Simple, scalable AI model deployment on GPU clusters
Jittor is a high-performance deep learning framework based on just-in-time (JIT) compilation and meta-operators.
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, providing better performance with lower memory utilization in both training and inference.
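A rough sketch of how that FP8 path is typically used, assuming the `transformer_engine.pytorch` module and an FP8-capable GPU; layer sizes and the recipe choice are arbitrary.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Arbitrary sizes for illustration; requires an FP8-capable GPU (Hopper/Ada/Blackwell).
layer = te.Linear(768, 3072, bias=True).cuda()
inp = torch.randn(16, 768, device="cuda")

fp8_recipe = recipe.DelayedScaling()  # default delayed-scaling FP8 recipe
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)

out.float().sum().backward()  # backward pass also runs through the FP8-aware layer
```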
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
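A hedged sketch of the compile workflow, assuming the `torch_tensorrt` package plus torchvision for a stand-in model; the input shape and precision choice are illustrative only.

```python
import torch
import torch_tensorrt
import torchvision.models as models

# Stand-in model and input shape; any traceable torch.nn.Module works similarly.
model = models.resnet18(weights=None).eval().cuda()
example = torch.randn(1, 3, 224, 224, device="cuda")

trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input(example.shape, dtype=torch.float32)],
    enabled_precisions={torch.float16},  # allow FP16 TensorRT kernels
)
print(trt_model(example).shape)
```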
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
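For a feel of the sparse-tensor API, a minimal sketch; the coordinates, feature sizes, and channel counts are made up for illustration.

```python
import torch
import MinkowskiEngine as ME

# Two points on a 3D grid; the first column is the batch index.
coords = torch.IntTensor([[0, 0, 0, 0],
                          [0, 1, 1, 1]])
feats = torch.rand(2, 3)  # 3 input feature channels per point

x = ME.SparseTensor(features=feats, coordinates=coords)
conv = ME.MinkowskiConvolution(in_channels=3, out_channels=8,
                               kernel_size=3, dimension=3)
y = conv(x)
print(y.F.shape)  # feature matrix of the sparse output
```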
PyTorch native quantization and sparsity for training and inference
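A sketch of how weight-only quantization is commonly applied with this library, assuming a recent torchao release that exposes `quantize_` and `int8_weight_only`; the toy model is arbitrary.

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Toy model for illustration; in practice this would be a full network.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().eval()

# Swap eligible linear weights to int8 in place; activations stay in higher precision.
quantize_(model, int8_weight_only())

x = torch.randn(4, 1024, device="cuda")
print(model(x).shape)
```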
PyTorch domain library for recommendation systems
Self-hosted, local-only NVR and AI computer vision software. With features such as object detection, motion detection, face recognition, and more, it gives you the power to keep an eye on your home, office, or any other place you want to monitor.
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Loss function implementations: label smoothing, AM-Softmax, partial FC, focal loss, triplet loss, and Lovász-Softmax. Maybe useful.
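Not this repository's code, but as a reminder of what one of these losses does, plain PyTorch already ships label smoothing in its cross-entropy loss:

```python
import torch
import torch.nn as nn

# Label smoothing spreads a little probability mass from the true class
# to the other classes, which regularizes overconfident logits.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 10)           # batch of 8, 10 classes
targets = torch.randint(0, 10, (8,))  # integer class labels
print(criterion(logits, targets))
```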
CUDA integration for Python, plus shiny features
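A small sketch of what that integration looks like in practice, compiling a trivial CUDA kernel from Python; the kernel and array size are toy examples.

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on the default device
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Compile a toy kernel that doubles each element in place.
mod = SourceModule("""
__global__ void double_them(float *a)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    a[idx] *= 2.0f;
}
""")
double_them = mod.get_function("double_them")

a = np.random.randn(256).astype(np.float32)
expected = 2 * a
double_them(drv.InOut(a), block=(256, 1, 1), grid=(1, 1))
assert np.allclose(a, expected)
```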