GVProf: A Value Profiler for GPU-based Clusters
Updated Mar 24, 2024 - Python
The GPU Optimizer for ML Models enhances GPU performance for machine learning. It offers advanced scheduling, real-time monitoring, and efficient resource management through a user-friendly web interface and robust API, integrating big data technologies for seamless data processing and model optimization. @NVIDIA
🤖 Ollama Consumer - A Python-based interactive chat interface for Ollama models with advanced model management, comprehensive benchmarking, vision support, and automatic error recovery. Features dynamic model switching, GPU optimization, and intelligent service monitoring for seamless AI model interactions.
Optimizing PyTorch model training by wrapping memory-mapped tensors on NVIDIA GPUs with TensorDict.
High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, warp-level primitives, and mixed precision support. Drop-in replacement for nn.LayerNorm with 25% memory reduction.
Optimizing PyTorch model training by wrapping memory-mapped tensors on an NVIDIA GPU with TensorDict.