🚀 Feature
(as discussed in #7518)
Gather GPU stats using torch.cuda.memory_stats instead of nvidia-smi for GPUStatsMonitor.
Motivation
Some machines do not have nvidia-smi installed, so they are currently unable to gather data with the GPUStatsMonitor callback, which is useful for detecting OOMs and debugging models.
Pitch
For users on PyTorch >= 1.8.0, use torch.cuda.memory_stats to gather memory data instead of invoking the nvidia-smi binary.
Some fields logged by GPUStatsMonitor (fan_speed, temperature) are not available from torch.cuda.memory_stats. We can either 1) fall back to nvidia-smi when a user requests those fields, or 2) remove those fields if they aren't used anywhere and are no longer considered useful.
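A minimal sketch of what the memory_stats-based collection could look like is below; the callback name CudaMemoryStatsMonitor, the hook used, and the logged metric keys are illustrative assumptions, not the existing GPUStatsMonitor implementation:

```python
import torch
from pytorch_lightning.callbacks import Callback


class CudaMemoryStatsMonitor(Callback):
    """Hypothetical callback: log per-GPU memory figures from torch.cuda.memory_stats."""

    def on_train_batch_end(self, trainer, pl_module, *args, **kwargs):
        if not torch.cuda.is_available() or trainer.logger is None:
            return
        metrics = {}
        for idx in range(torch.cuda.device_count()):
            stats = torch.cuda.memory_stats(device=idx)
            # The caching allocator exposes counters such as
            # "allocated_bytes.all.current" and "reserved_bytes.all.current" (bytes);
            # fan speed and temperature are not available here.
            metrics[f"gpu_{idx}/memory.allocated_mib"] = stats.get("allocated_bytes.all.current", 0) / 2**20
            metrics[f"gpu_{idx}/memory.reserved_mib"] = stats.get("reserved_bytes.all.current", 0) / 2**20
        trainer.logger.log_metrics(metrics, step=trainer.global_step)
```

It would be registered like any other callback, e.g. `Trainer(callbacks=[CudaMemoryStatsMonitor()])`, avoiding any dependency on the nvidia-smi binary for the memory fields.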
Alternatives
Additional context
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning.
- Bolts: Pretrained SOTA deep learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.