@CoderHam CoderHam commented May 3, 2024

  • A bug in the NVML libraries was fixed in the 526 driver release. For older drivers, the recommendation is to downgrade pynvml to 11.4.0 and to use 11.5.0 only with driver 526 or later.

This PR uses the legacy pynvml memory usage function, even with pynvml 11.5.0, when the driver version is older than 526.

Mentioned in the issue as well: #808 (comment)

@jaedeok-nvidia
Collaborator

Thanks for addressing the pynvml issue related to the driver version. @CoderHam, could you share which doc (or link) you used to determine the driver version (526)?

@CoderHam
Author

@jaedeok-nvidia It took a while to dig through, but I followed the thread from https://forums.developer.nvidia.com/t/nvml-bug-nvmldevicegetcomputerunningprocesses-returns-compute-processes-for-all-gpu-devices/222337/2 and NVIDIA/k8s-device-plugin#331 (comment).

This confirmed that missing symbols in the underlying NVML libraries prevent us from using the v2 API prior to driver 526.

@kaiyux kaiyux mentioned this pull request May 28, 2024
@kaiyux
Member

kaiyux commented May 28, 2024

Hi @CoderHam, the changes have been integrated in #1688 and we've credited you as co-author, so I'm closing this PR now. Thanks a lot!

@kaiyux kaiyux closed this May 28, 2024