Closed
Labels: feature, good first issue, help wanted, let's do it!, won't fix
Description
🚀 Feature
Automatically pick a GPU that has enough VRAM for the workload.
Motivation
Auto_select_GPU #1426 offers the potential to automatically adjust the choice of GPU when, for example, some GPUs are in use by someone else.
However, the current implementation only checks whether the GPU can be used at all, not whether it has enough free VRAM for the workload.
The proposed feature would have the following workflow:
- check which GPUs can be used
- on train start:
  - check if one batch can be sent through the network on the current GPU
  - otherwise, switch to the next GPU
  - repeat until a capable GPU is found or no GPUs remain
  - (possibly?) wait a bit, then retry
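A minimal, framework-agnostic sketch of this loop, assuming a hypothetical probe callable `can_run_batch` that, in practice, would send one batch through the network on the given GPU and return `False` on a CUDA out-of-memory error (all names here are illustrative, not an actual API):

```python
import time

def find_capable_device(devices, can_run_batch, retries=0, wait_s=0.0):
    """Return the first device id that can run one batch, or None.

    devices: iterable of device ids, e.g. [0, 1, 2]
    can_run_batch: callable(device_id) -> bool (hypothetical probe; in
        practice it would run one forward pass and catch CUDA OOM)
    retries / wait_s: optional "wait a bit, then retry" behaviour
    """
    for attempt in range(retries + 1):
        for dev in devices:
            if can_run_batch(dev):
                return dev  # capable GPU found
        if attempt < retries:
            time.sleep(wait_s)  # all GPUs busy/full; wait and retry
    return None  # no capable GPU found
```

Separating the probe from the search loop keeps the retry logic testable without a GPU attached.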
Pitch
Keep the auto_select_GPU flag, but add checks in the lifecycle to ensure the selected GPU can actually be used.
Alternatives
Add a user-level try/except in on_train_start.
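A sketch of what this user-level alternative could look like, under stated assumptions: `OOMError` stands in for `torch.cuda.OutOfMemoryError`, and `run_one_batch` is a hypothetical helper that runs one batch on a given device.

```python
class OOMError(RuntimeError):
    """Stand-in for torch.cuda.OutOfMemoryError (hypothetical)."""

def on_train_start(run_one_batch, devices):
    """Probe each device with one batch; fall back to the next on OOM."""
    for dev in devices:
        try:
            run_one_batch(dev)  # in practice: one forward pass on `dev`
            return dev          # this GPU has enough free VRAM
        except OOMError:
            continue            # GPU is full, try the next one
    raise RuntimeError("no GPU with enough free VRAM")
```

The drawback of this alternative, compared to the proposed feature, is that every user has to reimplement the fallback loop themselves.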
Additional context
The use case for this feature is when you want to run multiple jobs in parallel/on a shared cluster without manually tuning the appropriate GPU.