Enhancing auto_select_GPU #2075

@sebastienwood

🚀 Feature

Automatically pick a GPU that has enough VRAM for the workload.

Motivation

Auto_select_GPU #1426 makes it possible to automatically change the choice of GPU when, for example, some GPUs are in use by someone else.
However, the current implementation only checks whether a GPU can be used, not whether it has enough VRAM for the workload.
The proposed feature would have the following workflow:

  • check which GPUs can be used
  • on train start:
    • check whether one batch can be sent through the network on the current GPU
    • if not, switch to another GPU
    • repeat until a capable GPU has been found or no GPUs are left
    • (possibly?) wait a bit, then retry
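
The workflow above could be sketched roughly as follows. This is only an illustration, not the existing Lightning implementation: `select_gpu` and `torch_probe` are hypothetical names, and the probe assumes that a CUDA out-of-memory failure surfaces as a `RuntimeError` whose message contains "out of memory". It only runs a forward pass, so memory needed for the backward pass and optimizer state is not accounted for.

```python
import time
from typing import Callable, Iterable, Optional


def select_gpu(
    candidates: Iterable[int],
    fits: Callable[[int], bool],
    retries: int = 0,
    wait_seconds: float = 0.0,
) -> Optional[int]:
    """Return the first GPU index for which `fits(index)` is True, else None.

    `fits` is a hypothetical probe, e.g. "one batch goes through the network".
    After a full pass with no capable GPU, wait and retry up to `retries` times.
    """
    candidates = list(candidates)
    for attempt in range(retries + 1):
        for index in candidates:
            if fits(index):
                return index
        if attempt < retries:
            time.sleep(wait_seconds)  # another job may free its memory meanwhile
    return None


def torch_probe(model, batch) -> Callable[[int], bool]:
    """Build a probe that tries one forward pass of `batch` on a given GPU."""
    import torch  # imported lazily so select_gpu itself does not require torch

    def probe(index: int) -> bool:
        device = torch.device("cuda", index)
        try:
            model.to(device)(batch.to(device))  # forward pass only
            return True
        except RuntimeError as err:
            # Assumption: OOM is reported as a RuntimeError mentioning "out of memory".
            if "out of memory" not in str(err):
                raise
            model.to("cpu")  # move weights off the GPU that was too small
            torch.cuda.empty_cache()
            return False

    return probe
```

With PyTorch available, a call might look like `select_gpu(range(torch.cuda.device_count()), torch_probe(model, batch), retries=3, wait_seconds=30)`. Keeping the probe separate from the loop means the "is this GPU capable?" check can be swapped for something cheaper, such as a free-memory query.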

Pitch

Keep the auto_select_GPU flag, but add checks in the lifecycle to ensure the selected GPU can actually be used.

Alternatives

Have users add their own try/except block in on_train_start.

Additional context

The use case for this feature is running multiple jobs in parallel or on a shared cluster without manually choosing the appropriate GPU.

Metadata

Labels

  • feature: Is an improvement or enhancement
  • good first issue: Good for newcomers
  • help wanted: Open to be worked on
  • let's do it!: approved to implement
  • won't fix: This will not be worked on
