🚀 Feature
In #5915 we started discussing what a good interface for the cluster environment could be and where it is headed in the future.
We should iterate on these ideas.
Some ideas / issues / refactors to discuss:
- There is a small amount of code duplication.
- Review "Add LSF support" (#5102) and determine whether the interface needs changes.
- Add docs for all environments.
- Teardown logic: cluster environments that set environment variables (globally) should also be responsible for cleaning them up ([fix] Add a cluster environment teardown to clean up environment state #6942).
- Convert methods to proper setters/getters (Rename "master" methods to "main" in ClusterEnvironment plugins #10103).
- Consider renaming "creates_children". It is not clear whether it refers to the environment within Lightning or to the actual cluster (Remove deprecated method `ClusterEnvironment.creates_children` #10339).
- Move leftover SLURM logic from the SLURM connector to the SLURM environment (refactor slurm_job_id #10622, Deprecate `trainer.slurm_job_id` #10615).
- A `detect` method:

      # Returns True if we can detect the cluster environment.
      def detect() -> bool:
          if ...:
              return True
          return False

cc @Borda @justusschock @awaelchli @akihironitta @rohitgr7 @tchaton @ananthsub
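
To make the `detect` idea more concrete, here is a minimal sketch of how it might look as a static method on each environment, with a small helper that picks the first environment that detects itself. The class names (`ClusterEnvironmentSketch`, `SlurmSketch`, `TorchElasticSketch`) and the `select_environment` helper are hypothetical and are not Lightning's actual API. The environment variables checked are assumptions about what each launcher exports and would need to be confirmed per cluster.

```python
import os
from abc import ABC, abstractmethod
from typing import Mapping, Optional, Sequence, Type


class ClusterEnvironmentSketch(ABC):
    """Hypothetical stand-in for a cluster environment plugin."""

    @staticmethod
    @abstractmethod
    def detect(env: Optional[Mapping[str, str]] = None) -> bool:
        """Return True if this environment appears to be active."""


class SlurmSketch(ClusterEnvironmentSketch):
    @staticmethod
    def detect(env: Optional[Mapping[str, str]] = None) -> bool:
        env = os.environ if env is None else env
        # Assumption: a SLURM job exports SLURM_JOB_ID to its processes.
        return "SLURM_JOB_ID" in env


class TorchElasticSketch(ClusterEnvironmentSketch):
    # Assumption: the elastic launcher exports all of these variables.
    REQUIRED = ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")

    @staticmethod
    def detect(env: Optional[Mapping[str, str]] = None) -> bool:
        env = os.environ if env is None else env
        return all(name in env for name in TorchElasticSketch.REQUIRED)


def select_environment(
    candidates: Sequence[Type[ClusterEnvironmentSketch]],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Type[ClusterEnvironmentSketch]]:
    """Return the first candidate whose detect() succeeds, or None."""
    for candidate in candidates:
        if candidate.detect(env):
            return candidate
    return None
```

Accepting an optional `env` mapping keeps detection free of side effects and easy to unit test. With this shape, the connector would no longer need per-environment `if` branches: it would only iterate over the registered environments in priority order.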