Improving Hydra+DDP support #11617
Merged
Commits (40)
64e66a6 adds hydra test (jgbos)
ff7d739 update tests (jgbos)
77a0876 create method for hydra (jgbos)
9d02c4c minor test updates (jgbos)
6361369 adds teardown code to support hydra multirun (jgbos)
dea8ea5 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
20499b0 fixes hydra available import checks (jgbos)
bb93d27 moved hydra specific code to utilities (jgbos)
fee8d61 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
2c1e839 fix mypy and flake8 errors (jgbos)
4ec0a45 fixes error with old version of hydra (jgbos)
0eb0627 Update pytorch_lightning/utilities/hydra.py (jgbos)
6530c6a addresses comments and simplifies tests (jgbos)
7f340ed rebase with launcher (jgbos)
12c1122 add hydra launcher (rohitgr7)
bdce069 move launcher (rohitgr7)
bf0f0e8 Merge branch 'master' into fix-hydra-multirun (rohitgr7)
011db09 [pre-commit.ci] auto fixes from pre-commit.com hooks (Borda)
d763922 prepare for merge (pre-commit-ci[bot])
c6d4266 Merge remote-tracking branch 'origin' into fix-hydra-multirun (jgbos)
5bcfad1 working post merge (jgbos)
387a7cf fixes test import (jgbos)
e8d625b add support for hydra `rerun` (jgbos)
0f12e9a refactor subprocess cmd and add test (jgbos)
f67856f Update tests/tests_pytorch/strategies/test_ddp_hydra_support.py (jgbos)
e2bdb76 Update tests/tests_pytorch/strategies/test_ddp_hydra_support.py (jgbos)
ec19a90 resolve comments (jgbos)
6358621 uses hydra test_utils and checks standalone (jgbos)
5bb0b61 Merge branch 'master' into fix-hydra-multirun (jgbos)
3f2ab47 fix tests (carmocca)
a2a8481 Merge branch 'master' into fix-hydra-multirun (rohitgr7)
3120612 Revert "fix tests" (rohitgr7)
17e82cd Move file (carmocca)
e6029d3 Merge branch 'master' into fix-hydra-multirun (carmocca)
ed78629 Minor test signature changes (carmocca)
5165f36 Tests require hydra>=1.0.7 (carmocca)
e3ce4a7 oopsie (carmocca)
a827b2b Update tests/tests_pytorch/strategies/launchers/test_subprocess_scrip… (carmocca)
4b6e46d fixes failing test (carmocca)
ba32666 Merge branch 'fix-hydra-multirun' of github.com:jgbos/pytorch-lightni… (jgbos)
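The commits above add a Hydra-aware DDP launcher that spawns worker processes by re-running the training script with the `config.yaml` Hydra saved for the job, writing each worker's Hydra output into a rank-specific subdirectory (the tests below glob for `.pl_ddp_hydra_*/config.yaml`). A minimal sketch of the kind of command such a launcher might build — `hydra_ddp_worker_command` is a hypothetical helper for illustration; the real logic lives in `pytorch_lightning/strategies/launchers/subprocess_script.py`:

```python
import sys


def hydra_ddp_worker_command(script: str, local_rank: int, config_dir: str = ".") -> list:
    """Hypothetical sketch: build the command a Hydra-aware DDP launcher
    could use to spawn one worker process.

    Each worker re-runs the original script with the config.yaml that Hydra
    saved for the job, redirecting its Hydra output into a rank-specific
    subdirectory so workers do not clobber each other's logs.
    """
    return [
        sys.executable,
        script,
        "-cp",  # config path: the directory holding the saved config.yaml
        config_dir,
        "-cn",  # config name
        "config.yaml",
        f"hydra.output_subdir=.pl_ddp_hydra_{local_rank}",
        "hydra.run.dir=.",  # keep the worker inside the parent job's directory
    ]
```

This is only the command-construction idea, under the assumption that rank 0 has already written `config.yaml`; the actual launcher also manages environment variables such as `LOCAL_RANK` for each subprocess.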
tests/tests_pytorch/strategies/launchers/test_subprocess_script.py (161 additions, 0 deletions)
```python
import logging
import os
import sys
from pathlib import Path

import pytest
from lightning_utilities.core.imports import RequirementCache

from pytorch_lightning.strategies.launchers.subprocess_script import _HYDRA_AVAILABLE
from tests_pytorch.helpers.runif import RunIf

_HYDRA_WITH_RERUN = RequirementCache("hydra-core>=1.2")
_HYDRA_WITH_RUN_PROCESS = RequirementCache("hydra-core>=1.0.7")

if _HYDRA_AVAILABLE:
    from omegaconf import OmegaConf
if _HYDRA_WITH_RUN_PROCESS:
    from hydra.test_utils.test_utils import run_process


# Fixture to run Hydra jobs in a clean temporary directory.
# Hydra creates its own output directories and logs.
@pytest.fixture
def cleandir(tmp_path):
    """Run function in a temporary directory."""
    old_dir = os.getcwd()  # get current working directory (cwd)
    os.chdir(tmp_path)  # change cwd to the temp directory
    yield tmp_path  # yields control to the test to be run
    os.chdir(old_dir)
    logging.shutdown()


# Script to run from the command line
script = """
import hydra
import os
import torch

from pytorch_lightning import Trainer
from pytorch_lightning.demos.boring_classes import BoringModel


class BoringModelGPU(BoringModel):
    def on_train_start(self) -> None:
        # make sure that the model is on GPU when training
        assert self.device == torch.device(f"cuda:{self.trainer.strategy.local_rank}")


@hydra.main(config_path=None, version_base="1.1")
def task_fn(cfg):
    trainer = Trainer(accelerator="auto", devices=cfg.devices, strategy=cfg.strategy, fast_dev_run=True)
    model = BoringModelGPU()
    trainer.fit(model)
    trainer.test(model)

    if torch.distributed.is_initialized():
        torch.distributed.destroy_process_group()

    os.environ.pop("LOCAL_RANK", None)


if __name__ == "__main__":
    task_fn()
"""


@RunIf(min_cuda_gpus=2, skip_windows=True, standalone=True)
@pytest.mark.skipif(not _HYDRA_WITH_RUN_PROCESS, reason=str(_HYDRA_WITH_RUN_PROCESS))
@pytest.mark.parametrize("subdir", [None, "dksa", ".hello"])
def test_ddp_with_hydra_runjob(cleandir, subdir):
    # Save script locally
    with open("temp.py", "w") as fn:
        fn.write(script)

    # Run CLI
    devices = 2
    cmd = [sys.executable, "temp.py", f"+devices={devices}", '+strategy="ddp"']
    if subdir is not None:
        cmd += [f"hydra.output_subdir={subdir}"]
    run_process(cmd)

    # Make sure config.yaml was created for additional processes
    logs = list(Path.cwd().glob("**/config.yaml"))
    assert len(logs) == devices

    # Make sure the parameter was set and used
    cfg = OmegaConf.load(logs[0])
    assert cfg.devices == devices

    # Make sure PL spawned a job that is logged by Hydra
    logs = list(Path.cwd().glob("**/*.log"))
    assert len(logs) == 1


@RunIf(min_cuda_gpus=2, skip_windows=True, standalone=True)
@pytest.mark.skipif(not _HYDRA_WITH_RUN_PROCESS, reason=str(_HYDRA_WITH_RUN_PROCESS))
@pytest.mark.parametrize("num_jobs", [1, 2])
def test_ddp_with_hydra_multirunjob(cleandir, num_jobs):
    # Save script locally
    with open("temp.py", "w") as fn:
        fn.write(script)

    # Create fake multirun params based on `num_jobs`
    fake_param = "+foo=" + ",".join(str(i) for i in range(num_jobs))

    # Run CLI
    run_process([sys.executable, "temp.py", "+devices=2", '+strategy="ddp"', fake_param, "--multirun"])

    # Make sure config.yaml was created for each job
    configs = sorted(Path.cwd().glob("**/.pl_ddp_hydra_*/config.yaml"))
    assert len(configs) == num_jobs

    # Make sure the parameter was set and used for each job
    for i, config in enumerate(configs):
        cfg = OmegaConf.load(config)
        local_rank = int(config.parent.parent.parts[-1])
        assert cfg.devices == 2
        assert cfg.foo == local_rank

    logs = list(Path.cwd().glob("**/*.log"))
    assert len(logs) == num_jobs


yaml_file = """
hydra:
  callbacks:
    save_job_info:
      _target_: hydra.experimental.callbacks.PickleJobInfoCallback
"""


@RunIf(min_cuda_gpus=2, skip_windows=True, standalone=True)
@pytest.mark.skipif(not _HYDRA_WITH_RERUN, reason=str(_HYDRA_WITH_RERUN))
@pytest.mark.parametrize("num_jobs", [1, 2])
def test_ddp_with_hydra_multirunjob_rerun(cleandir, num_jobs):
    # Save script locally
    with open("temp.py", "w") as fn:
        fn.write(script)

    with open("config.yaml", "w") as fn:
        fn.write(yaml_file)

    # Create fake multirun params based on `num_jobs`
    fake_param = "+foo=" + ",".join(str(i) for i in range(num_jobs))

    # Run CLI
    run_process(
        [
            sys.executable,
            "temp.py",
            "-cp",
            ".",
            "-cn",
            "config.yaml",
            "+devices=2",
            '+strategy="ddp"',
            fake_param,
            "--multirun",
        ]
    )

    pickles = sorted(Path.cwd().glob("**/.hydra/config.pickle"))
    assert len(pickles) == num_jobs
```
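The last test exercises Hydra's `PickleJobInfoCallback`, which pickles each job's resolved config to `.hydra/config.pickle`; Hydra 1.2+ can then replay a job from that pickle via its experimental rerun feature. A hedged sketch of composing such a rerun command — `hydra_rerun_command` is a hypothetical helper, not part of this PR:

```python
import sys


def hydra_rerun_command(script: str, pickle_path: str) -> list:
    # Hypothetical helper: replay a previously run Hydra job from the config
    # pickled by PickleJobInfoCallback (Hydra >= 1.2, experimental feature).
    return [sys.executable, script, "--experimental-rerun", pickle_path]
```

For example, after the multirun above, a single job could be replayed by pointing this at `multirun/<date>/<time>/0/.hydra/config.pickle`.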