
LightningCLI should place the config.yaml in the run_dir not only the log_dir #14162

@kotchin

🚀 Feature

When using a logger (specifically WandB), LightningCLI should save its copy of the config file in the run_dir inside the log_dir, not only in the log_dir itself, which can be shared by multiple run_dirs.

Motivation

Running several experiments with WandB through LightningCLI creates one subdirectory per run inside logger.init_args.dir (which I'll refer to as log_dir), each named after the run_name. When each experiment has a unique run_name, the logger places its checkpoints in the corresponding subdirectory of log_dir. LightningCLI also keeps a copy of the config.yaml that was used, which is great, but it places that copy in log_dir instead of the run's own subdirectory.
As a result, a second experiment cannot be run: LightningCLI complains that a config file already exists at the target location and asks us to specify whether it should be overwritten. Overwriting is not a great option, since it discards the config of the previous run:

RuntimeError: SaveConfigCallback expected log_dir/config.yaml to NOT exist. Aborting to avoid overwriting results of a previous run. You can delete the previous config file, set `LightningCLI(save_config_callback=None)` to disable config saving, or set `LightningCLI(save_config_overwrite=True)` to overwrite the config file.
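
The failure mode can be reproduced without Lightning. The following is a minimal sketch and not Lightning's actual implementation: `save_config` and `second_run_collides` are hypothetical helpers that mimic the guard described in the error message, refusing to write when the target file already exists unless overwriting is requested.

```python
import tempfile
from pathlib import Path


def save_config(directory: Path, text: str, overwrite: bool = False) -> Path:
    """Hypothetical stand-in for the guard in the error above (not Lightning code)."""
    path = directory / "config.yaml"
    if path.exists() and not overwrite:
        raise RuntimeError(f"expected {path} to NOT exist")
    directory.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path


def second_run_collides() -> bool:
    """Return True if a second run sharing the same log_dir is rejected."""
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp) / "log_dir"
        save_config(log_dir, "run: 1\n")      # first run succeeds
        try:
            save_config(log_dir, "run: 2\n")  # second run hits the guard
        except RuntimeError:
            return True
        return False
```

Because both runs resolve to the same log_dir/config.yaml, the second call is rejected even though the runs themselves are unrelated.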

Pitch

In effect, I'm proposing the following layout (the config file could be placed even deeper in the hierarchy if needed):

log_dir/
├── wise-morning-85
│   ├── config.yaml
│   └── 3il98jea
│       └── checkpoints
│           └── epoch=3-step=21281.ckpt
└── worldly-flower-88
    ├── config.yaml
    └── 2c5zevtf
        └── checkpoints
            └── epoch=4-step=21633.ckpt

Instead of the current:

log_dir/
├── config.yaml
├── wise-morning-85
│   └── 3il98jea
│       └── checkpoints
│           └── epoch=3-step=21281.ckpt
└── worldly-flower-88
    └── 2c5zevtf
        └── checkpoints
            └── epoch=4-step=21633.ckpt
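
The proposed layout amounts to resolving the config path per run rather than per log_dir. Here is a rough sketch of that idea using only the standard library. The helper names (`run_config_path`, `write_run_configs`, `demo`) are hypothetical and are not part of LightningCLI or WandB; how the run name would actually be obtained from the logger is left open.

```python
import tempfile
from pathlib import Path


def run_config_path(log_dir: Path, run_name: str, filename: str = "config.yaml") -> Path:
    """Hypothetical helper: place the config inside the run's own subdirectory."""
    return log_dir / run_name / filename


def write_run_configs(log_dir: Path, run_names: list[str]) -> list[Path]:
    """Write one config per run, failing if a run's config already exists."""
    written = []
    for name in run_names:
        path = run_config_path(log_dir, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Mode "x" raises FileExistsError instead of overwriting an existing file.
        with path.open("x") as f:
            f.write(f"run_name: {name}\n")
        written.append(path)
    return written


def demo() -> list[str]:
    """Write configs for the two example runs and return their relative paths."""
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp)
        paths = write_run_configs(log_dir, ["wise-morning-85", "worldly-flower-88"])
        return [p.relative_to(log_dir).as_posix() for p in paths]
```

With unique run names, each run gets its own config.yaml and the existence check only triggers when the same run directory is reused, which is the case the guard is meant to protect.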

Alternatives

I don't have a proposal for any alternative implementation but am open to feedback from the community.

Additional context

There are several discussions about the loggers (WandB in particular) and the general LightningCLI implementation (#14054 (comment), and possibly #12028 and #7543) that may affect this implementation or solution. This issue may therefore already be covered by the broader effort to unify the behavior of the loggers and LightningCLI.

The behavior is visible when using the following versions:
python 3.10.4
pytorch-lightning 1.7.0
wandb 0.13.1

cc @awaelchli @morganmcg1 @borisdayma @scottire @manangoel99

Labels: bug, logger: wandb
