Closed
Labels
discussion (In a discussion stage), feature (Is an improvement or enhancement), question (Further information is requested), working as intended
Description
Hi! Thank you for a great framework! I've tried to define stages for training, e.g. in my config:
```yaml
  <<: *default        # (end of the previous stage)
stage2:
  <<: *default
  datasets:
    <<: *datasets
    # 240k
    root: ['path1',
           'path2',
           'path3']
    per_folder_ratio: [1.0, 1.0, 1.0]
  transform:
    <<: *transform
    augs_lvl: "light"
  optimizer:
    class_name: RAdam
    lr: 0.0001
```
and in train.py:
```python
for stage_name, stage in selected_stages.items():
    print(f"running stage {stage_name}")
    model = build_model(stage)  # BUILD LIGHTNING MODULE from func
    trainer = build_trainer(stage_config=stage,
                            module=model,
                            ...)  # MAKE pytorch_lightning Trainer using this model
    trainer.fit(model)
```
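For reference, `selected_stages` is built from that config. A minimal sketch of the idea (not my exact loading code; the file name and the `startswith` filter are just placeholders):

```python
import yaml

# PyYAML resolves the &default / &datasets / &transform anchors and the "<<"
# merge keys while loading, so every stage comes out as a plain dict with the
# shared defaults already merged in.
with open("config.yaml") as f:  # placeholder file name
    config = yaml.safe_load(f)

# Keep only the per-stage entries, in file order.
selected_stages = {name: cfg for name, cfg in config.items() if name.startswith("stage")}
```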
I'm using early stopping, so each iteration of the loop above finishes its stage early (another approach I've also tried is simply waiting out max_epochs epochs).
The problem is that the second call to trainer.fit initializes DDP one more time, and the program crashes because the address from the previous DDP init has not been freed:

```
RuntimeError: Address already in use
```

I've tried the master version of pytorch-lightning, but the problem did not disappear.
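The workaround I'm experimenting with (not verified to be the intended solution) is to give every stage a fresh `MASTER_PORT` and to tear down the old process group after each `trainer.fit`, so the next DDP init does not collide with the previous one. A sketch, reusing `selected_stages`, `build_model` and `build_trainer` from above (`find_free_port` is my own helper):

```python
import os
import socket
from contextlib import closing

import torch.distributed as dist


def find_free_port() -> int:
    """Ask the OS for a currently free TCP port to use as the DDP rendezvous port."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
        s.bind(("", 0))  # port 0 = let the OS pick; small race window until DDP binds it
        return s.getsockname()[1]


for stage_name, stage in selected_stages.items():
    # Point each stage at its own port so the second DDP init does not try to
    # rebind the address left over from the previous stage.
    os.environ["MASTER_PORT"] = str(find_free_port())

    print(f"running stage {stage_name}")
    model = build_model(stage)
    trainer = build_trainer(stage_config=stage, module=model)
    trainer.fit(model)

    # Explicitly tear down the process group created by this fit before the
    # next stage initializes DDP again.
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()
```

Destroying the stale process group between fits is the part that matters; picking a new port each time is mostly insurance in case the previous one is still held by lingering DDP processes.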