State maintenance in DP #565

@S-aiueo32

Description

In many image generation tasks with GANs, the generator and discriminator are trained on the same generated image within a single iteration.
In PyTorch Lightning, the procedure is written like below:

def training_step(self, batch, batch_nb, optimizer_i):
    foo = batch['foo']
    bar = batch['bar']

    if optimizer_i == 0:  # train discriminator
        self.foo_out = self.netG(foo)  # register as an instance variable to reuse in the generator step

        # calc d_loss
        d_loss = ...

        return {'loss': d_loss}

    elif optimizer_i == 1:  # train generator
        # common reconstruction error
        g_loss = F.l1_loss(self.foo_out, bar)
        # other losses
        ...

        return {'loss': g_loss}

It works well on a single GPU; however, self.foo_out has already been flushed by the time the optimizer_i == 1 branch runs when DP is set.

I think this is undesired behavior. Is there any help or a fix?
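
For reference, the workaround I am trying for now (a minimal sketch, assuming the cause is that DP replicates the LightningModule for each forward pass, so attributes set on a replica never reach the original module) is to recompute the generator output in each branch instead of caching it on self:

# assumes `import torch` and `import torch.nn.functional as F` as in the snippet above
def training_step(self, batch, batch_nb, optimizer_i):
    foo = batch['foo']
    bar = batch['bar']

    if optimizer_i == 0:  # train discriminator
        # generate without building a graph for the generator
        with torch.no_grad():
            foo_out = self.netG(foo)

        # calc d_loss from foo_out
        d_loss = ...

        return {'loss': d_loss}

    elif optimizer_i == 1:  # train generator
        # recompute the fake image on this replica so gradients flow to netG
        foo_out = self.netG(foo)

        # common reconstruction error
        g_loss = F.l1_loss(foo_out, bar)
        # other losses
        ...

        return {'loss': g_loss}

This runs the generator forward twice per iteration, but it avoids depending on instance state surviving across optimizer steps. A proper fix that keeps the single forward pass would still be appreciated.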
