
Conversation

@SeanNaren (Contributor) commented Oct 30, 2020

What does this PR do?

Closes #4322.

Relates to #4178, but I want to keep the FairScale ZeRO + ShardedDDP integration separate from the Pipe + Checkpointing integration, since they're decoupled (for now) in fairscale.

Pipe allows a sequential model to be split across separate GPUs. It comes with its own hyper-parameters, and because it's tied to a torch.nn.Sequential it requires closer integration from the user's perspective.
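
For context, the underlying fairscale API looks roughly like this (a sketch from memory; exact parameter names may differ slightly between fairscale versions, and it assumes two visible GPUs):

    import torch.nn as nn
    from fairscale.nn import Pipe

    # Pipe only accepts an nn.Sequential, which is why the integration below asks the
    # user to wrap a sequential sub-module explicitly.
    layers = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 2))
    pipe = Pipe(layers, balance=[2, 1], chunks=8)  # 2 layers on device 0, 1 on device 1; 8 micro-batches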

I'm still unsure about the API, but I want to throw something out for us to discuss.

All feedback is welcome :)

    train_data = torch.utils.data.DataLoader(RandomDataset(32, 64))
    val_data = torch.utils.data.DataLoader(RandomDataset(32, 64))

    # model
    model = BoringModel()
    # model.layers is a sequential module that needs to be manually wrapped 
    model.layers = LightningPipeModule(
        model.layers,
        layer_partitions=[1, 1], # Puts 1 layer on each GPU
        microbatches=8  # Split each batch into 8 micro-batches to reduce the pipeline bubble and keep devices busy
    )
    accelerator = PipeAccelerator(model.layers, cluster_environment=TorchElasticEnvironment())

    trainer = Trainer(
        default_root_dir=os.getcwd(),
        max_epochs=1,
        gpus=2,
        accelerator=accelerator
    )
    trainer.fit(model, train_data, val_data)

TODO

cc @ananthsub @williamFalcon

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks commented Oct 30, 2020

Hello @SeanNaren! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-18 15:09:15 UTC

@SeanNaren (Contributor, Author) commented:

So the code works, allowing the user to split the model across GPUs in an SPSD fashion (I've hardcoded that for now; I'm not sure we want to worry about SPMD support).

But IMO it involves too much user input, particularly in their train/val/test step logic:

    def training_step(self, batch, batch_idx):
        output = self(batch)
        # Only the final pipeline stage receives the model's output, so only it can
        # compute the loss and log metrics.
        if self.trainer.accelerator_backend.final_stage:
            loss = self.loss(batch, output)
            self.log('loss', loss)
            return {"loss": loss}
        else:
            # Intermediate stages only forward activations downstream; they just take
            # part in the gradient synchronization.
            self.trainer.accelerator_backend.sync_gradients(output)

    def validation_step(self, batch, batch_idx):
        output = self(batch)
        if self.trainer.accelerator_backend.final_stage:
            loss = self.loss(batch, output)
            self.log('x', loss)

    def test_step(self, batch, batch_idx):
        output = self.layers(batch)
        if self.trainer.accelerator_backend.final_stage:
            loss = self.loss(batch, output)
            self.log('y', loss)
This check is required because some GPUs are just intermediate stages (they only hold a portion of the model and pass activations downstream to other processes; no loss is calculated on them). This happens to include GPU 0, so right now logging is a mess.

There's a neat refactor from @froody that will allow control from just one process, which should make things much cleaner and potentially remove the manual final-stage check within the step functions (facebookresearch/fairscale#173 (comment)).

@tchaton (Contributor) left a comment:

Awesome addition! I wonder if we could block the call internally using self.trainer.accelerator_backend.final_stage in self.log. However, I guess sync_gradients can't be moved out, as it needs output.
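
Roughly the idea (purely a sketch; final_stage is the attribute from this PR's accelerator, and nothing here is real Lightning internals):

    import pytorch_lightning as pl

    class PipeAwareModule(pl.LightningModule):
        # Hypothetical override: make self.log a no-op on intermediate pipeline stages,
        # so user step code doesn't need the final_stage check just to log.
        def log(self, name, value, *args, **kwargs):
            backend = getattr(self.trainer, "accelerator_backend", None)
            if backend is not None and not getattr(backend, "final_stage", True):
                return  # this rank only forwards activations; no loss/metrics to log
            super().log(name, value, *args, **kwargs)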

A Contributor commented on this hunk:

    style=PipelineStyle.MultiProcess,
    input_device=torch.cuda.current_device(),
    worker_map=get_worker_map(),
    checkpoint='never',

What is checkpoint='never'?

Reply:

The checkpoint argument refers to activation checkpointing, i.e. not saving intermediate activations on the initial forward pass and then re-running the forward pass with gradients enabled during the backward pass to recompute them.
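
For illustration, the same trade-off in plain PyTorch (a minimal sketch using the generic torch.utils.checkpoint API, not how fairscale wires it up internally):

    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    block = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))
    x = torch.randn(8, 32, requires_grad=True)
    y = checkpoint(block, x)   # forward runs without storing the block's intermediate activations
    y.sum().backward()         # the block's forward is re-run here to rebuild them for the backward pass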

@SeanNaren (Contributor, Author) replied:

Will be exposed eventually of course :)

A Contributor commented on this hunk:

    backend=rpc.BackendType.TENSORPIPE,
    rpc_backend_options=rpc.TensorPipeRpcBackendOptions(init_method=init_method),
    )
    mpu.initialize_model_parallel(model_parallel_size_=1, pipeline_length=len(self.pipe_module.layer_partitions))

Might the user want to modify these parameters?

@SeanNaren (Contributor, Author) commented:

Thanks for your comments @tchaton :)

I have a question about the API, since I think this will eventually tie into a few other important components (like standalone gradient checkpointing, and potentially even the ShardedDDP work).

In the current API, I've wrapped the sequential module in a LightningPipeModule and defined a custom accelerator in user code:

    train_data = torch.utils.data.DataLoader(RandomDataset(32, 64))
    val_data = torch.utils.data.DataLoader(RandomDataset(32, 64))

    # model
    model = BoringModel()
    # model.layers is a sequential module that needs to be manually wrapped 
    model.layers = LightningPipeModule(
        model.layers,
        layer_partitions=[1, 1], # Puts 1 layer on each GPU
        microbatches=8  # Split each batch into 8 micro-batches to reduce the pipeline bubble and keep devices busy
    )
    accelerator = PipeAccelerator(model.layers, cluster_environment=TorchElasticEnvironment())

    trainer = Trainer(
        default_root_dir=os.getcwd(),
        max_epochs=1,
        gpus=2,
        accelerator=accelerator
    )
    trainer.fit(model, train_data, val_data)

This is meh, and I'd prefer to do something like:

    train_data = torch.utils.data.DataLoader(RandomDataset(32, 64))
    val_data = torch.utils.data.DataLoader(RandomDataset(32, 64))

    # model
    model = BoringModel()
    # model.layers is a sequential module that needs to be manually wrapped 
    model.layers = LightningPipeModule(
        model.layers,
        layer_partitions=[1, 1], # Puts 1 layer on each GPU
        microbatches=8  # Split each batch into 8 micro-batches to reduce the pipeline bubble and keep devices busy
    )

    trainer = Trainer(
        default_root_dir=os.getcwd(),
        max_epochs=1,
        gpus=2,
        accelerator='ddp_pipe' # Skip initializing the accelerator beforehand!
    )
    trainer.fit(model, train_data, val_data)

Is there a way I could 'register' the pipe module for the accelerator to pick up automatically?
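
Something along these lines is what I have in mind (purely a hypothetical sketch; find_pipe_modules and the setup hook are made-up names, and LightningPipeModule is the wrapper introduced in this PR):

    import torch.nn as nn

    def find_pipe_modules(model: nn.Module, wrapper_cls) -> list:
        # Walk the module tree and collect any wrapped pipe sub-modules, so the
        # accelerator can discover them instead of being handed model.layers explicitly.
        return [m for m in model.modules() if isinstance(m, wrapper_cls)]

    # e.g. inside the accelerator's setup():
    #     pipe_modules = find_pipe_modules(self.trainer.model, LightningPipeModule)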

@Borda added the checkpointing, distributed, and feature labels on Nov 30, 2020
@Borda requested a review from tchaton on Nov 30, 2020 at 19:27
@Borda added this to the 1.1 milestone on Nov 30, 2020
@edenlightning removed this from the 1.1 milestone on Dec 8, 2020
@tchaton closed this on Dec 9, 2020
@SeanNaren deleted the feature/4322-pipe branch on Dec 9, 2020 at 14:14