Make Transformer tolerate missing layers for PP #322
Conversation
A few small changes here let the manual PP frontend 'reconfigure' a whole transformer model down to a stage's portion simply by setting undesired layers to None (for top-level layers) or deleting them from the ModuleDict (for 'layers.*'). These changes don't impact the FQNs of the remaining layers, which is critical for checkpoint load/save compatibility.

ghstack-source-id: 48a7aaf
Pull Request resolved: #322
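For illustration only, here is a minimal sketch (not code from this PR) of how a manual PP frontend could carve a whole model down to one stage's portion. It assumes a torchtitan-style `Transformer` with `tok_embeddings`, a `layers` `ModuleDict`, `norm`, and `output`; the `build_stage_model` helper and the even-split policy are hypothetical.

```python
import copy


def build_stage_model(whole_model, stage_idx, num_stages, n_layers):
    """Hypothetical helper: keep only the modules owned by this PP stage.

    Undesired top-level modules are set to None; undesired transformer
    blocks are deleted from the ModuleDict. The FQNs of the remaining
    modules are unchanged, so checkpoint load/save keeps working.
    """
    # In practice the copy would typically be built on the meta device
    # rather than deep-copied with real weights.
    model = copy.deepcopy(whole_model)

    # Only the first stage keeps the token embeddings.
    if stage_idx != 0:
        model.tok_embeddings = None

    # Only the last stage keeps the final norm and output projection.
    if stage_idx != num_stages - 1:
        model.norm = None
        model.output = None

    # One possible policy: split the blocks evenly across stages.
    per_stage = n_layers // num_stages
    start = stage_idx * per_stage
    stop = n_layers if stage_idx == num_stages - 1 else start + per_stage
    for layer_id in range(n_layers):
        if not (start <= layer_id < stop):
            del model.layers[str(layer_id)]  # kept blocks still have FQN "layers.<id>.*"

    return model
```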
+ h = self.tok_embeddings(tokens) if self.tok_embeddings else tokens

- for layer in self.layers:
+ for layer in self.layers.values():
Is order still respected after switching to dict? If not, we need to sort the layers based on int(key).
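For reference, `nn.ModuleDict` is documented as an ordered dictionary that respects insertion order, so iterating `.values()` visits the blocks in the order they were inserted, even after some entries are deleted. A quick sanity check (illustrative only):

```python
import torch.nn as nn

layers = nn.ModuleDict()
for layer_id in range(4):
    layers[str(layer_id)] = nn.Linear(8, 8)

# Deleting an entry does not disturb the order of the remaining ones.
del layers["1"]
print(list(layers.keys()))  # ['0', '2', '3'] -- insertion order preserved
```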
Nice. But it is less intuitive than I originally thought, especially the int/str conversion part. Not sure whether that's the best UX for pippy, or whether a customized PipelineModuleList would be easier for users.
wanchaol left a comment:
lgtm!
+ self.layers = torch.nn.ModuleDict()
  for layer_id in range(model_args.n_layers):
-     self.layers.append(TransformerBlock(layer_id, model_args))
+     self.layers[str(layer_id)] = TransformerBlock(layer_id, model_args)
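To illustrate the checkpoint-compatibility point, a small check (not part of the PR) that a `ModuleDict` keyed by stringified layer ids yields the same parameter FQNs as the old `ModuleList`:

```python
import torch.nn as nn

as_list = nn.ModuleList([nn.Linear(8, 8) for _ in range(2)])
as_dict = nn.ModuleDict({str(i): nn.Linear(8, 8) for i in range(2)})

print([name for name, _ in as_list.named_parameters()])
# ['0.weight', '0.bias', '1.weight', '1.bias']
print([name for name, _ in as_dict.named_parameters()])
# ['0.weight', '0.bias', '1.weight', '1.bias'] -- identical FQNs
```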
Curious: why do the dict keys have to be str (as opposed to int directly)?
One downside to using …
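For context on the str-key question: `nn.Module` requires submodule names to be strings (`add_module` raises a `TypeError` otherwise), and `nn.ModuleDict.__setitem__` goes through `add_module`, so int keys are not accepted directly. A small illustration:

```python
import torch.nn as nn

layers = nn.ModuleDict()
try:
    layers[0] = nn.Linear(8, 8)   # int key is rejected by add_module
except TypeError as e:
    print(e)

layers["0"] = nn.Linear(8, 8)     # str key works; the FQNs become "0.weight" / "0.bias"
```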