Commit b0d9e3f

Update on "[llama-mm] Add export-friendly tile position embedding"
Summary: Before a decision is made on whether torchtune takes this export-friendly version of `TilePositionEmbedding`, we put it under `extension/llm` so that users can start using it. Unit tests were added to make sure the behavior matches the reference implementation in torchtune and that export, AOTI, and ET all work properly.
1 parent df66f00 commit b0d9e3f

File tree

1 file changed (+4, -1 lines)


extension/llm/modules/_position_embeddings.py

Lines changed: 4 additions & 1 deletion
@@ -188,10 +188,13 @@ def forward(self, x: torch.Tensor, aspect_ratio: torch.Tensor) -> torch.Tensor:
         torch._check(n_tiles_w >= 1)
         torch._check(n_tiles_h <= self.max_num_tiles)
         torch._check(n_tiles_w <= self.max_num_tiles)
+        # TODO: Remove this once pytorch/pytorch#120288 is fixed
         padded_embedding = F.pad(self.embedding, (0, 0, 0, 0, 0, 1, 0, 1))
         pos_embed = padded_embedding[:n_tiles_h, :n_tiles_w, :, :]
 
-        # Add pos encoding to the non padded tiles.
+        # We need to do a clone here in order to make this model export
+        # friendly as the reshape is collapsing dim 0 and dim 1 into a
+        # single dim.
         pos_embed = pos_embed.clone()
         pos_embed = pos_embed.reshape(n_non_padded_tiles, 1, self.embed_dim)

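For context, below is a minimal, self-contained sketch of the pattern this change relies on: bound the data-dependent tile counts with torch._check, pad the learned embedding so the slice stays in bounds, and clone() the slice before the reshape that collapses dim 0 and dim 1. This is an illustration only, not the actual torchtune/ExecuTorch module; the class name TilePositionEmbeddingSketch, the (bsz, max_num_tiles, n_tokens, embed_dim) layout of x, and the (rows, cols) format of aspect_ratio are assumptions made for the example.

# Minimal sketch of an export-friendly tile position embedding.
# Assumed names and shapes are illustrative, not the real implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TilePositionEmbeddingSketch(nn.Module):
    def __init__(self, max_num_tiles: int, embed_dim: int):
        super().__init__()
        self.max_num_tiles = max_num_tiles
        self.embed_dim = embed_dim
        # Learned per-tile positional embedding, one vector per (row, col) tile slot.
        self.embedding = nn.Parameter(
            torch.zeros(max_num_tiles, max_num_tiles, 1, embed_dim)
        )

    def forward(self, x: torch.Tensor, aspect_ratio: torch.Tensor) -> torch.Tensor:
        # Assumption: aspect_ratio holds the tile grid of the current image as (rows, cols).
        n_tiles_h = aspect_ratio[0].item()
        n_tiles_w = aspect_ratio[1].item()
        # Give the exporter bounds on the data-dependent tile counts.
        torch._check(n_tiles_h >= 1)
        torch._check(n_tiles_w >= 1)
        torch._check(n_tiles_h <= self.max_num_tiles)
        torch._check(n_tiles_w <= self.max_num_tiles)
        n_non_padded_tiles = n_tiles_h * n_tiles_w

        # Pad dims 0 and 1 by one slot so the slice below stays in bounds
        # (the workaround the TODO in the diff refers to).
        padded_embedding = F.pad(self.embedding, (0, 0, 0, 0, 0, 1, 0, 1))
        pos_embed = padded_embedding[:n_tiles_h, :n_tiles_w, :, :]

        # clone() materializes the slice so the reshape that collapses
        # dim 0 and dim 1 does not operate on a strided view with
        # data-dependent sizes; this keeps the graph export-friendly.
        pos_embed = pos_embed.clone()
        pos_embed = pos_embed.reshape(n_non_padded_tiles, 1, self.embed_dim)

        # Add the encoding to the non-padded tiles (broadcast over batch and tokens);
        # clone x first so the caller's tensor is not mutated in place.
        x = x.clone()
        x[:, :n_non_padded_tiles] = x[:, :n_non_padded_tiles] + pos_embed
        return x


# Example usage (eager): a 2x2 tile grid out of a maximum 4x4 grid.
emb = TilePositionEmbeddingSketch(max_num_tiles=4, embed_dim=8)
x = torch.randn(1, 16, 5, 8)
out = emb(x, torch.tensor([2, 2]))

The design point, per the comment added in the diff, is that the slice of the padded embedding is a view whose sizes are data-dependent; cloning it before the reshape that merges dim 0 and dim 1 gives export a plain contiguous tensor to reshape instead of a strided view.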