Update annotation types of transforms and functionals #1453
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main    #1453      +/-   ##
==========================================
- Coverage   86.35%   85.67%   -0.68%
==========================================
  Files          58       58
  Lines        2220     2262      +42
==========================================
+ Hits         1917     1938      +21
- Misses        303      324      +21
mthrok left a comment
Looks good. I learned that the TorchScript compiler can handle `Any`. Thanks.
docs/source/transforms.rst
Outdated
.. automethod:: forward

Truncate
------------
Suggested change:
- ------------
+ --------
ReST expects the length of the underline to match the title.
self.max_seq_len = max_seq_len

- self.token_transform = transforms.SentencePieceTokenizer(spm_model_path)
+ self.tokenizer = transforms.SentencePieceTokenizer(spm_model_path)
Technically speaking, I think this is a BC-breaking change: a state_dict dumped from the previous version will stop working. Even if the BC break is not an issue, as a good practice I recommend splitting the name change into a different PR, considering possibilities such as cherry-picking at a minor release.
Thanks for raising this. I will change it back to the original and do the renaming in a separate PR.
torchtext/transforms.py
Outdated
return self._label_names


class Truncate(Module):
The implementation LGTM.
However, since this new transform is not used by the other changes, I think the addition of Truncate can be a separate, self-contained PR. Splitting it will give a better experience for users reading the release notes and looking into the commits.
Again, good suggestion. Actually, this whole idea of changing to Any became apparent only after I started adding this transform, and then I just kept all the changes :). I will open another PR to add Truncate and remove it from here.
Apparently yes. I was also pleasantly surprised when I looked closely at their example: https://pytorch.org/docs/stable/generated/torch.jit.isinstance.html :)
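For context on the linked API: `torch.jit.isinstance` extends Python's built-in `isinstance` to parameterized container types such as `List[str]`, and TorchScript uses such checks to refine the type of an `Any`-annotated value. A rough pure-Python approximation of what the check verifies (the real call would be `torch.jit.isinstance(x, List[str])`; the helper name below is illustrative):

```python
from typing import Any

def looks_like_list_of_str(x: Any) -> bool:
    # Approximation of torch.jit.isinstance(x, List[str]): the builtin
    # isinstance cannot check a container's element types, so we check
    # them manually here.
    return isinstance(x, list) and all(isinstance(e, str) for e in x)

assert looks_like_list_of_str(["hello", "world"])
assert not looks_like_list_of_str([1, 2, 3])
```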
Thanks @mthrok for the quick review and valuable feedback. I am going to break this PR into a few smaller ones. More importantly, I will keep this PR to only updating the annotation types of existing transforms and adding docstrings. I will update the summary accordingly.
Summary:

Change annotation types to `Any`; add a `Truncate` transform.

Problem when transforms' inputs and outputs are type annotated

The transforms and functionals support multiple styles of input (both batched and non-batched are supported), and sometimes multiple types (refer to the `Truncate` transform added in this PR, which can work with both `int` and `str` primitive types). Composability of these transforms and support for torch scriptability may not go hand-in-hand. The code snippets below show a couple of these issues.
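The code snippets referenced in the summary did not survive the page extraction. As a rough stand-in (hypothetical names, plain Python rather than TorchScript), the sketch below shows the composability cost of `Union`-typed signatures: every downstream transform has to re-discriminate the union even when its caller already knows which member it produced, and TorchScript's support for `Union` annotations was very limited at the time:

```python
from typing import List, Union

# Hypothetical transforms, not the PR's actual code.

def tokenize(text: Union[str, List[str]]) -> Union[List[str], List[List[str]]]:
    # Non-batched input yields List[str]; batched input yields List[List[str]].
    if isinstance(text, str):
        return text.split()
    return [t.split() for t in text]

def truncate(tokens: Union[List[str], List[List[str]]], n: int) -> Union[List[str], List[List[str]]]:
    # Must re-check which Union member was received before doing anything,
    # and every transform in the chain repeats the same discrimination.
    if len(tokens) > 0 and isinstance(tokens[0], str):
        return tokens[:n]
    return [seq[:n] for seq in tokens]

assert truncate(tokenize("a b c d"), 2) == ["a", "b"]
assert truncate(tokenize(["a b", "c d e"]), 1) == [["a"], ["c"]]
```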
Solution

By changing the input and output types to `Any` and using type refinement (via `torch.jit.isinstance`) in primitive transforms, the above problem can be alleviated. As a nice side-benefit, this forces the developer to always check the input types while implementing primitive transforms.

Other concerns
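The refinement pattern described in the Solution section can be sketched as follows. This is a simplified, pure-Python approximation, not the PR's actual code: under TorchScript the checks would be written with `torch.jit.isinstance(input, List[int])` and so on, so that the compiler can see the refinement:

```python
from typing import Any

def truncate(input: Any, max_seq_len: int) -> Any:
    # Annotating input/output as Any keeps the transform composable with
    # other Any-typed transforms; the body then refines the actual type.
    if isinstance(input, list) and len(input) > 0 and isinstance(input[0], list):
        # Batched input, e.g. List[List[int]] or List[List[str]].
        return [seq[:max_seq_len] for seq in input]
    if isinstance(input, list):
        # Non-batched input, e.g. List[int] or List[str].
        return input[:max_seq_len]
    # Forcing an explicit check means unsupported types fail loudly.
    raise TypeError(f"truncate: unsupported input type {type(input)}")

assert truncate([1, 2, 3, 4], 2) == [1, 2]
assert truncate([["a", "b", "c"], ["d"]], 2) == [["a", "b"], ["d"]]
```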