-
Notifications
You must be signed in to change notification settings - Fork 814
Add CNN-DM dataset to torchtext #1789
Conversation
…into feature/add-cnndm
Convert cnndm output to datapipes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the implementation looks great! Some followup items:
-
Update the PR description
-
Add a section in the description for followup items with a checklist of features that have yet to be implemented. We can then add a link to followup PRs that implement those features and check them off (similar to #1542)
-
Add unittests for the dataset
-
Let's try to remove new lines unless they add to readability of code. Here's what the guidance from pep8 style guide suggests
Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (e.g. a set of dummy implementations).
Use blank lines in functions, sparingly, to indicate logical sections.
Nayef211
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for addressing all the PR comments. Let's wait for a review from @parmeet before we merge this in. 😄
|
Thanks @pmabbo13 for adding this complex dataset to the repo. Overall it looks great. Just left few minor comments, but In general I think we should be good to land as such. |
Description
Add CNNDM dataset to TorchText using TorchData datapipes
Process
Testing
Create mock dataset that mimics the format of raw CNNDM files and ensure that the datapipe yields the correct output on this mock dataset
pytest test/datasets/test_cnndm.pyFollow-Up Items