This repository was archived by the owner on Sep 10, 2025. It is now read-only.
[MERGE 1/2] merge main branch to fbsync #1948
Closed
Conversation
* Migrating penntreebank dataset to use torchdata
  * Update FileLoader to FileOpener
  * Resolved comments about return_path
  * Using strip() to remove leading/trailing spaces
  Co-authored-by: nayef211 <[email protected]>
* Migrating enwik9 dataset to use torchdata
  * Added typing to params
  * Fixed PR comments. Updated to data_dp
  * Added caching for extracted files
  * Moved FileOpener after ondiskcache datapipe
  Co-authored-by: nayef211 <[email protected]>
* …ytorch#1530)
  * add double caching for yelp polarity to speed up extracted reading.
  * rename dps for consistency and simplify filepath_fn
  * add FileOpener within caching block for more consistency.
* Migrate IMDB to datapipes
  * add double cache for extracted reading
  * update cache name
* …1528)
  * add double caching for yahoo to speed up extracted reading.
  * simplify filepath_fn
  * rename dps for consistency.
  * add FileOpener within caching block for more consistency.
* Migrate WikiText2 to datapipes
  * Address code review comments and add double caching
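Several of these migrations mention "double caching": caching both the downloaded archive and the extracted file, so rebuilding a dataset skips the download and the (slow) extraction. A minimal pure-Python sketch of the idea follows; torchdata's `on_disk_cache`/`end_caching` datapipes implement this pattern, and the `cached` helper and file names here are hypothetical:

```python
import os
import tempfile

def cached(path, produce):
    """Run `produce(path)` only if `path` does not already exist (hypothetical helper)."""
    if not os.path.exists(path):
        produce(path)
    return path

def get_dataset_file(root, download, extract):
    # First cache level: the downloaded archive.
    archive = cached(os.path.join(root, "data.tar.gz"), download)
    # Second cache level: the extracted text file, so re-runs skip
    # both the network download and the extraction step.
    return cached(os.path.join(root, "data.txt"), lambda p: extract(archive, p))

# Toy usage: "download" and "extract" just write marker files.
calls = []
root = tempfile.mkdtemp()
download = lambda p: (calls.append("download"), open(p, "w").write("archive"))
extract = lambda src, dst: (calls.append("extract"), open(dst, "w").write("text"))

get_dataset_file(root, download, extract)   # does real work
get_dataset_file(root, download, extract)   # both levels hit the cache
```

Because both levels key on an on-disk file, the second call performs no work at all, which is exactly what speeds up repeated dataset construction.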
* First attempt at adding test for amazon review polarity
  * Updated dataset to take validate_hash param. Finalized tests
  * Created non empty tar file
  * Remove formatting. Patch _hash_check method from torchdata during testing
  * Added super().setUpClass()
  * Remove commented import
  Co-authored-by: nayef211 <[email protected]>
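The test strategy described above (patching torchdata's `_hash_check` and avoiding real downloads) can be sketched with `unittest.mock`; the `Loader` class and its methods below are hypothetical stand-ins, not torchtext's classes:

```python
import unittest
from unittest.mock import patch

class Loader:
    """Stand-in dataset loader; the real code would download and hash-check."""
    def _download(self, url):
        raise RuntimeError("no network in tests")

    def _hash_check(self, data):
        raise RuntimeError("real hashing unavailable in tests")

    def load(self, url, validate_hash=True):
        data = self._download(url)
        if validate_hash and not self._hash_check(data):
            raise ValueError("hash mismatch")
        return data

class TestLoader(unittest.TestCase):
    def test_mocked_load(self):
        loader = Loader()
        # Patch the download and the hash check, mirroring how the dataset
        # tests patch torchdata's _hash_check instead of hitting the network.
        with patch.object(Loader, "_download", return_value=["mocked line"]), \
             patch.object(Loader, "_hash_check", return_value=True):
            self.assertEqual(loader.load("https://example.com/x.tar"),
                             ["mocked line"])

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestLoader))
```

Patching at the class level keeps the production code path intact while the test runs entirely offline.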
* Migrating SST2 from experimental to datasets folder
  * Added SST2 to docs and to init file
  * Removing empty line from docs
  Co-authored-by: nayef211 <[email protected]>
* Rename amazon review polarity test
  * Added renamed file to git
  Co-authored-by: nayef211 <[email protected]>
* Added mock test for SST2
  * Remove print line
  * Resolving PR comments
  * Updated comment to say zip
  * updated ordering of splits in parameterization
  * Using zip_equal for iteration in test_sst2
  Co-authored-by: nayef211 <[email protected]>
* migrate IWSLT2017 to datapipes.
  * refactor IWSLT2017 to use feedback from IWSLT2016.
  * remove unused import.
  * fix flake.
  * fix typo in comment.
  * add TODOs to IWSLT datasets.
  * refactor common code out of IWSLTs and convert single quotes to double.
  * fix typo.
* …ch#1541)
  * Implement CLIPEncoder in C++
  * Add case insensitive flag to CLIP pre tokenization regex
  * Add Python interface
  * Bring back gpt2
  * Add docstring
  * Update docs
  * Fix stylecheck
* mock up IWSLT2016 test for faster testing.
  * rename variable for consistency.
* Resolve and remove TODOs
  * remove todo
* …#1913)
  * avoid looping through the whole counter in bleu_score method
  * fix bug when max_n > len(candidate)
  * add comment to explain L88
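The `max_n > len(candidate)` bug class is easy to see in a plain n-gram counter: a candidate shorter than `max_n` contributes no higher-order n-grams at all, and downstream code must not assume every order is present. An illustrative sketch (not torchtext's implementation):

```python
from collections import Counter

def ngram_counts(tokens, max_n):
    """Count all n-grams for n = 1..max_n (illustrative, not torchtext's code)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # When n > len(tokens) this inner range is empty, so a short
        # candidate simply yields no n-grams of that order.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# A 2-token candidate scored with max_n=4 produces only 1- and 2-grams.
short = ngram_counts(["the", "cat"], max_n=4)
```

A BLEU implementation that divides by the number of n-grams per order must guard against the empty higher orders here, which is the bug this commit fixes.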
* add decoding capability to GPT2BPE tokenizer
  * use wstring_convert for all conversions
  * minor update to comment and string creation logic
  * move converter definition outside of for loop
* …d avoid splitting on them (pytorch#1916)
  * add_special_tokens and never split features added
  * removed a comment and updated a type hint
  * added explanation and example for how this change works
  * move SPECIAL_TOKENS_ATTRIBUTES to utils
  * rebase and address latest nit comments
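The never-split behavior for special tokens can be sketched as pre-splitting the text on a regex alternation of the special tokens, so they pass through tokenization whole. The token set and function name below are assumptions for illustration, not torchtext's API:

```python
import re

SPECIAL_TOKENS = ["<unk>", "<pad>", "<mask>"]  # assumed token set

def split_with_special_tokens(text, special_tokens=SPECIAL_TOKENS):
    """Split `text` so special tokens survive as single pieces (illustrative)."""
    # Capturing group keeps the matched special tokens in re.split's output.
    pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
    pieces = []
    for chunk in re.split(pattern, text):
        if chunk in special_tokens:
            pieces.append(chunk)          # never split a special token
        else:
            pieces.extend(chunk.split())  # ordinary whitespace tokenization
    return pieces

tokens = split_with_special_tokens("hello <mask> world")
```

Escaping each token with `re.escape` matters because special tokens often contain regex metacharacters like `<` and `|`.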
* …ch#1927) [ghstack-poisoned]
* Fix upload channel using correct flag
  * Fix version extraction
* This reverts commit 0026773.
* Fixed on_disk_cache issues [ghstack-poisoned]
  * Update on "Fixed on_disk_cache issues": fixed issues with cache locks and cache file overwrites. Required to be compatible with meta-pytorch/data#810 [ghstack-poisoned]
  Co-authored-by: Vitaly Fedyunin <[email protected]>
* update decoding logic to handle special tokens
  * rebased and added example
  * minor refactor: moved boolean assignment outside of for loop
* Move relative_buckets Tensor to same device as relative_position
  * Update code pointer comments
  * Reference self.device from within MultiHeadedAttention private methods
  * Remove faulty call with device to t5 forward method
  * Add device to Attention obj
* Add Character Level BPE Tokenizer (pytorch#1936)
  Summary: Pull Request resolved: pytorch#1936
  This change adds a character level BPE tokenizer to the set of available transforms. It takes a pre-trained encoder dict (i.e. vocab dict) and merge list as input. It is not using C++ for encoding/decoding at this time.
  Reviewed By: langong347
  Differential Revision: D40186470
  fbshipit-source-id: 48bacc631f537e941a495e39ef9ccb17d3ef7896
  * run linter
  * add regex to requirements and CharBPETokenizer to transforms.rst
  * fix docs and requirements
  * try to fix docstring format
  Co-authored-by: Roman Shraga <[email protected]>
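Character-level BPE, as described in the summary above, starts from individual characters and repeatedly applies the highest-priority merge from a learned merge list. A toy sketch of the encoding loop (the merge list here is hypothetical; this is not the torchtext implementation):

```python
def char_bpe_encode(word, merges):
    """Apply BPE merges to a word's characters (toy sketch).

    `merges` is an ordered list of symbol pairs, highest priority first.
    """
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Find the adjacent pair with the best (lowest) merge rank.
        pairs = [(ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        rank, i = min(pairs)
        if rank == float("inf"):
            break  # no learned merge applies
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

# Hypothetical merge list, in learned priority order:
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
pieces = char_bpe_encode("lower", merges)
```

With these merges, "lower" collapses as l,o,w,e,r → lo,w,e,r → low,e,r → low,er, after which no learned merge applies.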
Contributor
@Nayef211 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
joecummings approved these changes on Oct 19, 2022.
facebook-github-bot pushed a commit that referenced this pull request on Oct 19, 2022.
Contributor (Author)
Closing PR as corresponding diff was merged.
Description
`main` and `fbsync` are both in sync #1949