Fix IWSLT2016 testing #1585
Conversation
Review comment on the changed lines:

    super().tearDownClass()

    @parameterized.expand([("train", "de", "en"), ("valid", "de", "en")])
TODO: can we (should we?) parameterize by each supported lang pair?
For complete code coverage I think it would make sense to parameterize by each supported lang pair. I believe we could use the nested_params fn to generate the cartesian product of language pairs. The main issue I foresee with using that function is that it would generate lang pairs that map to the same language (e.g. en -> en), which I'm not sure makes sense for this dataset.
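As a rough sketch of the cartesian-product idea (using plain `itertools` rather than the torchtext `nested_params` helper, whose exact signature isn't shown here, and with an illustrative language set rather than the authoritative IWSLT2016 one), the same-language pairs could be filtered out before handing the cases to `parameterized.expand`:

```python
import itertools

# Hypothetical language set for illustration; the real supported set
# should be taken from the IWSLT2016 dataset definition.
LANGUAGES = ["ar", "de", "en", "fr"]
SPLITS = ["train", "valid"]

# Cartesian product of (split, src, tgt), dropping pairs where src == tgt,
# since src -> src does not make sense for a translation dataset.
TEST_CASES = [
    (split, src, tgt)
    for split, src, tgt in itertools.product(SPLITS, LANGUAGES, LANGUAGES)
    if src != tgt
]

# These tuples could then be passed to @parameterized.expand(TEST_CASES).
```

This keeps the generation declarative while avoiding the degenerate en -> en cases.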
@parmeet also wanted to get your thoughts on this
Yes, I would agree. Simulating the exact structure of the original dataset, with all the possible use-cases, would be key for coverage.
Nayef211 left a comment
Force-pushed: 7be1b68 → 417ee55
There are test failures related to this test. @erip, could you please fix them, and then I can follow up with the review? Thanks!
It looks like my attempt at addressing the review feedback has broken the tests, and unfortunately I can't test locally because the required nightly isn't available on OS X... I'll try to address this ASAP.
Thanks for all your help with the testing efforts @erip! I just want to see if you would have the bandwidth to add the …
Yes, absolutely!
IWSLT16 doesn't support every langpair direction, so we would need to filter some pairs out. At that point, it might make more sense to just enumerate the pairs directly (with some logic for swapping the direction as needed). Thoughts?

Yup, I agree. The total number of supported pairs is reasonable enough to enumerate explicitly.
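A sketch of the explicit-enumeration approach with direction swapping (the base pair list here is illustrative, not the authoritative IWSLT2016 set): list each supported pair once, then generate both directions programmatically.

```python
# Illustrative base pairs; the real supported set should come from the
# dataset definition, which does not support every direction.
BASE_PAIRS = [("de", "en"), ("fr", "en"), ("cs", "en"), ("ar", "en")]

def both_directions(pairs):
    """Yield each (src, tgt) pair in both translation directions."""
    for src, tgt in pairs:
        yield (src, tgt)
        yield (tgt, src)

LANG_PAIRS = list(both_directions(BASE_PAIRS))
# LANG_PAIRS can then be fed to @parameterized.expand, one case per direction.
```

Enumerating this way keeps unsupported directions out of the test matrix entirely, rather than generating and filtering a full product.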
@parmeet the middle ground seems to be leveraging the …
@erip just wondering if you're talking about the torchtext nightly? We can build any of the pytorch/torchtext packages directly from source, right? Also, I'm happy to patch this PR to test your changes in a Linux environment and give you feedback on any failures, if you think that would be helpful!
No, it's the pytorch nightly. I'm running into an error about datapipes_only not being a valid kwarg -- the same one that was causing CI errors on Windows. I have been trying to install fresh nightlies for a couple of days, but no joy.
OK, a new nightly has been cut so I can test locally again 🥳 I have fixed one lingering issue and have a TODO to pass the right test sets... I will follow a similar approach as the langpair parameterization.
Force-pushed: 5242134 → 91e2cf2
I've rebased, so the previously broken Multi30k should be good here. Tests are passing and life is good. 😎
Codecov Report

|          | main   | #1585  | +/-    |
|----------|--------|--------|--------|
| Coverage | 80.34% | 80.96% | +0.62% |
| Files    | 58     | 58     |        |
| Lines    | 2569   | 2569   |        |
| Hits     | 2064   | 2080   | +16    |
| Misses   | 505    | 489    | -16    |

Continue to review the full report at Codecov.
Review comment on the changed lines:

        root_dir, "IWSLT2016", "2016-01.tgz"
    )
    with tarfile.open(outer_temp_dataset_path, "w:gz") as tar:
        tar.add(outer_temp_dataset_dir, arcname="2016-01")
Before adding outer_temp_dataset_dir to the archive, we probably want to remove inner_temp_dataset_dir since the contents of this folder are already available in the inner_compressed_dataset_path, right?
We could approach it that way -- my approach is to just re-read the cleaned files to generate the expected values. I don't know if there's a strong reason to prefer one vs. the other.
Yeah, we probably want to match the extraction process as if we were working with the original dataset archive. The code-base would be extracting and caching files from the inner tarball, say de-en.tgz, and if the files are already present in the top-most archive 2016-01.tgz then we won't execute the inner extraction (due to the secondary caching logic introduced in #1589). This is certainly a complex test/dataset; let me know if I am missing anything :)
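To mirror that layout in the mock, one approach (a simplified sketch; the directory names and file names here are stand-ins, not the real test's helpers) is to tar the per-pair files into an inner de-en.tgz, delete the raw files, and only then tar the result into 2016-01.tgz, so that the code under test must actually perform both extraction steps:

```python
import os
import shutil
import tarfile
import tempfile

root = tempfile.mkdtemp()
inner_dir = os.path.join(root, "texts", "de", "en", "de-en")
os.makedirs(inner_dir)

# A stand-in data file; the real test writes mocked train/valid/test files.
with open(os.path.join(inner_dir, "train.tags.de-en.de"), "w") as f:
    f.write("mock line\n")

# Inner tarball, as the real dataset ships it (e.g. de-en.tgz).
inner_tgz = os.path.join(root, "texts", "de", "en", "de-en.tgz")
with tarfile.open(inner_tgz, "w:gz") as tar:
    tar.add(inner_dir, arcname="de-en")

# Remove the raw files so the outer archive contains only the inner tarball,
# forcing the code under test to perform the inner extraction itself.
shutil.rmtree(inner_dir)

# Outer archive 2016-01.tgz wrapping everything under texts/.
outer_tgz = os.path.join(root, "2016-01.tgz")
with tarfile.open(outer_tgz, "w:gz") as tar:
    tar.add(os.path.join(root, "texts"), arcname="2016-01/texts")

with tarfile.open(outer_tgz) as tar:
    names = tar.getnames()
```

The outer archive then contains only the nested de-en.tgz and no loose text files, matching the structure the secondary caching logic expects.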
Nayef211 left a comment
Just left some errors I found while patching your PR locally and removing the outer temp dir. Would you be able to investigate further @erip?
Review comment on the changed lines:

    """
    temp_dataset_dir = os.path.join(root_dir, f"IWSLT2016/2016-01/texts/{src}/{tgt}/{src}-{tgt}/")
    os.makedirs(temp_dataset_dir, exist_ok=True)
    outer_temp_dataset_dir = os.path.join(root_dir, f"IWSLT2016/2016-01/texts/{src}/{tgt}/")
If you change this path to "...IWSLT2016/temp_dataset_dir/2016-01/..." you will notice a lot of test failures. The point of naming it this way is to ensure that the folder where we create the mocked data is different from where the files/folders would be placed when the archive is extracted (i.e. when calling the dataset). Without doing this, the archive would never be extracted, since our dataset implementation checks whether a file/folder exists and, if it does, skips the extraction.
When patching this PR locally and making the above change, I noticed we're getting 212 failed, 451 passed. This is a sample error I'm getting from pytest:
FAILED test_iwslt2016.py::TestIWSLT2016::test_iwslt2016_219_train - AssertionError: 'H T G W w U w K b d\n' != 'b H K H c L t k I s\n'
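The caching behaviour being described can be sketched as follows (a simplified stand-in for illustration, not torchtext's actual implementation): if the mocked data is created directly at the extraction target, the exists-check treats it as a cache hit and the archive's real contents are never unpacked.

```python
import os
import shutil
import tarfile
import tempfile

def maybe_extract(archive_path, extract_root, expected):
    """Extract archive_path only if extract_root/expected does not already
    exist -- mimicking the exists-check caching described above."""
    if os.path.exists(os.path.join(extract_root, expected)):
        return False  # cache hit: extraction is silently skipped
    with tarfile.open(archive_path) as tar:
        tar.extractall(extract_root)
    return True

root = tempfile.mkdtemp()

# Build a tiny archive whose contents live under "2016-01/".
staging = os.path.join(root, "staging", "2016-01")
os.makedirs(staging)
with open(os.path.join(staging, "data.txt"), "w") as f:
    f.write("archive contents\n")
archive = os.path.join(root, "2016-01.tgz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(staging, arcname="2016-01")

# Mock data created directly at the extraction target: the archive is
# never unpacked, so stale mock files are what the test ends up reading.
os.makedirs(os.path.join(root, "2016-01"))
first = maybe_extract(archive, root, "2016-01")

# Removing the pre-existing folder (or creating the mock data elsewhere,
# as suggested above) forces a real extraction.
shutil.rmtree(os.path.join(root, "2016-01"))
second = maybe_extract(archive, root, "2016-01")
```

This is consistent with the assertion failures above: the test reads whatever sits at the target path, not what the archive contains.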
Oh, I understand the idea now. 👍 Let me give this a shot.
Review comment on the changed lines:

    )
    with tarfile.open(outer_temp_dataset_path, "w:gz") as tar:
        tar.add(outer_temp_dataset_dir, arcname="2016-01")
Another alternative to what I mentioned above is to add the following lines here to delete the outer temp dataset dir, ensuring the archive will always be extracted:
import shutil
shutil.rmtree(outer_temp_dataset_dir)
Doing this gives me the following results: 110 failed, 553 passed. This is a sample error I'm getting from pytest:
FAILED test_iwslt2016.py::TestIWSLT2016::test_iwslt2016_008_train - ValueError: Iterables have different lengths
I'm having some trouble wrapping my mind around all the various nestings again and debugging all the issues accompanying them. I also have some personal things to attend to -- I understand there's some urgency here, so I won't be offended if someone else wants to work on this. I can try to look again tonight.
Hey @erip, thanks so much for your efforts on this PR so far. Please don't worry about it. I know this is a complex test, and since this is an improvement over what you have already contributed in #1563, I don't necessarily consider it a blocker for our branch-cut. I would be happy to take it over from here; otherwise we can keep this PR and cherry-pick it into the release branch later. cc: @Nayef211
@parmeet if you have the bandwidth to take this on, that would be great! We just have the tests for the IWSLT2016 and IWSLT2017 datasets remaining before we can close #1493. I suspect the current caching issue should be an easy fix if you have a good understanding of the folder structure within the dataset archive.
Sure, let me give it a try.
Closing this now that #1596 is merged.
This serves as a patch to the newly-added IWSLT2016 mock testing, which addresses two issues:
cc @parmeet