Fix IWSLT2016 testing #1585
Conversation
Review comment on the changed lines:

    super().tearDownClass()

    @parameterized.expand([("train", "de", "en"), ("valid", "de", "en")])
TODO: can we (should we?) parameterize by each supported lang pair?
For complete code coverage I think it would make sense to parameterize by each supported lang pair. I believe we could use the nested_params fn to generate the cartesian product of language pairs. The main issue I foresee with using that function is that it would generate lang pairs that map to the same language (e.g. en -> en), which I'm not sure makes sense for this dataset.
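As a rough sketch of the cartesian-product idea (using plain `itertools` rather than the torchtext `nested_params` helper, whose exact signature isn't shown here, and with an illustrative language set rather than the authoritative IWSLT2016 one), the same-language pairs could be filtered out before handing the cases to `parameterized.expand`:

```python
import itertools

# Hypothetical language set for illustration; the real supported set
# should be taken from the IWSLT2016 dataset definition.
LANGUAGES = ["ar", "de", "en", "fr"]
SPLITS = ["train", "valid"]

# Cartesian product of (split, src, tgt), dropping pairs where src == tgt,
# since src -> src does not make sense for a translation dataset.
TEST_CASES = [
    (split, src, tgt)
    for split, src, tgt in itertools.product(SPLITS, LANGUAGES, LANGUAGES)
    if src != tgt
]

# These tuples could then be passed to @parameterized.expand(TEST_CASES).
```

This keeps the generation declarative while avoiding the degenerate en -> en cases.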
@parmeet also wanted to get your thoughts on this
Yes, I would agree. Simulating the exact structure of the original dataset, with all the possible use-cases, would be key for coverage.
Nayef211 left a comment
Force-pushed: 7be1b68 → 417ee55
There are test failures related to this test. @erip, could you please fix them, and then I can follow up with the review? Thanks!
It looks like my attempt at addressing the review feedback has broken the tests, and unfortunately I can't test locally because the required nightly isn't available on OS X... I'll try to address this ASAP.
Thanks for all your help with the testing efforts @erip! I just want to see if you would have the bandwidth to add the …
Yes, absolutely!
IWSLT16 doesn't support every langpair direction, so we would need to filter some pairs out. At that point, it might make more sense to just enumerate the pairs directly (with some logic for swapping the direction as needed). Thoughts?

Yup, I agree. The total number of supported pairs is reasonable enough to enumerate explicitly.
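A sketch of the explicit-enumeration approach with direction swapping (the base pair list here is illustrative, not the authoritative IWSLT2016 set): list each supported pair once, then generate both directions programmatically.

```python
# Illustrative base pairs; the real supported set should come from the
# dataset definition, which does not support every direction.
BASE_PAIRS = [("de", "en"), ("fr", "en"), ("cs", "en"), ("ar", "en")]

def both_directions(pairs):
    """Yield each (src, tgt) pair in both translation directions."""
    for src, tgt in pairs:
        yield (src, tgt)
        yield (tgt, src)

LANG_PAIRS = list(both_directions(BASE_PAIRS))
# LANG_PAIRS can then be fed to @parameterized.expand, one case per direction.
```

Enumerating this way keeps unsupported directions out of the test matrix entirely, rather than generating and filtering a full product.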
@parmeet the middle ground seems to be leveraging the …
@erip just wondering if you're talking about the torchtext nightly? We can build any of the pytorch/torchtext packages directly from source, right? Also, I'm happy to patch this PR to test your changes in a Linux environment and give you feedback on any failures, if you think that would be helpful!
No, it's the pytorch nightly. I'm running into an error about datapipes_only not being a valid kwarg -- the same one that was causing CI errors on Windows. I have been trying to install fresh nightlies for a couple of days, but no joy.
OK, a new nightly has been cut so I can test locally again 🥳 I have fixed one lingering issue and have a TODO to pass the right test sets... I will follow a similar approach as the langpair parameterization.
Force-pushed: 5242134 → 91e2cf2
I've rebased, so the previously broken Multi30k should be good here. Tests are passing and life is good. 😎
Codecov Report

|          | main   | #1585  | +/-    |
|----------|--------|--------|--------|
| Coverage | 80.34% | 80.96% | +0.62% |
| Files    | 58     | 58     |        |
| Lines    | 2569   | 2569   |        |
| Hits     | 2064   | 2080   | +16    |
| Misses   | 505    | 489    | -16    |

Continue to review the full report at Codecov.
Review comment on the changed lines:

        root_dir, "IWSLT2016", "2016-01.tgz"
    )
    with tarfile.open(outer_temp_dataset_path, "w:gz") as tar:
        tar.add(outer_temp_dataset_dir, arcname="2016-01")
Before adding outer_temp_dataset_dir to the archive, we probably want to remove inner_temp_dataset_dir since the contents of this folder are already available in the inner_compressed_dataset_path, right?
We could approach it that way -- my approach is to just re-read the cleaned files to generate the expected values. I don't know if there's a strong reason to prefer one vs. the other.
Yeah, we probably want to match the extraction process as if we were working with the original dataset archive. The code-base would be extracting and caching files from the inner tarball, say de-en.tgz, and if the files are already present in the top-most archive 2016-01.tgz then we won't execute the inner extraction (due to the secondary caching logic introduced in #1589). This is certainly a complex test/dataset; let me know if I am missing anything :)
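To mirror that layout in the mock, one approach (a simplified sketch; the directory names and file names here are stand-ins, not the real test's helpers) is to tar the per-pair files into an inner de-en.tgz, delete the raw files, and only then tar the result into 2016-01.tgz, so that the code under test must actually perform both extraction steps:

```python
import os
import shutil
import tarfile
import tempfile

root = tempfile.mkdtemp()
inner_dir = os.path.join(root, "texts", "de", "en", "de-en")
os.makedirs(inner_dir)

# A stand-in data file; the real test writes mocked train/valid/test files.
with open(os.path.join(inner_dir, "train.tags.de-en.de"), "w") as f:
    f.write("mock line\n")

# Inner tarball, as the real dataset ships it (e.g. de-en.tgz).
inner_tgz = os.path.join(root, "texts", "de", "en", "de-en.tgz")
with tarfile.open(inner_tgz, "w:gz") as tar:
    tar.add(inner_dir, arcname="de-en")

# Remove the raw files so the outer archive contains only the inner tarball,
# forcing the code under test to perform the inner extraction itself.
shutil.rmtree(inner_dir)

# Outer archive 2016-01.tgz wrapping everything under texts/.
outer_tgz = os.path.join(root, "2016-01.tgz")
with tarfile.open(outer_tgz, "w:gz") as tar:
    tar.add(os.path.join(root, "texts"), arcname="2016-01/texts")

with tarfile.open(outer_tgz) as tar:
    names = tar.getnames()
```

The outer archive then contains only the nested de-en.tgz and no loose text files, matching the structure the secondary caching logic expects.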
Nayef211 left a comment
Just left some errors I found while patching your PR locally and removing the outer temp dir. Would you be able to investigate further @erip?
Review comment on the changed lines:

    """
    temp_dataset_dir = os.path.join(root_dir, f"IWSLT2016/2016-01/texts/{src}/{tgt}/{src}-{tgt}/")
    os.makedirs(temp_dataset_dir, exist_ok=True)
    outer_temp_dataset_dir = os.path.join(root_dir, f"IWSLT2016/2016-01/texts/{src}/{tgt}/")
If you change this path to "...IWSLT2016/temp_dataset_dir/2016-01/..." you will notice a lot of test failures. The point of naming it this way is to ensure that the folder where we create the mocked data is different from where the files/folders would be placed when the archive is extracted (i.e. when calling the dataset). Without doing this, the archive would never be extracted, since our dataset implementation checks whether a file/folder exists and, if it does, skips the extraction.
When patching this PR locally and making the above change, I noticed we're getting 212 failed, 451 passed. This is a sample error I'm getting from pytest:
FAILED test_iwslt2016.py::TestIWSLT2016::test_iwslt2016_219_train - AssertionError: 'H T G W w U w K b d\n' != 'b H K H c L t k I s\n'
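The caching behaviour being described can be sketched as follows (a simplified stand-in for illustration, not torchtext's actual implementation): if the mocked data is created directly at the extraction target, the exists-check treats it as a cache hit and the archive's real contents are never unpacked.

```python
import os
import shutil
import tarfile
import tempfile

def maybe_extract(archive_path, extract_root, expected):
    """Extract archive_path only if extract_root/expected does not already
    exist -- mimicking the exists-check caching described above."""
    if os.path.exists(os.path.join(extract_root, expected)):
        return False  # cache hit: extraction is silently skipped
    with tarfile.open(archive_path) as tar:
        tar.extractall(extract_root)
    return True

root = tempfile.mkdtemp()

# Build a tiny archive whose contents live under "2016-01/".
staging = os.path.join(root, "staging", "2016-01")
os.makedirs(staging)
with open(os.path.join(staging, "data.txt"), "w") as f:
    f.write("archive contents\n")
archive = os.path.join(root, "2016-01.tgz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(staging, arcname="2016-01")

# Mock data created directly at the extraction target: the archive is
# never unpacked, so stale mock files are what the test ends up reading.
os.makedirs(os.path.join(root, "2016-01"))
first = maybe_extract(archive, root, "2016-01")

# Removing the pre-existing folder (or creating the mock data elsewhere,
# as suggested above) forces a real extraction.
shutil.rmtree(os.path.join(root, "2016-01"))
second = maybe_extract(archive, root, "2016-01")
```

This is consistent with the assertion failures above: the test reads whatever sits at the target path, not what the archive contains.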
Oh, I understand the idea now. 👍 Let me give this a shot.
Review comment on the changed lines:

    )
    with tarfile.open(outer_temp_dataset_path, "w:gz") as tar:
        tar.add(outer_temp_dataset_dir, arcname="2016-01")
Another alternative to what I mentioned above is to add the following lines here to delete the outer temp dataset dir, ensuring the archive will always be extracted:
import shutil
shutil.rmtree(outer_temp_dataset_dir)
Doing this gives me the following results: 110 failed, 553 passed. This is a sample error I'm getting from pytest:
FAILED test_iwslt2016.py::TestIWSLT2016::test_iwslt2016_008_train - ValueError: Iterables have different lengths
I'm having some trouble wrapping my mind around all the various nestings again and debugging all the issues accompanying them. I also have some personal things to attend to -- I understand there's some urgency here, so I won't be offended if someone else wants to work on this. I can try to look again tonight.
Hey @erip, thanks so much for your efforts on this PR so far. Please don't worry about it. I know this is a complex test, and since this is an improvement over what you have already contributed in #1563, I don't necessarily consider it a blocker for our branch-cut. I would be happy to take it over from here; otherwise we can keep this PR and cherry-pick it into the release branch later. cc: @Nayef211
@parmeet if you have the bandwidth to take this on, that would be great! We just have the tests for the IWSLT2016 and IWSLT2017 datasets remaining before we can close #1493. I suspect the current caching issue should be an easy fix if you have a good understanding of the folder structure within the dataset archive.
Sure, let me give it a try.
Closing this now that #1596 is merged.
This serves as a patch to the newly-added IWSLT2016 mock testing, which addresses two issues:
cc @parmeet