Multi30k mocked testing #1554

parmeet · 2022-01-30T01:27:29Z

Reference issue #1493

erip · 2022-01-30T13:32:08Z

test/datasets/test_multi30k.py

+        with open(txt_file, "w") as f:
+            for i in range(5):
+                rand_string = " ".join(
+                    random.choice(string.ascii_letters) for i in range(seed)
+                )


One thought: since all of our datasets are UTF-8 files, does it make sense to write Unicode strings, to make sure we don't have lingering bugs from default encodings when opening files? Maybe this is overkill, but it was a big source of bugs when I did mostly Windows development.

Ya, agreed. I think it is a good suggestion!

I will keep it as a follow-up item, since generating random UTF-8 strings is not trivial.
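One possible shape for that follow-up, as a minimal sketch rather than what torchtext actually adopted: draw code points from a few hand-picked Unicode ranges (an assumption; any non-surrogate range would do), write them with an explicit `encoding="utf-8"`, and read them back. The names `UNICODE_RANGES`, `random_unicode_word`, and `write_mock_utf8_file` are hypothetical.

```python
from __future__ import annotations

import os
import random
import tempfile

# Hypothetical code-point ranges (not from the PR). Every value here is a
# non-surrogate code point, so every generated string is encodable as UTF-8.
UNICODE_RANGES = [
    (0x0041, 0x005A),    # ASCII capitals: 1 byte in UTF-8
    (0x00C0, 0x00FF),    # Latin-1 Supplement (é, ü, ß, ...): 2 bytes in UTF-8
    (0x0410, 0x044F),    # Cyrillic: 2 bytes in UTF-8
    (0x4E00, 0x4EFF),    # slice of CJK Unified Ideographs: 3 bytes in UTF-8
    (0x1F600, 0x1F64F),  # emoticons outside the BMP: 4 bytes in UTF-8
]


def random_unicode_word(rng: random.Random, length: int) -> str:
    """Return `length` characters, each drawn from a randomly chosen range."""
    chars = []
    for _ in range(length):
        lo, hi = rng.choice(UNICODE_RANGES)
        chars.append(chr(rng.randint(lo, hi)))  # randint is inclusive on both ends
    return "".join(chars)


def write_mock_utf8_file(path: str, num_lines: int, seed: int) -> list[str]:
    """Write `num_lines` lines of random words to `path`; return them without newlines."""
    rng = random.Random(seed)  # local RNG: reproducible, leaves global state alone
    lines = [
        " ".join(random_unicode_word(rng, rng.randint(1, 6)) for _ in range(4))
        for _ in range(num_lines)
    ]
    # Pass the encoding explicitly: relying on the platform default (for
    # example cp1252 on many Windows setups) is the bug class being targeted.
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(f"{line}\n")
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mock.txt")
        expected = write_mock_utf8_file(path, num_lines=5, seed=0)
        # Reading back with the same explicit encoding must round-trip exactly.
        with open(path, encoding="utf-8") as f:
            assert [line.rstrip("\n") for line in f] == expected
        print("round-trip OK")
```

Mixing 1- to 4-byte sequences means a reader that silently falls back to a narrower default codec should fail loudly instead of passing on ASCII-only fixtures.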

Nayef211

Overall LGTM. Left some nit comments. Also think @erip has a valid point about writing unicode strings when we mock data. Should we do this for all our tests moving forward?

Nayef211 · 2022-01-31T16:03:21Z

test/datasets/test_multi30k.py

+                rand_string = " ".join(
+                    random.choice(string.ascii_letters) for i in range(seed)
+                )
+                content = f'{rand_string}\n'


nit: can we use double quotes here instead of single quotes?
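For reference, the line-generation logic in the hunk can be written as a self-contained function that applies the double-quote nit. This is a sketch: `make_mock_lines` is a hypothetical name, and the seeded `random.Random` instance is an assumption, since the hunk calls the module-level `random.choice`.

```python
from __future__ import annotations

import random
import string


def make_mock_lines(seed: int, num_lines: int = 5) -> list[str]:
    """Hypothetical helper mirroring the hunk: each line holds `seed`
    space-separated random ASCII letters and ends with a newline."""
    # Assumption: a local, seeded RNG instead of the module-level `random`
    # used in the hunk, so the output is reproducible per seed.
    rng = random.Random(seed)
    lines = []
    for _ in range(num_lines):  # throwaway loop variables: neither index is used
        rand_string = " ".join(rng.choice(string.ascii_letters) for _ in range(seed))
        lines.append(f"{rand_string}\n")  # double quotes, per the nit
    return lines
```

As in the hunk, `seed` doubles as the number of letters per line, so `seed=0` produces empty lines.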

parmeet · 2022-02-02T22:36:56Z

> Overall LGTM. Left some nit comments. Also think @erip has a valid point about writing unicode strings when we mock data. Should we do this for all our tests moving forward?

Yes, let's add a follow-up item here or even better if we can also track this in the issue #1493 in general.

Nayef211 · 2022-02-03T14:04:56Z

> Yes, let's add a follow-up item here or even better if we can also track this in the issue #1493 in general.

Added this as a follow up item to the GH issue!

parmeet added 2 commits January 29, 2022 18:25

intermediate state (1766139)

add multi30k mocked test (b3e6b15)

pytorch-bot bot added the ciflow/default label Jan 30, 2022

facebook-github-bot added the cla signed label Jan 30, 2022

minor edit (c8e7224)

erip reviewed Jan 30, 2022

Nayef211 approved these changes Jan 31, 2022

Nayef211 mentioned this pull request Jan 31, 2022

Revamp TorchText Dataset Testing Strategy #1493 (closed, 27 tasks)

fix single quote (861631a)

parmeet merged commit 69825a1 into pytorch:main Feb 2, 2022

parmeet deleted the multi30k_test branch February 2, 2022 22:37

4 participants
