-
Notifications
You must be signed in to change notification settings - Fork 814
Multi30k mocked testing #1554
Multi30k mocked testing #1554
Conversation
| with open(txt_file, "w") as f: | ||
| for i in range(5): | ||
| rand_string = " ".join( | ||
| random.choice(string.ascii_letters) for i in range(seed) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thought: since all of our datasets are utf-8 files, does it make sense to write unicode strings to make sure we don't have lingering bugs from default encodings when opening files? Maybe this is overkill, but it's been a big source of bugs when I did mostly windows development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, agreed. I think it is a good suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will keep it as a follow-up item as generating random UTF-8 string is not trivial.
Nayef211
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Left some nit comments. Also think @erip has a valid point about writing unicode strings when we mock data. Should we do this for all our tests moving forward?
test/datasets/test_multi30k.py
Outdated
| rand_string = " ".join( | ||
| random.choice(string.ascii_letters) for i in range(seed) | ||
| ) | ||
| content = f'{rand_string}\n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use double quotes here instead of single quotes
Added this as a follow up item to the GH issue! |
Reference issue #1493