-
Notifications
You must be signed in to change notification settings - Fork 7.2k
[POC] Base class for dataset tests #3402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
11773da to
e85f996
Compare
|
I suspect the tests fail on Windows, because |
Codecov Report
@@ Coverage Diff @@
## master #3402 +/- ##
==========================================
+ Coverage 74.83% 75.13% +0.30%
==========================================
Files 105 105
Lines 9722 9722
Branches 1563 1563
==========================================
+ Hits 7275 7305 +30
+ Misses 1960 1930 -30
Partials 487 487
Continue to review full report at Codecov.
|
fmassa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
I have a few comments, specially about the handling of PIL opened images on Windows, let me know what you think.
About your questions:
1 - I might be missing something here, but why not leave those to be available in the config?
2 - Let's leave transform testing out for now. There might be a way to test that the transform is called without having to let the user specify it. Maybe by creating dummy transformer classes that return a specific dummy value, and checking the presence of this dummy value in the returned dataset.
I think that would be only reasonable if there is a good and independent default value: vision/torchvision/datasets/mnist.py Line 263 in c1f85d3
The datasets I mentioned require an additional directory where the assumptions cannot be met. The annotation directory should be inside temporary directory which is only created shortly before vision/torchvision/datasets/ucf101.py Line 58 in c1f85d3
Thus if the annotation directory is inside In most cases the temporary directory passed to
So instead of checking if the returned values are correct you only want to check if the |
fmassa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Summary: * add base class for datasets tests * add better type hints * add documentation to subclasses * add utility functions to create files / folders of random images and videos * fix imports * remove class properties * fix smoke test * fix type hints * fix random size generation * add Caltech256 as example * add utility function to create grid of combinations * add CIFAR100? as example * lint * add missing import * improve documentation * create 1 frame videos by default * remove obsolete check * return path of files created with utility functions * [test] close PIL file handles before deletion * fix video folder creation * generalize file handle closing * fix lazy imports * add test for transforms * fix explanation comment * lint * force load opened PIL images * lint * copy default config to avoid inplace modification * enable additional arg forwarding Reviewed By: NicolasHug Differential Revision: D26605320 fbshipit-source-id: c54903ea5a6b57ddfa57c560e579e0834cf0cc11
Proof-of-concept for #3375.
We provide the fully-documented
DatasetTestCaseclass that needs to be subclassed for each individual dataset. In most cases the user only has to configure theDATASET_CLASSto indicate which dataset should be tested and supply fake data within theinject_fake_datamethod. I've added tests forCaltech256,CIFAR10, andCIFAR100to showcase this.The user can add additional
CONFIGSthat are all tested instead of only the standard configuration. This can be used to test if the dataset works as intended for different splits or other parameter combinations.Two things we need to discuss:
Some datasets require additional arguments without default values. For example
vision/torchvision/datasets/ucf101.py
Line 46 in 61e00d5
vision/torchvision/datasets/coco.py
Lines 49 to 52 in 61e00d5
We need an interface for the user to be able to provide them. My idea was to put this into
inject_fake_data. In addition to theinfodict, it could return a sequence of arguments passed to the constructor. If not returned, we could simply assume that onlyrootshould be passed, making this optional.As of now, no tests for the
(target_)transforms?is implemented. Writing the test is trivial, but we need a way to indicate which transform is applied to which output argument. Most of the timetransformis applied to the first,target_transformto the second, andtransformsto the first two. But there are exceptions to this:vision/torchvision/datasets/phototour.py
Lines 106 to 109 in 61e00d5
vision/torchvision/datasets/ucf101.py
Lines 108 to 111 in 61e00d5
My idea is to provide
TRANSFORM_IDCSandTARGET_TRANSFORM_IDCSclass attributes (TRANSFORMS_IDCSis not necessary as it can be combined from the other two). These would be mandatory, but could have good default values in(Image|Video)DatasetTestCase.