-
Couldn't load subscription status.
- Fork 7.2k
Open
Description
The prototype datasets change the interface in two ways:
- The input parameters are a little different in some cases. For example in the current API the MNIST dataset would be instantiated with
datasets.MNIST(..., train=True)whereas now it looks likedatasets.load("mnist", split="train"). - The output is completely different. Before we returned a tuple (sometimes of varying length) whereas now we always return a dictionary. Furthermore, before we used
PILimages andnumpyarrays as return types, whereas now we always use tensor subclasses.
To lower the burden to move to the new style datasets a little, we could have a legacy: bool = False keyword argument on datasets.load(). For all datasets that have a legacy variant, we could simply implement two functions that map the input and output. If a dataset doesn't support a legacy variant, we could simply error out.