@@ -103,7 +103,20 @@ By just changing ``device_id=0`` to ``device_id=self.trainer.local_rank`` we can
         return train_data


-Lightning works seamlessly with all kinds of custom data iterables,
-but unfortunately it cannot support the entire featureset with arbitrary iterables as some are specific to dataloaders.
-These features are mainly automatic replacement of the sampler and fully fault-tolerant training as these dataloaders
-typically don't expose sampling APIs to fast-forward or save and load states.
+Limitations
+-----------
+Lightning works with all kinds of custom data iterables as shown above. There are, however, a few features that
+cannot be supported this way, because supporting them requires Lightning to know a lot about the internals of these
+iterables.
+
+- In a distributed multi-GPU setting (ddp),
+  Lightning automatically replaces the DataLoader's sampler with its distributed counterpart.
+  This makes sure that each GPU sees a different part of the dataset.
+  Since sampling can be implemented in arbitrary ways with custom iterables,
+  there is no way for Lightning to know how to replace the sampler (see the first sketch after this list).
+
+- When training fails for some reason, Lightning is able to extract all of the relevant state from the model,
+  optimizers, trainer and dataloader to resume training at the exact batch it crashed on.
+  This feature is called fault tolerance and is limited to PyTorch DataLoaders, because
+  Lightning also needs to know a lot about sampling, fast-forwarding and random number handling to enable it,
+  so it cannot be supported for arbitrary iterables either (see the second sketch after this list).
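+
+The automatic sampler replacement roughly corresponds to the sketch below. This is not Lightning's internal code,
+and ``shard_dataloader``, ``rank`` and ``world_size`` are just illustrative names; the idea only works because a
+``DataLoader`` exposes its dataset, batch size and sampler:
+
+.. code-block:: python
+
+    from torch.utils.data import DataLoader, DistributedSampler
+
+
+    def shard_dataloader(loader: DataLoader, rank: int, world_size: int) -> DataLoader:
+        # Re-create the DataLoader with a DistributedSampler so that each
+        # process (GPU) iterates over a disjoint shard of the dataset.
+        sampler = DistributedSampler(loader.dataset, num_replicas=world_size, rank=rank)
+        return DataLoader(
+            loader.dataset,
+            batch_size=loader.batch_size,
+            sampler=sampler,
+            num_workers=loader.num_workers,
+        )
+
+A custom iterable exposes none of these attributes, so there is nothing for Lightning to inspect or swap.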
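+
+For fault tolerance, an iterable would need to expose its progress explicitly, for example through hypothetical
+``state_dict`` and ``load_state_dict`` hooks as in the sketch below; a plain generator offers no such hooks, and the
+same would be needed for random number state:
+
+.. code-block:: python
+
+    class ResumableRange:
+        """Toy iterable that records its position so iteration can be resumed."""
+
+        def __init__(self, length: int) -> None:
+            self.length = length
+            self.index = 0  # position to fast-forward to after a crash
+
+        def __iter__(self):
+            while self.index < self.length:
+                yield self.index
+                self.index += 1
+
+        def state_dict(self) -> dict:
+            # Everything needed to resume iteration at the current position.
+            return {"index": self.index}
+
+        def load_state_dict(self, state: dict) -> None:
+            self.index = state["index"]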