Description
The Docker jobs on our CI are very flaky and fail often, which keeps our CI red :/
Some recent examples:
https://github.com/PyTorchLightning/pytorch-lightning/runs/3956925878
https://github.com/PyTorchLightning/pytorch-lightning/runs/3956925126
I have spent a good amount of time trying to debug them (e.g. #9676), but it is difficult since the failures come and go.
As part of #9445, we want to work towards a state where all of our CI jobs are required. For these Docker tests, that means we should eventually either mark them as required or delete them.
Personally, I believe we should delete them: they seem to be extremely flaky, and fixing them takes up a lot of our time that could be better spent elsewhere. However, I don't know much about the history of these tests, so I'd like to hear others' opinions on this. What do people think is the best course of action here?