
Docker tests consistently flaky and failing #10060

@daniellepintz


The Docker jobs on our CI are very flaky and fail often, making our CI red :/
Some recent examples:
https://github.com/PyTorchLightning/pytorch-lightning/runs/3956925878
https://github.com/PyTorchLightning/pytorch-lightning/runs/3956925126

I have spent a good amount of time trying to debug them (e.g. #9676), but it is difficult since the failures come and go.

As part of #9445, we want to work towards a state where all of our CI jobs are required. For these Docker tests, that means we should eventually either mark them as required or delete them.

Personally, I believe we should delete them: they seem to be extremely flaky, and trying to fix them takes up a lot of our time that could be better spent elsewhere. However, I don't know much about the history of these tests, so I'd like to hear others' opinions on this. What do people think is the best course of action here?

cc @carmocca @akihironitta @Borda

Labels: ci (Continuous Integration), won't fix (This will not be worked on)
