@carmocca
Contributor

@carmocca carmocca commented Oct 27, 2021

What does this PR do?

The fairscale 0.4.1 release breaks our tests.

This release came out more than a month ago, but the nightly job that generates the CUDA docker image used by Azure CI had been failing for a long time, so the image was never rebuilt with it.

That changed yesterday, when #10088 fixed the job. At midnight that day, the image was rebuilt, this time with the new fairscale version (0.4.1), which explains why the failure did not appear before.
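
For anyone who wants to confirm which fairscale version an environment (for example, the rebuilt image) actually contains, here is a minimal sketch. It is not part of this PR's diff. The helper names (`parse_version`, `installed_version`, `is_affected`) are hypothetical, and treating 0.4.1 as the first affected release is an assumption based only on the description above.

```python
from importlib import metadata
from typing import Optional

# Assumption: 0.4.1 is the first fairscale release that triggers the failure;
# the PR description only states that 0.4.1 breaks the tests.
FIRST_BROKEN = (0, 4, 1)


def parse_version(version: str) -> tuple:
    """Turn '0.4.1' into (0, 4, 1), stopping at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_affected(version: Optional[str]) -> bool:
    """True if `version` is at or above the release assumed to be broken."""
    return version is not None and parse_version(version) >= FIRST_BROKEN


if __name__ == "__main__":
    found = installed_version("fairscale")
    print(f"fairscale={found!r} affected={is_affected(found)}")
```

Running this inside the image prints the installed fairscale version, which makes it easy to tell a stale image from a freshly rebuilt one.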

Fixes #10196

Proof:

Fixed image build (with 0.4.1, 21h ago): https://github.com/PyTorchLightning/pytorch-lightning/runs/4016407989?check_suite_focus=true#step:5:1957

Previous failing build (2 days ago): https://github.com/PyTorchLightning/pytorch-lightning/runs/4003726347?check_suite_focus=true

I haven't checked when the last successful build was.

This is a good example of why #10060 is important.

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • [n/a] Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • [n/a] Did you write any new necessary tests? (not for typos and docs)
  • [n/a] Did you verify new and existing tests pass locally with your changes?
  • [n/a] Did you list all the breaking changes introduced by this pull request?
  • [n/a] Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

@carmocca carmocca added priority: 0 High priority task ci Continuous Integration labels Oct 27, 2021
@carmocca carmocca added this to the v1.6 milestone Oct 27, 2021
@carmocca carmocca self-assigned this Oct 27, 2021
to trigger the rebuild
Contributor

@daniellepintz daniellepintz left a comment

Thanks for the fix!!

@mergify mergify bot added the ready PRs ready to be merged label Oct 27, 2021
@mergify mergify bot requested a review from a team October 27, 2021 21:34
@awaelchli awaelchli enabled auto-merge (squash) October 27, 2021 21:55
This reverts commit 19fefe3.
@awaelchli awaelchli merged commit 3a4e997 into master Oct 27, 2021
@awaelchli awaelchli deleted the hotfix/ci-fairscale branch October 27, 2021 23:24
@carmocca carmocca mentioned this pull request Jan 14, 2022
7 tasks
Labels

ci Continuous Integration priority: 0 High priority task ready PRs ready to be merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Azure Pipelines / PL.pytorch-lightning (GPUs) failing with Bash exited with code '1'.

5 participants