@Borda Borda commented Mar 14, 2022

What does this PR do?

Fixes #12314
Closes #12330
Closes #12353

GPU CI jobs are failing. The MPI libs seem to be missing.
Ref: https://github.com/horovod/horovod/blob/980ce053bf438f3f389c44d7eb90dbb4ed4cdde1/horovod/torch/__init__.py#L53
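A minimal sketch of how the symptom could be checked, assuming (not confirmed here) that `horovod.torch` only exposes its build-flag helpers such as `nccl_built` when the compiled extension loads. The helper `missing_build_flags` and the `BUILD_FLAGS` tuple are hypothetical names for illustration, and the stand-in modules below replace a real Horovod install:

```python
from types import SimpleNamespace

# Assumption: these helpers are what horovod.torch exposes when it was
# built correctly; the linked issue reports `nccl_built` as missing.
BUILD_FLAGS = ("mpi_built", "nccl_built", "gloo_built")


def missing_build_flags(module, names=BUILD_FLAGS):
    """Return the names from `names` that `module` does not expose."""
    return [name for name in names if not hasattr(module, name)]


# Stand-ins for a broken and a healthy build (hypothetical, no Horovod needed).
broken_build = SimpleNamespace(mpi_built=lambda: True)
healthy_build = SimpleNamespace(
    mpi_built=lambda: True,
    nccl_built=lambda: True,
    gloo_built=lambda: False,
)

print("broken build is missing:", missing_build_flags(broken_build))
print("healthy build is missing:", missing_build_flags(healthy_build))
```

Inside the Docker image, the same helper could be called on the real module (`import horovod.torch as hvd; missing_build_flags(hvd)`) as a quick smoke test after the build step.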

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @carmocca @akihironitta @Borda @tchaton @rohitgr7

@Borda Borda added the priority: 1 Medium priority task label Mar 14, 2022
@akihironitta akihironitta added ci Continuous Integration priority: 1 Medium priority task and removed priority: 1 Medium priority task labels Mar 14, 2022
@akihironitta akihironitta added this to the 1.6 milestone Mar 14, 2022
@awaelchli

@Borda did you try pushing the docker image? #12330 (review)

@Lightning-AI Lightning-AI deleted a comment from awaelchli Mar 17, 2022
@Borda Borda force-pushed the docker/mpi-horovod branch from 1c57c17 to 1c7a1ba Compare March 18, 2022 07:54
@Borda Borda marked this pull request as ready for review March 18, 2022 08:30
@Borda Borda requested review from carmocca and tchaton as code owners March 18, 2022 08:30
@Borda Borda requested review from a team, akihironitta and awaelchli March 18, 2022 08:30
@Borda Borda mentioned this pull request Mar 18, 2022
@Borda Borda changed the title from "Docker: Horovod w. MPI" to "Docker: fix NCCL building Horovod" Mar 18, 2022
@tchaton tchaton left a comment

LGTM!

@mergify mergify bot added the ready PRs ready to be merged label Mar 18, 2022
@mergify mergify bot requested a review from a team March 18, 2022 12:42
@Borda Borda enabled auto-merge (squash) March 18, 2022 14:19
@Borda Borda merged commit efa870e into master Mar 18, 2022
@Borda Borda deleted the docker/mpi-horovod branch March 18, 2022 14:23

Development

Successfully merging this pull request may close these issues.

AttributeError: module 'horovod.torch' has no attribute 'nccl_built'

4 participants