This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Conversation

@ant0nsc
Contributor

@ant0nsc ant0nsc commented Jun 17, 2021

This fixes an issue where test set inference in multi-GPU jobs with LightningContainer models got stuck, attempting to
communicate with processes that had already terminated.

Closes #493
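
For background, this kind of hang typically occurs when worker ranks exit after training while another rank still waits on a distributed collective call. The sketch below shows one generic way to avoid it in a PyTorch Lightning setup: tear down the process group on every rank and then run test-set inference on a single device only. This is an illustrative assumption about the failure mode, not the actual change made in this PR; `run_inference_after_training` and its arguments are hypothetical.

```python
# Sketch only: avoid test-set inference hanging on dead DDP workers.
# The helper and Trainer settings below are illustrative assumptions,
# not the code changed in this PR.
import torch
import torch.distributed as dist
from pytorch_lightning import Trainer


def run_inference_after_training(trainer: Trainer, model, test_dataloader) -> None:
    # Every rank tears down the process group first, so no process is left
    # blocking on a collective call that its peers will never join.
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()
    # Only the global rank 0 process performs test-set inference.
    if trainer.global_rank != 0:
        return
    # Fresh single-device Trainer: no DDP, hence no inter-process communication.
    inference_trainer = Trainer(gpus=1 if torch.cuda.is_available() else 0)
    inference_trainer.test(model, test_dataloaders=test_dataloader)
```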

Please follow the guidelines for PRs contained here. Checklist:

  • Ensure that your PR is small, and implements one change.
  • Add unit tests for all functions that you introduced or modified.
  • Run PyCharm's code cleanup tools on your Python files.
  • Link the correct GitHub issue for tracking.
  • Update the Changelog file: Describe your change in terms of
    Added/Changed/Removed/... in the "Upcoming" section.
  • When merging your PR, replace the default merge message with a description of your PR,
    and, if needed, a motivation for why the change was required.

@ant0nsc ant0nsc enabled auto-merge (squash) June 17, 2021 15:02
@ant0nsc ant0nsc merged commit 9749954 into main Jun 17, 2021
@ant0nsc ant0nsc deleted the antonsc/inference_fix branch June 17, 2021 16:34


Development

Successfully merging this pull request may close these issues.

Multi-node training jobs for LightningContainer models can get stuck at inference time
