This repository was archived by the owner on Mar 21, 2024. It is now read-only.


@ant0nsc (Contributor) commented on Dec 9, 2021

Workaround for an issue with low-priority preemption: after a preempted job restarts, its checkpoint files are not available.

Please follow the guidelines for PRs contained here. Checklist:

  • Ensure that your PR is small, and implements one change.
  • Add unit tests for all functions that you introduced or modified.
  • Run PyCharm's code cleanup tools on your Python files.
  • Link the correct GitHub issue for tracking.
  • Update the Changelog file: Describe your change in terms of
    Added/Changed/Removed/... in the "Upcoming" section.
  • When merging your PR, replace the default merge message with a description of your PR
    and, if needed, an explanation of why the change was required.

javier-alvarez previously approved these changes on Dec 9, 2021

@javier-alvarez (Contributor) left a comment:

Looks good, but a lot of things have been moved, so it is a bit difficult to review the real changes.

@ant0nsc enabled auto-merge (squash) on December 9, 2021 17:08

    temp_folder = download_checkpoints_to_temp_folder()
    available_checkpoints = find_all_recovery_checkpoints(temp_folder)
    if available_checkpoints is not None:
        return extract_latest_checkpoint_and_epoch(available_checkpoints)

A repository member commented:

Is there a potential scenario in which we are not currently running in AML, but we want to call download_checkpoints_to_temp_folder, e.g., for a previous run?
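For readers following the review, the checkpoint lookup in the snippet under discussion can be sketched in a self-contained way. This is not the PR's implementation: the names find_all_recovery_checkpoints and extract_latest_checkpoint_and_epoch come from the snippet, but their bodies here are guesses, and the file-naming scheme recovery_epoch=<N>.ckpt, the helper epoch_of, and the wrapper get_recovery_checkpoint are hypothetical. The AzureML download step is replaced by a local folder so the sketch runs anywhere.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical naming scheme, assumed only for this sketch: "recovery_epoch=<N>.ckpt".
RECOVERY_PATTERN = re.compile(r"^recovery_epoch=(\d+)\.ckpt$")


def epoch_of(path: Path) -> int:
    """Parse the epoch number from a (hypothetically named) recovery checkpoint file."""
    match = RECOVERY_PATTERN.match(path.name)
    if match is None:
        raise ValueError(f"Not a recovery checkpoint: {path.name}")
    return int(match.group(1))


def find_all_recovery_checkpoints(folder: Path) -> Optional[List[Path]]:
    """Return all recovery checkpoints in `folder`, or None if there are none."""
    files = [p for p in folder.glob("*.ckpt") if RECOVERY_PATTERN.match(p.name)]
    return files or None


def extract_latest_checkpoint_and_epoch(checkpoints: List[Path]) -> Tuple[Path, int]:
    """Pick the checkpoint with the highest epoch, comparing epochs as integers."""
    latest = max(checkpoints, key=epoch_of)
    return latest, epoch_of(latest)


def get_recovery_checkpoint(folder: Path) -> Optional[Tuple[Path, int]]:
    """Same control flow as the reviewed snippet; in the PR, the folder would come
    from download_checkpoints_to_temp_folder() instead of being passed in."""
    available_checkpoints = find_all_recovery_checkpoints(folder)
    if available_checkpoints is not None:
        return extract_latest_checkpoint_and_epoch(available_checkpoints)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for epoch in (2, 10, 9):
            (folder / f"recovery_epoch={epoch}.ckpt").touch()
        (folder / "best_checkpoint.ckpt").touch()  # ignored: not a recovery checkpoint
        print(get_recovery_checkpoint(folder))
```

Parsing the epoch as an integer matters here: with files for epochs 2, 9 and 10, taking the maximum file name lexicographically would pick epoch 9, while max(..., key=epoch_of) picks epoch 10. Passing the folder in as a parameter, rather than downloading inside the function, is one way to keep the lookup usable outside AML, which is the scenario raised in the review comment.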

@ant0nsc merged commit c7eef5e into main on Dec 9, 2021
@ant0nsc deleted the antonsc/checkpointhotfix branch on December 9, 2021 20:18

4 participants