-
Notifications
You must be signed in to change notification settings - Fork 521
[pytorch] | [build, test] | [ec2, ecs, eks, sagemaker] PT 1.4 patch release #216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| account_id: &ACCOUNT_ID <set-$ACCOUNT_ID-in-environment> | ||
| region: ®ION <set-$REGION-in-environment> | ||
| framework: &FRAMEWORK pytorch | ||
| version: &VERSION 1.5.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our main buildspec file will be for 1.5 only.
Changes in 1.4 will far and fewer than PT1.5.
Please create a sperate buildspce for PT1.4 not PT1.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I want to do is to use it as a one time release purpose file. Will recover to PT1.5 once PT 1.4 patch release done.
Which method do you prefer? To do modification on root buildspec.yml? It also requires a recover operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, this is fine. Let's use buildspec.yml for containers we want to build currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming.
|
|
||
| # Add any script or repo as required | ||
| RUN cd /var && wget https://raw.githubusercontent.com/kubeflow/pytorch-operator/master/examples/mnist/mnist.py | ||
| COPY mnist.py /var/mnist.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is introduced since PT 1.5. PT team change the script and we now maintain it as a separate artifact https://github.com/aws/deep-learning-containers/blob/master/pytorch/training/docker/build_artifacts/mnist.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary now that we are building a PT 1.4 image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked with Tushar. We think it should be a general change and should also apply to PT1.4. @TusharKanekiDey
| LOG_DIR=${HOME_DIR}/logs | ||
|
|
||
| git clone https://github.com/pytorch/examples.git ${HOME_DIR}/artifacts/examples | ||
| sed -i "s/param.add_(-0.1 \* param.grad)/param.data.add_(-0.1 \* param.grad)/g" ${HOME_DIR}/artifacts/examples/regression/main.py # temporary patch for isssue https://github.com/pytorch/examples/issues/782 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a issue with this test. The PT example repo fixed a similar issue, but seems forgot to fix this file. pytorch/examples#779
I opened a PR for their repo but did not have response yet. This is a temp approach to make the test pass.
Issue #, if available:
Checklist
Description:
Tests run:
DLC image/dockerfile:
Additional context:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.