Commit d5f35ec

CI/CD: Add CUDA version to docker image tags (#13831)
* append cuda version to tags
* revertme: push to hub
* Update docker readme
* Build base-conda-py3.9-torch1.12-cuda11.3.1
* Use new images in conda tests
* revertme: push to hub
* Revert "revertme: push to hub". This reverts commit 0f7d534.
* Revert "revertme: push to hub". This reverts commit 46a05fc.
* Run conda if workflow edited
* Run gpu testing if workflow edited
* Use new tags in release/Dockerfile
* Build base-cuda and PL release images with all combinations
* Update release docker
* Update conda from py3.9-torch1.12 to py3.10-torch1.12
* Fix ubuntu version
* Revert conda
* revertme: push to hub
* Don't build Python 3.10 for now...
* Fix pl release builder
* updating version contribute to the error? docker/buildx#456
* Update actions' versions
* Update slack user to notify
* Don't use 11.6.0 to avoid bagua incompatibility
* Don't use 11.1, and use 11.1.1
* Update .github/workflows/ci-pytorch_test-conda.yml
  Co-authored-by: Luca Medeiros <[email protected]>
* Update trigger
* Ignore artifacts from tutorials
* Trim docker images to distribute
* Add an image for tutorials
* Update conda image 3.8x1.10
* Try different conda variants
* No need to set cuda for conda jobs
* Update who to notify ipu failure
* Don't push
* update filename

Co-authored-by: Luca Medeiros <[email protected]>
1 parent ddb476d commit d5f35ec

8 files changed: +87 -88 lines changed


.azure/gpu-benchmark.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -28,7 +28,7 @@ jobs:
 cancelTimeoutInMinutes: "2"
 pool: azure-jirka-spot
 container:
-image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12"
+image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12-cuda11.3.1"
 options: "--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all --shm-size=32g"
 workspace:
 clean: all
```

.azure/gpu-tests.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -26,7 +26,7 @@ jobs:
 strategy:
 matrix:
 'PyTorch - stable':
-image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12"
+image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12-cuda11.3.1"
 # how long to run the job before automatically cancelling
 timeoutInMinutes: "80"
 # how much time to give 'run always even if cancelled tasks' before stopping them
@@ -44,7 +44,7 @@ jobs:
 
 - bash: |
 CHANGED_FILES=$(git diff --name-status origin/master -- . | awk '{print $2}')
-FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.azure/*'
+FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.azure/gpu-tests.yml'
 echo $CHANGED_FILES > changed_files.txt
 MATCHES=$(cat changed_files.txt | grep -E $FILTER)
 echo $MATCHES
```
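Both CI configs decide whether to run by matching changed paths against `FILTER` with `grep -E`. The sketch below isolates that check, assuming the changed paths arrive one per line on stdin; `should_run_gpu_tests` is a hypothetical name, not something in the repository. Because `grep -E` treats the pattern as an extended regular expression, the `.` in `.azure/gpu-tests.yml` matches any character and `pl_*` means "`pl` followed by zero or more underscores", so each alternative behaves like a loose substring match.

```bash
# Pattern copied from the new .azure/gpu-tests.yml
FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.azure/gpu-tests.yml'

# Hypothetical helper: exit status 0 if any path on stdin matches FILTER, 1 otherwise.
should_run_gpu_tests() {
  grep -qE "$FILTER"
}

# Editing the GPU test pipeline itself now triggers the job...
printf '%s\n' .azure/gpu-tests.yml | should_run_gpu_tests && echo "gpu-tests.yml: run"
# ...while other files under .azure/ no longer do (the old '.azure/*' alternative matched them).
printf '%s\n' .azure/gpu-benchmark.yml | should_run_gpu_tests || echo "gpu-benchmark.yml: skip"
```

The conda workflow applies the same idea, with its own workflow path appended to its `FILTER`.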

.github/workflows/ci-pytorch-test-conda.yml

Lines changed: 1 addition & 3 deletions
```diff
@@ -22,13 +22,11 @@ jobs:
 strategy:
 fail-fast: false
 matrix:
-# nightly: add when there's a release candidate
 include:
 - {python-version: "3.8", pytorch-version: "1.9"}
 - {python-version: "3.8", pytorch-version: "1.10"}
 - {python-version: "3.9", pytorch-version: "1.11"}
 - {python-version: "3.9", pytorch-version: "1.12"}
-
 timeout-minutes: 30
 
 steps:
@@ -45,7 +43,7 @@ jobs:
 id: skip
 shell: bash -l {0}
 run: |
-FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*'
+FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.github/workflows/ci-pytorch-test-conda.yml'
 echo "${{ steps.changed-files.outputs.all_changed_files }}" | tr " " "\n" > changed_files.txt
 MATCHES=$(cat changed_files.txt | grep -E $FILTER)
 echo $MATCHES
```

.github/workflows/cicd-pytorch-dockers.yml

Lines changed: 42 additions & 38 deletions
```diff
@@ -29,17 +29,22 @@ jobs:
 strategy:
 fail-fast: false
 matrix:
-# the config used in '.azure-pipelines/gpu-tests.yml' since the Dockerfile uses the cuda image
-python_version: ["3.9"]
-pytorch_version: ["1.12"]
+include:
+# We only release one docker image per PyTorch version.
+# The matrix here is the same as the one in release-docker.yml.
+- {python_version: "3.9", pytorch_version: "1.9", cuda_version: "11.1.1"}
+- {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.3.1"}
+- {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
+- {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1"}
 steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v3
 - uses: docker/setup-buildx-action@v2
-- uses: docker/build-push-action@v2
+- uses: docker/build-push-action@v3
 with:
 build-args: |
 PYTHON_VERSION=${{ matrix.python_version }}
 PYTORCH_VERSION=${{ matrix.pytorch_version }}
+CUDA_VERSION=${{ matrix.cuda_version }}
 file: dockers/release/Dockerfile
 push: false # pushed in release-docker.yml only when PL is released
 timeout-minutes: 50
@@ -53,14 +58,14 @@ jobs:
 python_version: ["3.7"]
 xla_version: ["1.12"]
 steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v3
 - uses: docker/setup-buildx-action@v2
-- uses: docker/login-action@v1
+- uses: docker/login-action@v2
 if: env.PUSH_TO_HUB == 'true'
 with:
 username: ${{ secrets.DOCKER_USERNAME }}
 password: ${{ secrets.DOCKER_PASSWORD }}
-- uses: docker/build-push-action@v2
+- uses: docker/build-push-action@v3
 with:
 build-args: |
 PYTHON_VERSION=${{ matrix.python_version }}
@@ -85,30 +90,31 @@ jobs:
 fail-fast: false
 matrix:
 include:
-# the config used in '.azure-pipelines/gpu-tests.yml'
-- {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1", ubuntu_version: "20.04"}
-# latest (used in Tutorials)
-- {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1.1", ubuntu_version: "20.04"}
-- {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.1.1", ubuntu_version: "20.04"}
-- {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1", ubuntu_version: "20.04"}
+# These are the base images for PL release docker images,
+# so include at least all of the combinations in release-dockers.yml.
+- {python_version: "3.9", pytorch_version: "1.9", cuda_version: "11.1.1"}
+- {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.3.1"}
+- {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
+- {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1"}
+# Used in Lightning-AI/tutorials
+- {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1.1"}
 steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v3
 - uses: docker/setup-buildx-action@v2
-- uses: docker/login-action@v1
+- uses: docker/login-action@v2
 if: env.PUSH_TO_HUB == 'true'
 with:
 username: ${{ secrets.DOCKER_USERNAME }}
 password: ${{ secrets.DOCKER_PASSWORD }}
-- uses: docker/build-push-action@v2
+- uses: docker/build-push-action@v3
 with:
 build-args: |
 PYTHON_VERSION=${{ matrix.python_version }}
 PYTORCH_VERSION=${{ matrix.pytorch_version }}
 CUDA_VERSION=${{ matrix.cuda_version }}
-UBUNTU_VERSION=${{ matrix.ubuntu_version }}
 file: dockers/base-cuda/Dockerfile
 push: ${{ env.PUSH_TO_HUB }}
-tags: pytorchlightning/pytorch_lightning:base-cuda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
+tags: pytorchlightning/pytorch_lightning:base-cuda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}-cuda${{ matrix.cuda_version }}
 timeout-minutes: 95
 - uses: ravsamhq/notify-slack-action@v1
 if: failure() && env.PUSH_TO_HUB == 'true'
@@ -126,25 +132,23 @@ jobs:
 fail-fast: false
 matrix:
 include:
-- {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1.1"}
-- {python_version: "3.8", pytorch_version: "1.10", cuda_version: "11.1.1"}
-- {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
-# nightly: add when there's a release candidate
-# - {python_version: "3.9", pytorch_version: "1.12"}
+- {python_version: "3.8", pytorch_version: "1.9"}
+- {python_version: "3.8", pytorch_version: "1.10"}
+- {python_version: "3.9", pytorch_version: "1.11"}
+- {python_version: "3.9", pytorch_version: "1.12"}
 steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v3
 - uses: docker/setup-buildx-action@v2
-- uses: docker/login-action@v1
+- uses: docker/login-action@v2
 if: env.PUSH_TO_HUB == 'true'
 with:
 username: ${{ secrets.DOCKER_USERNAME }}
 password: ${{ secrets.DOCKER_PASSWORD }}
-- uses: docker/build-push-action@v2
+- uses: docker/build-push-action@v3
 with:
 build-args: |
 PYTHON_VERSION=${{ matrix.python_version }}
 PYTORCH_VERSION=${{ matrix.pytorch_version }}
-CUDA_VERSION=${{ matrix.cuda_version }}
 file: dockers/base-conda/Dockerfile
 push: ${{ env.PUSH_TO_HUB }}
 tags: pytorchlightning/pytorch_lightning:base-conda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
@@ -168,14 +172,14 @@ jobs:
 # the config used in 'dockers/ci-runner-ipu/Dockerfile'
 - {python_version: "3.9", pytorch_version: "1.9"}
 steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v3
 - uses: docker/setup-buildx-action@v2
-- uses: docker/login-action@v1
+- uses: docker/login-action@v2
 if: env.PUSH_TO_HUB == 'true'
 with:
 username: ${{ secrets.DOCKER_USERNAME }}
 password: ${{ secrets.DOCKER_PASSWORD }}
-- uses: docker/build-push-action@v2
+- uses: docker/build-push-action@v3
 with:
 build-args: |
 PYTHON_VERSION=${{ matrix.python_version }}
@@ -184,7 +188,7 @@ jobs:
 push: ${{ env.PUSH_TO_HUB }}
 tags: pytorchlightning/pytorch_lightning:base-ipu-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
 timeout-minutes: 100
-- uses: docker/build-push-action@v2
+- uses: docker/build-push-action@v3
 with:
 build-args: |
 PYTHON_VERSION=${{ matrix.python_version }}
@@ -199,7 +203,7 @@ jobs:
 status: ${{ job.status }}
 token: ${{ secrets.GITHUB_TOKEN }}
 notification_title: ${{ format('IPU; {0} py{1} for *{2}*', runner.os, matrix.python_version, matrix.pytorch_version) }}
-message_format: '{emoji} *{workflow}* {status_message}, see <{run_url}|detail>, cc: <@U01BULUS2BG>' # SeanNaren
+message_format: '{emoji} *{workflow}* {status_message}, see <{run_url}|detail>, cc: <@U01GD29QCAV>' # kaushikb11
 env:
 SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
 
@@ -212,14 +216,14 @@ jobs:
 # the config used in 'dockers/ci-runner-hpu/Dockerfile'
 - {gaudi_version: "1.5.0", pytorch_version: "1.11.0"}
 steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v3
 - uses: docker/setup-buildx-action@v2
-- uses: docker/login-action@v1
+- uses: docker/login-action@v2
 if: env.PUSH_TO_HUB == 'true'
 with:
 username: ${{ secrets.DOCKER_USERNAME }}
 password: ${{ secrets.DOCKER_PASSWORD }}
-- uses: docker/build-push-action@v2
+- uses: docker/build-push-action@v3
 with:
 build-args: |
 DIST=latest
@@ -243,10 +247,10 @@ jobs:
 runs-on: ubuntu-20.04
 steps:
 - name: Checkout
-uses: actions/checkout@v2
+uses: actions/checkout@v3
 - name: Build Conda Docker
 # publish master/release
-uses: docker/build-push-action@v2
+uses: docker/build-push-action@v3
 with:
 file: dockers/nvidia/Dockerfile
 push: false
```
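The new comment in the base-cuda matrix states an invariant: since these are the base images for the PL release images, the base-cuda matrix must include every combination the release workflow builds. A minimal sketch of checking that invariant, assuming the two matrices are copied by hand into shell strings (it does not parse the YAML); `base_cuda_tag` and `missing_base_images` are hypothetical helpers, and the tag format mirrors the new `-cuda${{ matrix.cuda_version }}` suffix.

```bash
# Copied by hand from this commit's workflow diffs; one "python torch cuda" triple per line.
BASE_CUDA_MATRIX='3.9 1.9 11.1.1
3.9 1.10 11.3.1
3.9 1.11 11.3.1
3.9 1.12 11.3.1
3.8 1.9 11.1.1'

RELEASE_MATRIX='3.9 1.9 11.1.1
3.9 1.10 11.3.1
3.9 1.11 11.3.1
3.9 1.12 11.3.1'

# Hypothetical helper: compose a base-cuda tag using the new "-cuda" suffix.
base_cuda_tag() {
  echo "pytorchlightning/pytorch_lightning:base-cuda-py$1-torch$2-cuda$3"
}

# Hypothetical helper: print the base-cuda tag for every release combination
# that has no matching base-cuda entry (prints nothing when coverage is complete).
missing_base_images() {
  printf '%s\n' "$RELEASE_MATRIX" | while read -r py torch cuda; do
    # -x: whole-line match, -F: fixed string, so dots are not regex wildcards
    printf '%s\n' "$BASE_CUDA_MATRIX" | grep -qxF "$py $torch $cuda" \
      || base_cuda_tag "$py" "$torch" "$cuda"
  done
}

if [ -z "$(missing_base_images)" ]; then
  echo "every release combination has a base-cuda image"
fi
```

Comparing whole "python torch cuda" triples, rather than only the PyTorch version, matters here because the tag now encodes the CUDA version too.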

.github/workflows/release-docker.yml

Lines changed: 22 additions & 9 deletions
```diff
@@ -1,6 +1,5 @@
 name: Docker
-# https://www.docker.com/blog/first-docker-github-action-is-here
-# https://github.com/docker/build-push-action
+
 on:
 push:
 branches: [master, "release/*"]
@@ -15,8 +14,12 @@ jobs:
 strategy:
 fail-fast: false
 matrix:
-python_version: ["3.7", "3.8", "3.9"]
-pytorch_version: ["1.9", "1.10"]
+include:
+# We only release one docker image per PyTorch version.
+- {python_version: "3.9", pytorch_version: "1.9", cuda_version: "11.1.1"}
+- {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.3.1"}
+- {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
+- {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1"}
 steps:
 - name: Checkout
 uses: actions/checkout@v2
@@ -32,19 +35,29 @@ jobs:
 username: ${{ secrets.DOCKER_USERNAME }}
 password: ${{ secrets.DOCKER_PASSWORD }}
 dockerfile: dockers/release/Dockerfile
-build_args: PYTHON_VERSION=${{ matrix.python_version }},PYTORCH_VERSION=${{ matrix.pytorch_version }},LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
-tags: "${{ steps.get_version.outputs.RELEASE_VERSION }}-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }},latest-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}"
+build_args: |
+PYTHON_VERSION=${{ matrix.python_version }}
+PYTORCH_VERSION=${{ matrix.pytorch_version }}
+CUDA_VERSION=${{ matrix.cuda_version }}
+LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
+tags: |
+${{ steps.get_version.outputs.RELEASE_VERSION }}-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}-cuda${{ matrix.cuda_version }}
+latest-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}-cuda${{ matrix.cuda_version }}
 timeout-minutes: 55
 
 - name: Publish Latest to Docker
 uses: docker/[email protected]
-# only on releases and latest Python and PyTorch
-if: matrix.python_version == '3.9' && matrix.pytorch_version == '1.10'
+# Only latest Python and PyTorch
+if: matrix.python_version == '3.9' && matrix.pytorch_version == '1.12'
 with:
 repository: pytorchlightning/pytorch_lightning
 username: ${{ secrets.DOCKER_USERNAME }}
 password: ${{ secrets.DOCKER_PASSWORD }}
 dockerfile: dockers/release/Dockerfile
-build_args: PYTHON_VERSION=${{ matrix.python_version }},PYTORCH_VERSION=${{ matrix.pytorch_version }},LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
+build_args: |
+PYTHON_VERSION=${{ matrix.python_version }}
+PYTORCH_VERSION=${{ matrix.pytorch_version }}
+CUDA_VERSION=${{ matrix.cuda_version }}
+LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
 tags: "latest"
 timeout-minutes: 55
```
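With the matrix switched to `include` entries, each release entry gets the two tags from the new multi-line `tags: |` input, and only the py3.9/torch1.12 entry additionally gets `latest` from the "Publish Latest to Docker" step. The sketch below lists those tags per entry, assuming a placeholder `RELEASE_VERSION` of `1.7.0` in place of `steps.get_version.outputs.RELEASE_VERSION`; `release_tags` is a hypothetical helper, and the tags are shown without the `pytorchlightning/pytorch_lightning` repository prefix.

```bash
# Hypothetical placeholder for steps.get_version.outputs.RELEASE_VERSION.
RELEASE_VERSION="1.7.0"

# Hypothetical helper: list the tags one release-docker.yml matrix entry receives.
release_tags() {
  py="$1"
  torch="$2"
  cuda="$3"
  # The two entries of the new "tags: |" input
  echo "${RELEASE_VERSION}-py${py}-torch${torch}-cuda${cuda}"
  echo "latest-py${py}-torch${torch}-cuda${cuda}"
  # "Publish Latest to Docker" only runs for python 3.9 + torch 1.12
  if [ "$py" = "3.9" ] && [ "$torch" = "1.12" ]; then
    echo "latest"
  fi
}

# Matrix entries from this commit, one "python torch cuda" triple per line.
printf '%s\n' "3.9 1.9 11.1.1" "3.9 1.10 11.3.1" "3.9 1.11 11.3.1" "3.9 1.12 11.3.1" |
  while read -r py torch cuda; do
    release_tags "$py" "$torch" "$cuda"
  done
```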

.gitignore

Lines changed: 6 additions & 0 deletions
```diff
@@ -165,3 +165,9 @@ hars*
 artifacts/*
 *docs/examples*
 *docs/source-app/api*
+
+# tutorials
+our_model.tar
+test.png
+saved_models
+data/
```

dockers/README.md

Lines changed: 11 additions & 34 deletions
````diff
@@ -1,36 +1,17 @@
 # Docker images
 
-## Builds images form attached Dockerfiles
+## Build images from Dockerfiles
 
 You can build it on your own, note it takes lots of time, be prepared.
 
 ```bash
-git clone <git-repository>
-docker image build -t pytorch-lightning:latest -f dockers/conda/Dockerfile .
-```
-
-or with specific arguments
-
-```bash
-git clone <git-repository>
-docker image build \
--t pytorch-lightning:base-cuda-py3.9-pt1.10 \
--f dockers/base-cuda/Dockerfile \
---build-arg PYTHON_VERSION=3.9 \
---build-arg PYTORCH_VERSION=1.10 \
-.
-```
+git clone https://github.com/Lightning-AI/lightning.git
 
-or nightly version from Conda
+# build with the default arguments
+docker image build -t pytorch-lightning:latest -f dockers/base-cuda/Dockerfile .
 
-```bash
-git clone <git-repository>
-docker image build \
--t pytorch-lightning:base-conda-py3.9-pt1.11 \
--f dockers/base-conda/Dockerfile \
---build-arg PYTHON_VERSION=3.9 \
---build-arg PYTORCH_VERSION=1.11 \
-.
+# build with specific arguments
+docker image build -t pytorch-lightning:base-cuda-py3.9-torch1.11-cuda11.3.1 -f dockers/base-cuda/Dockerfile --build-arg PYTHON_VERSION=3.9 --build-arg PYTORCH_VERSION=1.11 --build-arg CUDA_VERSION=11.3.1 .
 ```
 
 To run your docker use
@@ -49,7 +30,7 @@ docker image rm pytorch-lightning:latest
 
 ## Run docker image with GPUs
 
-To run docker image with access to you GPUs you need to install
+To run docker image with access to your GPUs, you need to install
 
 ```bash
 # Add the package repositories
@@ -61,10 +42,10 @@ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
 sudo systemctl restart docker
 ```
 
-and later run the docker image with `--gpus all` so for example
+and later run the docker image with `--gpus all`. For example,
 
 ```
-docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.10
+docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.11-cuda11.3.1
 ```
 
 ## Run Jupyter server
@@ -73,15 +54,11 @@ Inspiration comes from https://u.group/thinking/how-to-put-jupyter-notebooks-in-
 
 1. Build the docker image:
 ```bash
-docker image build \
--t pytorch-lightning:v1.3.1 \
--f dockers/nvidia/Dockerfile \
---build-arg LIGHTNING_VERSION=1.3.1 \
-.
+docker image build -t pytorch-lightning:v1.6.5 -f dockers/nvidia/Dockerfile --build-arg LIGHTNING_VERSION=1.6.5 .
 ```
 1. start the server and map ports:
 ```bash
-docker run --rm -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -p 8888:8888 pytorch-lightning:v1.3.1
+docker run --rm -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -p 8888:8888 pytorch-lightning:v1.6.5
 ```
 1. Connect in local browser:
 - copy the generated path e.g. `http://hostname:8888/?token=0719fa7e1729778b0cec363541a608d5003e26d4910983c6`
````
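The README's new "build with specific arguments" command covers a single combination. The sketch below parameterizes it over the combinations in release-docker.yml, assuming it is run from the repository root; `build_base_cuda` and the `DRY_RUN` switch are hypothetical conveniences, not part of the repository, and by default the function only prints the `docker image build` command it would run.

```bash
# Hypothetical helper: print (DRY_RUN=1, the default) or run (DRY_RUN=0) the
# base-cuda build for one python/torch/cuda combination. Run from the repo root.
build_base_cuda() {
  py="$1"
  torch="$2"
  cuda="$3"
  # Reuse the function's positional parameters to hold the full command safely.
  set -- docker image build \
    -t "pytorch-lightning:base-cuda-py${py}-torch${torch}-cuda${cuda}" \
    -f dockers/base-cuda/Dockerfile \
    --build-arg "PYTHON_VERSION=${py}" \
    --build-arg "PYTORCH_VERSION=${torch}" \
    --build-arg "CUDA_VERSION=${cuda}" \
    .
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Combinations from the release-docker.yml matrix in this commit.
for combo in "3.9 1.9 11.1.1" "3.9 1.10 11.3.1" "3.9 1.11 11.3.1" "3.9 1.12 11.3.1"; do
  # $combo is left unquoted on purpose so it splits into three arguments.
  build_base_cuda $combo
done
```

Set `DRY_RUN=0` to actually invoke `docker image build`; as the README warns, each build takes a long time.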
