@trz42
Owner

@trz42 trz42 commented Mar 20, 2024

PR to test new deploy code relying on result files ... see EESSI/eessi-bot-software-layer#263

Software to be installed:

1 out of 3 required modules missing:

* CaDiCaL/1.3.0-GCC-10.3.0 (CaDiCaL-1.3.0-GCC-10.3.0.eb)

Test scenarios:

  1. Build with some successes and some failures, set bot:deploy label and check what gets uploaded and where (which S3 buckets). Document upload_policy, _prefix settings for upload directories, bucket specs, ...

@eessi-bot-devel-trz42

Instance dev-PR254 is configured to build:

  • arch x86_64/amd/zen2 for repo nessi-2022.11-swl-deb10
  • arch x86_64/amd/zen2 for repo nessi-2023.06-cl
  • arch x86_64/amd/zen2 for repo nessi-2023.06-swl-deb10
  • arch x86_64/amd/zen2 for repo nessi-2023.06-swl-deb11
  • arch aarch64/generic for repo nessi-2022.11-swl-deb10
  • arch aarch64/generic for repo nessi-2023.06-cl
  • arch aarch64/generic for repo nessi-2023.06-swl-deb10
  • arch aarch64/generic for repo nessi-2023.06-swl-deb11
  • arch aarch64/thunderx2 for repo nessi-2022.11-swl-deb10
  • arch aarch64/thunderx2 for repo nessi-2023.06-cl
  • arch aarch64/thunderx2 for repo nessi-2023.06-swl-deb10
  • arch aarch64/thunderx2 for repo nessi-2023.06-swl-deb11

@trz42
Owner Author

trz42 commented Mar 20, 2024

bot: build repo:swl-deb10 arch:zen2
bot: build repo:swl-deb10 arch:generic
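
A rough sketch of how these two commands could select build targets from the configured list above. It assumes that the `repo:` and `arch:` values are matched as plain substrings; `parse_build_command` and `select_targets` are hypothetical helpers for illustration, not the bot's actual functions, and `TARGETS` is only a subset of the configured pairs.

```python
def parse_build_command(line):
    """Split 'bot: build repo:X arch:Y' into a dict of filters."""
    prefix = "bot: build"
    if not line.startswith(prefix):
        raise ValueError(f"not a build command: {line!r}")
    filters = {}
    for token in line[len(prefix):].split():
        key, _, value = token.partition(":")
        filters[key] = value
    return filters


def select_targets(targets, filters):
    """Keep (arch, repo) pairs whose names contain the filter values (assumed substring match)."""
    return [
        (arch, repo)
        for arch, repo in targets
        if filters.get("arch", "") in arch and filters.get("repo", "") in repo
    ]


# Subset of the targets the instance reports as configured
TARGETS = [
    ("x86_64/amd/zen2", "nessi-2022.11-swl-deb10"),
    ("x86_64/amd/zen2", "nessi-2023.06-cl"),
    ("x86_64/amd/zen2", "nessi-2023.06-swl-deb10"),
    ("x86_64/amd/zen2", "nessi-2023.06-swl-deb11"),
    ("aarch64/generic", "nessi-2022.11-swl-deb10"),
    ("aarch64/generic", "nessi-2023.06-swl-deb10"),
    ("aarch64/thunderx2", "nessi-2022.11-swl-deb10"),
]

for command in ("bot: build repo:swl-deb10 arch:zen2",
                "bot: build repo:swl-deb10 arch:generic"):
    print(command, "->", select_targets(TARGETS, parse_build_command(command)))
```

Under that assumption each command selects two targets, which is consistent with the four jobs reported in this PR.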

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 20, 2024

Updates by the bot instance dev-PR254 (click for details)

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 20, 2024

New job on instance dev-PR254 for architecture x86_64-amd-zen2 for repository nessi-2022.11-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334

date job status comment
Mar 20 10:40:56 AM UTC 2024 submitted job id 154334 awaits release by job manager
Mar 20 10:43:16 AM UTC 2024 released job awaits launch by Slurm scheduler
Mar 20 10:44:29 AM UTC 2024 running job 154334 is running
Mar 20 10:53:10 AM UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-154334.out
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2022.11-software-linux-x86_64-amd-zen2-1710931953.tar.gz
size: 34 MiB (35942302 bytes)
entries: 61050
modules under 2022.11/software/linux/x86_64/amd/zen2/modules/all
EasyBuild/4.7.2.lua
EasyBuild/4.9.0.lua
software under 2022.11/software/linux/x86_64/amd/zen2/software
EasyBuild/4.7.2
EasyBuild/4.9.0
other under 2022.11/software/linux/x86_64/amd/zen2
2022.11/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2022.11/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
2022.11/scripts/utils.sh
Mar 20 10:53:10 AM UTC 2024 test result (no tests yet)
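
The checklist in the details above suggests that the job result is derived by scanning the Slurm output for a handful of patterns. A minimal sketch of that idea, assuming every check must pass for SUCCESS; `CHECKS` and `evaluate_output` are hypothetical names and the real check script may work differently.

```python
# (pattern, must_be_present) pairs taken from the checklist in the bot comment
CHECKS = [
    ("ERROR:", False),
    ("FAILED:", False),
    ("required modules missing:", False),
    ("No missing installations", True),
    (".tar.gz created!", True),
]


def evaluate_output(text):
    """Return (status, per-check results) for the text of a job output file."""
    results = [
        (pattern, (pattern in text) == must_be_present)
        for pattern, must_be_present in CHECKS
    ]
    status = "SUCCESS" if all(ok for _, ok in results) else "FAILURE"
    return status, results


status, results = evaluate_output("ERROR: build failed\nexample.tar.gz created!\n")
print(status)
for pattern, ok in results:
    print("pass" if ok else "fail", pattern)
```

With this input two checks fail (an `ERROR:` message is present and `No missing installations` is absent), which mirrors the two ❌ entries reported for job 154334.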

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 20, 2024

New job on instance dev-PR254 for architecture x86_64-amd-zen2 for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335

date job status comment
Mar 20 10:40:57 AM UTC 2024 submitted job id 154335 awaits release by job manager
Mar 20 10:43:13 AM UTC 2024 released job awaits launch by Slurm scheduler
Mar 20 10:44:26 AM UTC 2024 running job 154335 is running
Mar 20 10:55:14 AM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-154335.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1710932055.tar.gz
size: 1 MiB (1360913 bytes)
entries: 25
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
CaDiCaL/1.3.0-GCC-10.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
CaDiCaL/1.3.0-GCC-10.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
Mar 20 10:55:14 AM UTC 2024 test result (no tests yet)
Mar 20 02:35:14 PM UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1710932055.tar.gz to S3 bucket succeeded

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 20, 2024

New job on instance dev-PR254 for architecture aarch64-generic for repository nessi-2022.11-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336

date job status comment
Mar 20 10:41:04 AM UTC 2024 submitted job id 154336 awaits release by job manager
Mar 20 10:43:07 AM UTC 2024 released job awaits launch by Slurm scheduler
Mar 20 10:44:23 AM UTC 2024 running job 154336 is running
Mar 20 10:46:41 AM UTC 2024 finished
🤷 UNKNOWN (click triangle for details)
  • Job results file _bot_job154336.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Mar 20 10:46:41 AM UTC 2024 test result
🤷 UNKNOWN (click triangle for details)
  • Job test file _bot_job154336.test does not exist in job directory, or parsing it failed.

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 20, 2024

New job on instance dev-PR254 for architecture aarch64-generic for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337

date job status comment
Mar 20 10:41:05 AM UTC 2024 submitted job id 154337 awaits release by job manager
Mar 20 10:43:10 AM UTC 2024 released job awaits launch by Slurm scheduler
Mar 20 10:43:19 AM UTC 2024 running job 154337 is running
Mar 20 10:52:05 AM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-154337.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1710931902.tar.gz
size: 1 MiB (1273440 bytes)
entries: 25
modules under 2023.06/software/linux/aarch64/generic/modules/all
CaDiCaL/1.3.0-GCC-10.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
CaDiCaL/1.3.0-GCC-10.3.0
other under 2023.06/software/linux/aarch64/generic
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
Mar 20 10:52:05 AM UTC 2024 test result (no tests yet)
Mar 20 02:35:31 PM UTC 2024 uploaded transfer of eessi-2023.06-software-linux-aarch64-generic-1710931902.tar.gz to S3 bucket succeeded

@trz42
Owner Author

trz42 commented Mar 20, 2024

All 4 build jobs launched via (#73 (comment)) have finished. Job 154336 was cancelled on purpose to produce a job with UNKNOWN status; in particular, its job directory does not contain the results file _bot_job154336.result (verified on eX3 whether it doesn't exist or couldn't be read).

The bot's configuration when processing the bot:deploy label:

bucket_name = {
    "nessi-2022.11-swl-deb10": "dev-pr254-3",
    "nessi-2023.06-swl-deb10": "dev-pr254-2",
    "nessi-2023.06-swl-deb11": "dev-pr254"}

upload_policy = latest

metadata_prefix = {
    "nessi-2022.11-swl-deb10": "new11-10/'${github_repository}'/'${pull_request_number}'",
    "nessi-2023.06-swl-deb10": "new06-10/'${github_repository}'/'${pull_request_number}'",
    "nessi-2023.06-swl-deb11": "new06-11/'${github_repository}'/'${pull_request_number}'"}

tarball_prefix = {
    "nessi-2022.11-swl-deb10": "tb22.11/'${github_repository}'/'${pull_request_number}'",
    "nessi-2023.06-swl-deb10": "tb23.06-deb10/'${github_repository}'/'${pull_request_number}'",
    "nessi-2023.06-swl-deb11": "tb23.06-deb11/'${github_repository}'/'${pull_request_number}'"}

@trz42 trz42 added the bot:deploy Instruct bot to deploy built artefacts to Stratum 0 label Mar 20, 2024
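
To make the prefix settings in the configuration above easier to reason about, here is a sketch of how a per-repository value might be looked up and expanded. It assumes that a setting is either a plain string or a JSON mapping keyed by repository id, and that the placeholders are expanded shell-style so the single quotes disappear. `resolve_prefix` is a hypothetical helper and `example-org/example-repo` is a made-up repository name.

```python
import json
from string import Template

TARBALL_PREFIX = """{
    "nessi-2022.11-swl-deb10": "tb22.11/'${github_repository}'/'${pull_request_number}'",
    "nessi-2023.06-swl-deb10": "tb23.06-deb10/'${github_repository}'/'${pull_request_number}'",
    "nessi-2023.06-swl-deb11": "tb23.06-deb11/'${github_repository}'/'${pull_request_number}'"}"""


def resolve_prefix(setting, repo_id, variables):
    """Return the expanded prefix for repo_id, or None if nothing is configured."""
    if setting is None:
        return None
    setting = setting.strip()
    if setting.startswith("{"):
        # Per-repository mapping written as JSON
        template = json.loads(setting).get(repo_id)
        if template is None:
            return None
    else:
        # A plain string applies to every repository
        template = setting
    expanded = Template(template).safe_substitute(variables)
    # Assumption: the quotes vanish as they would under shell-style expansion
    return expanded.replace("'", "")


variables = {"github_repository": "example-org/example-repo",
             "pull_request_number": "73"}
print(resolve_prefix(TARBALL_PREFIX, "nessi-2023.06-swl-deb10", variables))
print(resolve_prefix(TARBALL_PREFIX, "nessi-2023.06-cl", variables))
```

A repository id without an entry (here `nessi-2023.06-cl`) yields `None`, so callers need to handle that case explicitly.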
@trz42
Owner Author

trz42 commented Mar 20, 2024

None of the successful jobs was correctly identified as SUCCESS. Part of the log (pyghee.log):

[20240320-T12:01:37] deploy_built_artefacts(): job_dirs = /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334,/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335,/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336,/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334/_bot_job154334.result
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334/_bot_job154334.metadata
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334/_bot_job154334.result
[20240320-T12:01:37] check_job_status(): found status 'FAILURE' from '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334/_bot_job154334.result'

[20240320-T12:01:37] determine_successful_jobs(): FAILED job in '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334'
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335/_bot_job154335.result
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335/_bot_job154335.metadata
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335/_bot_job154335.result
[20240320-T12:01:37] check_job_status(): found status 'FAILURE' from '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335/_bot_job154335.result'

[20240320-T12:01:37] determine_successful_jobs(): FAILED job in '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335'
[20240320-T12:01:37] No metadata file found at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336/_bot_job154336.result.
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336/_bot_job154336.metadata
[20240320-T12:01:37] No metadata file found at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336/_bot_job154336.result.
[20240320-T12:01:37] check_job_status(): no result file '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336/_bot_job154336.result' or reading it failed

[20240320-T12:01:37] determine_successful_jobs(): FAILED job in '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336'
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337/_bot_job154337.result
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337/_bot_job154337.metadata
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337/_bot_job154337.result
[20240320-T12:01:37] check_job_status(): found status 'FAILURE' from '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337/_bot_job154337.result'

[20240320-T12:01:37] determine_successful_jobs(): FAILED job in '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337'
[20240320-T12:01:37] determine_artefacts_to_deploy(): num successful jobs 0
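
The log distinguishes three outcomes per job: no readable result file, a result file with status SUCCESS, and a result file with any other status. A sketch of that decision, assuming the result file is an INI file with a `[RESULT]` section and a `status` key (an assumption about the file layout); `job_succeeded` is a hypothetical stand-in for the bot's own check.

```python
import configparser
import os

JOB_RESULT_SUCCESS = "SUCCESS"


def job_succeeded(job_dir, job_id):
    """Return True only if the result file exists, parses and reports SUCCESS."""
    path = os.path.join(job_dir, f"_bot_job{job_id}.result")
    parser = configparser.ConfigParser()
    try:
        # read() silently skips missing files and returns the list of files it parsed
        files_read = parser.read(path)
    except configparser.Error:
        return False  # file exists but could not be parsed
    if not files_read or not parser.has_option("RESULT", "status"):
        return False  # case (1): no result file, or no status in it
    # case (2)/(3): compare by value, not by identity
    return parser["RESULT"]["status"] == JOB_RESULT_SUCCESS
```

Only jobs for which this returns True would be counted as successful, so a cancelled job such as 154336 is skipped rather than treated as an error.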

@trz42
Owner Author

trz42 commented Mar 20, 2024

Added a bit more logging output.

@trz42 trz42 added bot:deploy Instruct bot to deploy built artefacts to Stratum 0 and removed bot:deploy Instruct bot to deploy built artefacts to Stratum 0 labels Mar 20, 2024
@trz42
Owner Author

trz42 commented Mar 20, 2024

The string values seem OK. Also changed the comparison operator (from is to ==).

diff --git a/tasks/deploy.py b/tasks/deploy.py
index 7855416..7092545 100644
--- a/tasks/deploy.py
+++ b/tasks/deploy.py
@@ -189,7 +189,9 @@ def check_job_status(job_dir):
         log(f"{fn}(): no result file '{job_result_file_path}' or reading it failed\n")
         return False

-    if job_status is job_metadata.JOB_RESULT_SUCCESS:
+    log(f"{fn}(): job status is {job_status} (compare against {job_metadata.JOB_RESULT_SUCCESS})\n")
+
+    if job_status == job_metadata.JOB_RESULT_SUCCESS:
         # case (2): result file && status = SUCCESS --> return True
         log(f"{fn}(): found status 'SUCCESS' from '{job_result_file_path}'\n")
         return True
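
Why the operator matters: `is` tests object identity while `==` tests value equality, and a status string read from a file is usually a different object than the constant it is compared against. A small standalone illustration (not the bot's code; `compare` is a hypothetical helper):

```python
JOB_RESULT_SUCCESS = "SUCCESS"


def compare(status):
    """Return (equal by value, same object) for a status string."""
    return status == JOB_RESULT_SUCCESS, status is JOB_RESULT_SUCCESS


# Simulate a value parsed from a file: split() and strip() build a new string object
status_from_file = "status = SUCCESS\n".split("=")[1].strip()

print(compare(status_from_file))    # value comparison is True; identity is typically False on CPython
print(compare(JOB_RESULT_SUCCESS))  # the constant is trivially identical to itself
```

Whether two equal strings happen to be the same object is an implementation detail, so only the `==` comparison is reliable here.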

@trz42 trz42 added bot:deploy Instruct bot to deploy built artefacts to Stratum 0 and removed bot:deploy Instruct bot to deploy built artefacts to Stratum 0 labels Mar 20, 2024
@trz42
Owner Author

trz42 commented Mar 26, 2024

Rerun builds after updates to bot PR (also removed unused settings in app.cfg)

bot: build repo:swl-deb10 arch:zen2
bot: build repo:swl-deb10 arch:generic

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 26, 2024

Updates by the bot instance dev-PR254 (click for details)

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 26, 2024

New job on instance dev-PR254 for architecture x86_64-amd-zen2 for repository nessi-2022.11-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/159202

date job status comment
Mar 26 04:53:19 PM UTC 2024 submitted job id 159202 awaits release by job manager
Mar 26 04:53:31 PM UTC 2024 released job awaits launch by Slurm scheduler
Mar 26 04:54:45 PM UTC 2024 running job 159202 is running
Mar 26 05:08:20 PM UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-159202.out
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2022.11-software-linux-x86_64-amd-zen2-1711472853.tar.gz
size: 34 MiB (35982437 bytes)
entries: 61050
modules under 2022.11/software/linux/x86_64/amd/zen2/modules/all
EasyBuild/4.7.2.lua
EasyBuild/4.9.0.lua
software under 2022.11/software/linux/x86_64/amd/zen2/software
EasyBuild/4.7.2
EasyBuild/4.9.0
other under 2022.11/software/linux/x86_64/amd/zen2
2022.11/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2022.11/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
2022.11/scripts/utils.sh
Mar 26 05:08:20 PM UTC 2024 test result (no tests yet)

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 26, 2024

New job on instance dev-PR254 for architecture x86_64-amd-zen2 for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/159203

date job status comment
Mar 26 04:53:21 PM UTC 2024 submitted job id 159203 awaits release by job manager
Mar 26 04:53:28 PM UTC 2024 released job awaits launch by Slurm scheduler
Mar 26 04:54:42 PM UTC 2024 running job 159203 is running
Mar 26 04:59:05 PM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-159203.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1711472289.tar.gz
size: 1 MiB (1360881 bytes)
entries: 25
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
CaDiCaL/1.3.0-GCC-10.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
CaDiCaL/1.3.0-GCC-10.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
Mar 26 04:59:05 PM UTC 2024 test result (no tests yet)
Mar 26 05:18:58 PM UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1711472289.tar.gz to S3 bucket succeeded

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 26, 2024

New job on instance dev-PR254 for architecture aarch64-generic for repository nessi-2022.11-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/159204

date job status comment
Mar 26 04:53:27 PM UTC 2024 submitted job id 159204 awaits release by job manager
Mar 26 04:54:34 PM UTC 2024 released job awaits launch by Slurm scheduler
Mar 26 04:55:51 PM UTC 2024 finished
🤷 UNKNOWN (click triangle for details)
  • Job results file _bot_job159204.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Mar 26 04:55:51 PM UTC 2024 test result
🤷 UNKNOWN (click triangle for details)
  • Job test file _bot_job159204.test does not exist in job directory, or parsing it failed.

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 26, 2024

New job on instance dev-PR254 for architecture aarch64-generic for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/159205

date job status comment
Mar 26 04:53:29 PM UTC 2024 submitted job id 159205 awaits release by job manager
Mar 26 04:54:37 PM UTC 2024 released job awaits launch by Slurm scheduler
Mar 26 04:54:39 PM UTC 2024 running job 159205 is running
Mar 26 04:58:01 PM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-159205.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1711472211.tar.gz
size: 1 MiB (1273914 bytes)
entries: 25
modules under 2023.06/software/linux/aarch64/generic/modules/all
CaDiCaL/1.3.0-GCC-10.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
CaDiCaL/1.3.0-GCC-10.3.0
other under 2023.06/software/linux/aarch64/generic
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
Mar 26 04:58:01 PM UTC 2024 test result (no tests yet)
Mar 26 05:21:47 PM UTC 2024 uploaded transfer of eessi-2023.06-software-linux-aarch64-generic-1711472211.tar.gz to S3 bucket succeeded
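
The artefact names end in a Unix timestamp, which makes it possible to tell the two build rounds apart. The sketch below parses the names reported in this PR and keeps the newest artefact per target. That this is what `upload_policy = latest` does is an assumption; `parse_artefact` and `latest_per_target` are hypothetical helpers.

```python
import re
from datetime import datetime, timezone

ARTEFACT_RE = re.compile(
    r"^eessi-(?P<version>[^-]+)-software-(?P<os>[^-]+)-(?P<arch>.+)-(?P<ts>\d+)\.tar\.gz$"
)


def parse_artefact(name):
    """Split an artefact file name into version, OS, architecture and timestamp."""
    match = ARTEFACT_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected artefact name: {name!r}")
    return {
        "version": match["version"],
        "os": match["os"],
        "arch": match["arch"],
        "timestamp": int(match["ts"]),
    }


def latest_per_target(names):
    """Keep the artefact with the highest timestamp per (version, os, arch)."""
    latest = {}
    for name in names:
        info = parse_artefact(name)
        key = (info["version"], info["os"], info["arch"])
        if key not in latest or info["timestamp"] > latest[key][0]:
            latest[key] = (info["timestamp"], name)
    return {key: name for key, (_, name) in latest.items()}


names = [
    "eessi-2023.06-software-linux-x86_64-amd-zen2-1710932055.tar.gz",
    "eessi-2023.06-software-linux-x86_64-amd-zen2-1711472289.tar.gz",
    "eessi-2023.06-software-linux-aarch64-generic-1710931902.tar.gz",
    "eessi-2023.06-software-linux-aarch64-generic-1711472211.tar.gz",
]
for key, name in latest_per_target(names).items():
    stamp = datetime.fromtimestamp(parse_artefact(name)["timestamp"], tz=timezone.utc)
    print(key, name, stamp.isoformat())
```

For each architecture the artefact from the Mar 26 rerun has the higher timestamp and is the one kept.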

@trz42
Owner Author

trz42 commented Mar 26, 2024

Same build result as before. Re-setting the bot:deploy label to verify whether the deployment code still works as intended.

@trz42 trz42 added bot:deploy Instruct bot to deploy built artefacts to Stratum 0 and removed bot:deploy Instruct bot to deploy built artefacts to Stratum 0 labels Mar 26, 2024
@trz42
Owner Author

trz42 commented Mar 26, 2024

Some config settings weren't updated, which led to the error:

[20240326-T18:10:40] WARNING: A crash occurred!
Traceback (most recent call last):
  File "/home/thomarob/bot-devel/test_sync_feb24/venv_bot_p310/lib/python3.10/site-packages/pyghee/lib.py", line 170, in process_event
    self.handle_event(event_info, log_file=log_file)
  File "/home/thomarob/bot-devel/test_sync_feb24/venv_bot_p310/lib/python3.10/site-packages/pyghee/lib.py", line 102, in handle_event
    handler(event_info, log_file=log_file)
  File "/home/thomarob/bot-devel/test_sync_feb24/eessi-bot-software-layer/eessi_bot_event_handler.py", line 382, in handle_pull_request_event
    handler(event_info, pr)
  File "/home/thomarob/bot-devel/test_sync_feb24/eessi-bot-software-layer/eessi_bot_event_handler.py", line 314, in handle_pull_request_labeled_event
    deploy_built_artefacts(pr, event_info)
  File "/home/thomarob/bot-devel/test_sync_feb24/eessi-bot-software-layer/tasks/deploy.py", line 589, in deploy_built_artefacts
    upload_artefact(job_dir, payload, timestamp, repo_name, pr.number, pr_comment_id)
  File "/home/thomarob/bot-devel/test_sync_feb24/eessi-bot-software-layer/tasks/deploy.py", line 291, in upload_artefact
    if artefact_prefix.lstrip().startswith('{'):
AttributeError: 'NoneType' object has no attribute 'lstrip'
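
One plausible reading of the traceback is that the setting behind `artefact_prefix` was absent from the config, so the lookup returned `None` before `lstrip()` was called. A sketch of that failure mode and a simple guard; the section name `deploycfg`, the config content and both helper functions are assumptions for illustration only.

```python
import configparser

# Hypothetical config that still uses an older setting name
CFG_TEXT = """
[deploycfg]
tarball_prefix = tb22.11
"""


def read_setting(cfg_text, option, section="deploycfg"):
    """Return the option's value, or None if the section or option is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section(section):
        return None
    # SectionProxy.get() falls back to None when the option is not set
    return parser[section].get(option)


def is_mapping(prefix):
    """Treat a missing setting as 'no mapping' instead of calling lstrip() on None."""
    return prefix is not None and prefix.lstrip().startswith("{")


artefact_prefix = read_setting(CFG_TEXT, "artefact_prefix")
print(artefact_prefix)              # None: the option is not in the config
print(is_mapping(artefact_prefix))  # False, and no AttributeError is raised
```

Checking for `None` up front (or logging which setting is missing) turns the crash into a diagnosable configuration message.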

@trz42 trz42 added bot:deploy Instruct bot to deploy built artefacts to Stratum 0 and removed bot:deploy Instruct bot to deploy built artefacts to Stratum 0 labels Mar 26, 2024
Labels

bot:deploy Instruct bot to deploy built artefacts to Stratum 0 development
