Closed
Description
The live test we have for Nexus add/removal currently fails:
root@oxz_switch:~# TMPDIR=/var/tmp ./cargo-nextest nextest run --profile=live-tests --archive-file live-tests-archive/omicron-live-tests.tar.zst --workspace-remap live-tests-archive
Extracting 1 binary, 1 build script output directory, and 3 linked paths to /var/tmp/nextest-archive-UDJMeA
Extracted 46 files to /var/tmp/nextest-archive-UDJMeA in 1.07s
info: experimental features enabled: setup-scripts
------------
Nextest run ID 5b49c339-0df6-49d6-ab3c-8d8ed1b7df4d with nextest profile: live-tests
Starting 1 test across 1 binary
SLOW [> 60.000s] omicron-live-tests::test_nexus_add_remove test_nexus_add_remove
SLOW [>120.000s] omicron-live-tests::test_nexus_add_remove test_nexus_add_remove
FAIL [ 122.103s] omicron-live-tests::test_nexus_add_remove test_nexus_add_remove
---- STDOUT: omicron-live-tests::test_nexus_add_remove test_nexus_add_remove
running 1 test
test test_nexus_add_remove has been running for over 60 seconds
test test_nexus_add_remove ... FAILED
failures:
failures:
test_nexus_add_remove
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 122.06s
---- STDERR: omicron-live-tests::test_nexus_add_remove test_nexus_add_remove
log file: /var/tmp/test_nexus_add_remove-3ad37aa113db9b44-test_nexus_add_remove.28927.0.log
note: configured to log to "/var/tmp/test_nexus_add_remove-3ad37aa113db9b44-test_nexus_add_remove.28927.0.log"
note: using DNS server for subnet fd00:1122:3344::/48
thread 'test_nexus_add_remove' panicked at live-tests/tests/test_nexus_add_remove.rs:180:6:
called `Result::unwrap()` on an `Err` value: TimedOut(60.063460023s)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Pool dropped without invoking `terminate`
Cancelling due to test failure
------------
Summary [ 122.111s] 1 test run: 0 passed, 1 failed, 0 skipped
FAIL [ 122.103s] omicron-live-tests::test_nexus_add_remove test_nexus_add_remove
error: test run failed
The test times out after 60s while waiting for a Nexus instance to recover the saga that was running on the Nexus instance that the test just expunged.
The problem is that the current blueprint shows the Nexus zone as expunged but not yet ready for cleanup:
# omdb nexus blueprints show current | grep -i nexus
note: Nexus URL not specified. Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:101::6]:12221
oxp_ad5f9396-95d1-43cd-8109-17dbe94437f5/crypt/zone/oxz_nexus_7801b712-dbcd-476d-9aa8-5f188539a209 2e7954c2-85c6-4f08-80b7-5e16de7cfe9a expunged none none off
oxp_c53bb8e5-2cf4-4c0e-a943-609a824c60aa/crypt/zone/oxz_nexus_c6bc048f-bfef-40b0-9ebd-763d0714b9e0 be232f76-4156-406c-b511-95b69572f669 in service none none off
nexus 7801b712-dbcd-476d-9aa8-5f188539a209 install dataset expunged ⏳ fd00:1122:3344:103::21
nexus c6bc048f-bfef-40b0-9ebd-763d0714b9e0 install dataset in service fd00:1122:3344:103::5
oxp_d446e628-c624-4b0b-a617-627449e71681/crypt/zone/oxz_nexus_ae79633f-feee-48f2-b7ad-f14ce5a54e47 7aaa721d-c7b8-45e9-95d2-8dedd52c0f59 in service none none off
nexus ae79633f-feee-48f2-b7ad-f14ce5a54e47 install dataset in service fd00:1122:3344:101::6
oxp_ba8d35c8-c4b0-49e5-b3bc-87dbf005e05e/crypt/zone/oxz_nexus_83bd1f6d-11db-4642-bbc2-a4a4f69755df d5d4d9c0-6325-4a33-b532-d00a879cbbe9 in service none none off
nexus 83bd1f6d-11db-4642-bbc2-a4a4f69755df install dataset in service fd00:1122:3344:102::5
Note the ⏳ -- that means the zone is not yet ready for cleanup.
I expect this has been broken since #7713. After that PR, this test should first wait for an inventory collection to reflect that the Nexus zone is really gone, then generate a new blueprint (which should show the zone as ready for cleanup), then make that blueprint the target, and only then wait for the saga to be recovered.
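To make the proposed ordering concrete, here is a rough, std-only sketch of it. Nothing in it is Omicron code: `FakeSystem`, `wait_for`, `PollError`, and `run_sequence` are hypothetical names, and the toy model simply assumes (as the analysis above suggests) that saga recovery only proceeds once the target blueprint marks the zone ready for cleanup. The real test would use the existing live-test helpers and Nexus clients instead.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error type, loosely modeled on the `TimedOut(..)` value
/// seen in the panic message. Not the real Omicron type.
#[derive(Debug, PartialEq)]
pub enum PollError {
    TimedOut(Duration),
}

/// Calls `check` until it returns `Some(..)` or `timeout` has elapsed.
pub fn wait_for<T>(
    mut check: impl FnMut() -> Option<T>,
    interval: Duration,
    timeout: Duration,
) -> Result<T, PollError> {
    let start = Instant::now();
    loop {
        if let Some(value) = check() {
            return Ok(value);
        }
        let elapsed = start.elapsed();
        if elapsed >= timeout {
            return Err(PollError::TimedOut(elapsed));
        }
        std::thread::sleep(interval);
    }
}

/// Toy stand-in for the state the test observes (an assumption for
/// illustration only).
#[derive(Default)]
pub struct FakeSystem {
    inventory_polls: u32,
    pub zone_gone_in_inventory: bool,
    pub target_ready_for_cleanup: bool,
    pub saga_recovered: bool,
}

impl FakeSystem {
    /// Pretend an inventory collection notices the zone is gone on the
    /// third poll.
    fn poll_inventory(&mut self) -> Option<()> {
        self.inventory_polls += 1;
        if self.inventory_polls >= 3 {
            self.zone_gone_in_inventory = true;
            Some(())
        } else {
            None
        }
    }

    /// Plan a new blueprint and make it the target. In this model the zone
    /// is marked ready for cleanup only if inventory already shows it gone.
    fn plan_and_set_target(&mut self) {
        self.target_ready_for_cleanup = self.zone_gone_in_inventory;
    }

    /// In this model, the saga is recovered only once the target blueprint
    /// marks the expunged zone ready for cleanup.
    fn poll_saga_recovered(&mut self) -> Option<()> {
        if self.target_ready_for_cleanup {
            self.saga_recovered = true;
            Some(())
        } else {
            None
        }
    }
}

/// Runs the steps in the proposed order. `skip_replan` mimics the current
/// (failing) test, which never produces a blueprint with the zone marked
/// ready for cleanup.
pub fn run_sequence(
    sys: &mut FakeSystem,
    skip_replan: bool,
) -> Result<(), PollError> {
    let interval = Duration::from_millis(1);
    let timeout = Duration::from_millis(50);

    // 1. Wait for inventory to reflect that the zone is really gone.
    wait_for(|| sys.poll_inventory(), interval, timeout)?;
    // 2. and 3. Generate a new blueprint and make it the target.
    if !skip_replan {
        sys.plan_and_set_target();
    }
    // 4. Only now wait for the saga to be recovered.
    wait_for(|| sys.poll_saga_recovered(), interval, timeout)
}
```

In this model, `run_sequence(&mut sys, false)` succeeds, while `run_sequence(&mut sys, true)` ends in `PollError::TimedOut`, which mirrors the failure shown in the log above.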