Closed
Description
I created a new bundle on rack3 (the first ever in this environment), and its state did not progress beyond collecting after many hours.
oxide --profile colo bundle list
[
{
"id": "c6b507df-cb67-47c4-8887-ba9fc0fc0034",
"reason_for_creation": "Created by external API",
"state": "collecting",
"time_created": "2025-07-09T01:01:12.246530Z"
}
]
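To spot this condition from the CLI output without eyeballing timestamps, here is a minimal Python sketch that flags bundles still in `collecting` past a cutoff. The field names come from the JSON above; the function name `stuck_bundles` and the one-hour default are assumptions for illustration, not anything Nexus defines.

```python
import json
from datetime import datetime, timedelta, timezone


def stuck_bundles(bundles_json, now, max_age=timedelta(hours=1)):
    """Return IDs of bundles that have been 'collecting' for longer than max_age.

    bundles_json is the JSON array printed by `oxide bundle list`.
    max_age is an arbitrary threshold chosen for this sketch.
    """
    stuck = []
    for bundle in json.loads(bundles_json):
        # Replace the trailing "Z" so fromisoformat() also works before Python 3.11.
        created = datetime.fromisoformat(bundle["time_created"].replace("Z", "+00:00"))
        if bundle["state"] == "collecting" and now - created > max_age:
            stuck.append(bundle["id"])
    return stuck


if __name__ == "__main__":
    import sys

    # Example: oxide --profile colo bundle list | python3 stuck_bundles.py
    for bundle_id in stuck_bundles(sys.stdin.read(), datetime.now(timezone.utc)):
        print(bundle_id)
```

Run against the output above any time after 02:01Z on 2025-07-09, this would report c6b507df-cb67-47c4-8887-ba9fc0fc0034.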
I looked up the bundle's dataset info in the database:
root@[fd00:1122:3344:116::3]:32221/omicron> select * from support_bundle;
id | time_created | reason_for_creation | reason_for_failure | state | zpool_id | dataset_id | assigned_nexus
---------------------------------------+------------------------------+-------------------------+--------------------+------------+--------------------------------------+--------------------------------------+---------------------------------------
c6b507df-cb67-47c4-8887-ba9fc0fc0034 | 2025-07-09 01:01:12.24653+00 | Created by external API | NULL | collecting | de682b18-afaf-4d53-b62e-934f6bd4a1f8 | 003d27e0-57e4-4d55-963e-af47e4e526f1 | 95ebe94d-0e68-421d-9260-c30bd7fe4bd6
(1 row)
The dataset that was supposed to receive the bundle remained empty:
BRM42220015 # ls -l /pool/ext/de682b18-afaf-4d53-b62e-934f6bd4a1f8/crypt/debug/c6b507df-cb67-47c4-8887-ba9fc0fc0034/
total 0
The log from the assigned Nexus showed that the collector background task picked up the bundle and started working on it:
angela@castle /staff/angela $ grep c6b507df oxide-nexus.log.1752023701 | head -30 | looker
01:01:18.330Z INFO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): SupportBundleCollector: Found bundle to collect
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
bundles_in_queue = 1
file = nexus/src/app/background/tasks/support_bundle_collector.rs:364
01:01:18.330Z INFO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): Collecting bundle as local file
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
file = nexus/src/app/background/tasks/support_bundle_collector.rs:562
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/local/all-sp-ids
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/local/all-sp-ids", status: 200, headers: {"content-type": "application/json", "x-request-id": "46a2fcee-08ee-49a9-8f72-4329dd215192", "content-length": "929", "date": "Wed, 09 Jul 2025 01:01:17 GMT"} })
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/7/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/24/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/switch/0/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/28/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/20/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/switch/1/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/29/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/22/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/0/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/5/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/4/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/6/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/9/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/power/0/task-dump
01:01:18.368Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/31/task-dump
01:01:18.368Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/13/task-dump
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/7/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "bf0a7286-96f8-41aa-9ee7-3dc58ee4f194", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/power/0/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "b8704e84-5aeb-43e8-8772-63ecfec50588", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/3/task-dump
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/2/task-dump
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/28/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "4c05f0a6-8541-40c1-b34d-0366f22fa767", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/5/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "838690ad-3c40-4bd7-90a8-5f416ecc19ad", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/6/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "3d38a033-ce0f-4587-b977-a16ca9e86162", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/29/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "1ab05012-1d88-4098-92b4-6132508078e9", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/0/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "78360e15-080d-4f1a-abeb-a847c66cd10c", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/switch/0/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "c92d82c0-63c2-4500-be40-98fcd18a0016", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
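The excerpt above is only the first 30 entries, so it cannot show which task-dump requests never came back. A rough Python sketch for pairing `client request` URIs with `client response` URLs in looker-formatted output follows. It assumes the `uri = ...` and `result = Ok(Response { url: "..." ...` field layouts shown above; how failed requests are logged is not visible in this excerpt, so anything without an `Ok` response simply shows up as unanswered. The function name is made up for this sketch.

```python
import re

# Field layouts as they appear in the looker output above.
REQUEST_URI = re.compile(r"^\s*uri = (\S+)")
RESPONSE_URL = re.compile(r'^\s*result = Ok\(Response \{ url: "([^"]+)"')


def unanswered_requests(log_lines):
    """Return request URIs that have no matching Ok response, in first-seen order."""
    pending = {}  # uri -> number of requests still waiting for a response
    for line in log_lines:
        request = REQUEST_URI.match(line)
        if request:
            uri = request.group(1)
            pending[uri] = pending.get(uri, 0) + 1
            continue
        response = RESPONSE_URL.match(line)
        if response and pending.get(response.group(1), 0) > 0:
            pending[response.group(1)] -= 1
    return [uri for uri, count in pending.items() if count > 0]


if __name__ == "__main__":
    import sys

    # Example: grep c6b507df oxide-nexus.log.1752023701 | looker | python3 unanswered.py
    for uri in unanswered_requests(sys.stdin):
        print(uri)
```

Feeding it the full grep output rather than the `head -30` slice would be needed to draw any conclusion about which SPs never answered.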
Between the start and the completion message, the collector hit some errors while capturing SP task dumps:
01:01:33.380Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP power 1: failed to get task dump count from SP: Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "424eb413-81c0-4c33-82c0-0a029020114d", "content-length": "198", "date": "Wed, 09 Jul 2025 01:01:20 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Power, slot: 1 }: no SP discovered", request_id: "424eb413-81c0-4c33-82c0-0a029020114d" }
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.380Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 24: failed to get task dump count from SP: Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "be60285f-baf5-4b8c-b419-6ac16e4efc58", "content-length": "198", "date": "Wed, 09 Jul 2025 01:01:20 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Sled, slot: 24 }: no SP discovered", request_id: "be60285f-baf5-4b8c-b419-6ac16e4efc58" }
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.381Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 9: failed to get task dump count from SP: Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "d6f417ea-c5ec-419d-bedf-e006360a9ebf", "content-length": "197", "date": "Wed, 09 Jul 2025 01:01:20 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Sled, slot: 9 }: no SP discovered", request_id: "d6f417ea-c5ec-419d-bedf-e006360a9ebf" }
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.381Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 22: failed to get task dump count from SP: Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "0b44e793-3d95-4774-bbf4-f2628d3cae40", "content-length": "224", "date": "Wed, 09 Jul 2025 01:01:31 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Sled, slot: 22 }: RPC call failed (gave up after 5 attempts)", request_id: "0b44e793-3d95-4774-bbf4-f2628d3cae40" }
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.381Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 4: failed to get task dump count from SP: Communication Error: error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/4/task-dump): error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/4/task-dump): operation timed out
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.381Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 18: failed to get task dump count from SP: Communication Error: error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/18/task-dump): error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/18/task-dump): operation timed out
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.381Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 30: failed to get task dump count from SP: Communication Error: error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/30/task-dump): error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/30/task-dump): operation timed out
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
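The errors above fall into three kinds: "no SP discovered", "RPC call failed", and request timeouts. A small Python sketch for grouping them by kind follows, assuming the `error = SP <type> <slot>: failed to get task dump count from SP: ...` layout shown in these entries. The category labels and function names are my own and are not anything Nexus emits.

```python
import re
from collections import defaultdict

# Matches the `error = ...` field of the "failed to capture task dumps" entries.
ERROR_LINE = re.compile(
    r"error = SP (\w+) (\d+): failed to get task dump count from SP: (.*)"
)


def classify(detail):
    """Map the error detail text to a coarse category (labels are illustrative)."""
    if "no SP discovered" in detail:
        return "no SP discovered"
    if "RPC call failed" in detail:
        return "RPC call failed"
    if "operation timed out" in detail:
        return "request timed out"
    return "other"


def group_task_dump_errors(log_lines):
    """Return {category: ["<type> <slot>", ...]} for every matching error line."""
    groups = defaultdict(list)
    for line in log_lines:
        match = ERROR_LINE.search(line)
        if match:
            sp_type, slot, detail = match.groups()
            groups[classify(detail)].append(f"{sp_type} {slot}")
    return dict(groups)
```

For the seven entries quoted above, this groups power 1, sled 24, and sled 9 under "no SP discovered", sled 22 under "RPC call failed", and sleds 4, 18, and 30 under "request timed out".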
The log then reported collection as completed a little over six minutes after it started:
01:07:36.752Z INFO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): Bundle Collection completed
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
file = nexus/src/app/background/tasks/support_bundle_collector.rs:485
It's unclear whether these errors prevented the bundle from being persisted, or whether the collector hit other errors that left it stuck in the collecting state.
Labels
support-bundles