blueprint_executor unable to backfill omicron zones after getting stuck in waiting for time sync #7824

@askfongjojo

I expunged a disk on sled 15 in the dublin lab, which was expected to lead to the following diffs (fixed; they were originally inverted):

    datasets:
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    dataset name                                                                                                dataset id                             disposition    quota     reservation   compression
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
...  
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crypt/cockroachdb                                                  a042ad80-9378-462d-abe2-4ffbfcf1b781   - in service   none      none          off        
     └─                                                                                                                                                + expunged                                        
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crucible                                                           3465507a-8666-4b95-bab3-e6b1a41b13e7   - in service   none      none          off        
     └─                                                                                                                                                + expunged                                        
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crypt/clickhouse                                                   b77683cb-06d3-4ff8-8ea0-273e028aebdb   - in service   none      none          off        
     └─                                                                                                                                                + expunged                                        
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crypt/internal_dns                                                 ebedcdd9-251b-49b6-8bf3-08caf663e473   - in service   none      none          off        
     └─                                                                                                                                                + expunged                                        
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crypt/zone                                                         e9833406-4481-4b22-9ed1-8a576c1f389e   - in service   none      none          off        
     └─                                                                                                                                                + expunged                                        
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crypt/zone/oxz_clickhouse_fdeabeab-c7d9-4368-a6af-1a339a00310b     2e9965d7-8982-4f03-af6b-c83543da4388   - in service   none      none          off        
     └─                                                                                                                                                + expunged                                        
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crypt/zone/oxz_cockroachdb_84852644-b3a1-4816-8ea2-84892d885c39    dccf1567-cd53-4fcb-a755-e59d9fff122d   - in service   none      none          off        
     └─                                                                                                                                                + expunged                                        
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crypt/zone/oxz_crucible_d0ec1728-9b33-41cf-ad07-02f498d77247       c71dab11-3053-4610-92ab-1aefd12246a2   - in service   none      none          off        
     └─                                                                                                                                                + expunged                                        
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crypt/zone/oxz_internal_dns_fd45a90b-4670-4ecc-886a-b86e66ab1838   71301b1d-34ed-4e35-a74d-ddf1676188db   - in service   none      none          off        
     └─                                                                                                                                                + expunged                                        
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crypt/zone/oxz_ntp_ad78e840-9360-4b8d-9287-4dd19e6a51e1            943373eb-2190-4e04-a65c-e30480a9aa72   - in service   none      none          off        
     └─                                                                                                                                                + expunged                                        
*   oxp_342b9661-b423-4aca-8c63-531252fdc722/crypt/debug                                                        b8bd16f2-66b4-4e59-a483-9fb2fd06bf36   - in service   100 GiB   none          gzip-9     
     └─                                                                                                                                                + expunged                                        
+   oxp_14955990-3463-4734-a653-852250f6b275/crypt/cockroachdb                                                  5b00771f-f9d0-47ac-b014-4e2e6329bd42   in service     none      none          off        
+   oxp_14955990-3463-4734-a653-852250f6b275/crypt/clickhouse                                                   b2c0184b-ff2f-40f5-b387-c6ace38c1c29   in service     none      none          off        
+   oxp_14955990-3463-4734-a653-852250f6b275/crypt/zone/oxz_clickhouse_e0504fdd-5c45-4209-aa03-edafc8d94393     a3b7224b-131f-4172-8903-dafbe2c0057a   in service     none      none          off        
+   oxp_14955990-3463-4734-a653-852250f6b275/crypt/zone/oxz_cockroachdb_714185f6-3407-4366-a060-a4623c92a962    42298e40-b942-409d-8863-4fb318f95932   in service     none      none          off        
+   oxp_14955990-3463-4734-a653-852250f6b275/crypt/zone/oxz_ntp_558843e4-96ad-4bf6-b4c2-200bf0fb8520            157d682b-fe04-48e1-90d7-f1253fe844f8   expunged       none      none          off        
+   oxp_14955990-3463-4734-a653-852250f6b275/crypt/zone/oxz_ntp_e81dd898-d063-41b3-a13a-418a8c031345            21292d97-1166-4c0b-aa72-f4374735e38c   in service     none      none          off      


    omicron zones:
    ---------------------------------------------------------------------------------------------------------------
    zone type      zone id                                image source      disposition      underlay IP           
    ---------------------------------------------------------------------------------------------------------------
...
*   boundary_ntp   ad78e840-9360-4b8d-9287-4dd19e6a51e1   install dataset   - in service     fd00:1122:3344:102::10
     └─                                                                     + expunged ⏳                           
*   clickhouse     fdeabeab-c7d9-4368-a6af-1a339a00310b   install dataset   - in service     fd00:1122:3344:102::5 
     └─                                                                     + expunged ⏳                           
*   cockroach_db   84852644-b3a1-4816-8ea2-84892d885c39   install dataset   - in service     fd00:1122:3344:102::3 
     └─                                                                     + expunged ⏳                           
*   crucible       d0ec1728-9b33-41cf-ad07-02f498d77247   install dataset   - in service     fd00:1122:3344:102::6 
     └─                                                                     + expunged ⏳                           
*   internal_dns   fd45a90b-4670-4ecc-886a-b86e66ab1838   install dataset   - in service     fd00:1122:3344:2::1   
     └─                                                                     + expunged ⏳                           
+   boundary_ntp   e81dd898-d063-41b3-a13a-418a8c031345   install dataset   in service       fd00:1122:3344:102::22
+   clickhouse     e0504fdd-5c45-4209-aa03-edafc8d94393   install dataset   in service       fd00:1122:3344:102::23
+   cockroach_db   714185f6-3407-4366-a060-a4623c92a962   install dataset   in service       fd00:1122:3344:102::24
+   internal_ntp   558843e4-96ad-4bf6-b4c2-200bf0fb8520   install dataset   expunged ⏳       fd00:1122:3344:102::21
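To see at a glance which new zones the diff above expects on the sled, the added rows can be pulled out of the pasted table. This is a minimal sketch and not part of omdb or omicron: the function name `added_zones` is hypothetical, and it assumes columns are separated by two or more spaces, as in the text above. It should be given only the "omicron zones" section, since dataset rows have a different column count and are skipped.

```python
import re

# A few rows copied from the "omicron zones" diff above.
SAMPLE = """\
*   boundary_ntp   ad78e840-9360-4b8d-9287-4dd19e6a51e1   install dataset   - in service     fd00:1122:3344:102::10
     └─                                                                     + expunged ⏳
+   boundary_ntp   e81dd898-d063-41b3-a13a-418a8c031345   install dataset   in service       fd00:1122:3344:102::22
+   clickhouse     e0504fdd-5c45-4209-aa03-edafc8d94393   install dataset   in service       fd00:1122:3344:102::23
+   cockroach_db   714185f6-3407-4366-a060-a4623c92a962   install dataset   in service       fd00:1122:3344:102::24
+   internal_ntp   558843e4-96ad-4bf6-b4c2-200bf0fb8520   install dataset   expunged ⏳       fd00:1122:3344:102::21
"""


def added_zones(diff_text):
    """Return one dict per added ('+') zone row in the pasted diff text."""
    rows = []
    for line in diff_text.splitlines():
        # Added rows start with '+' in column 0; the '└─ + expunged'
        # continuation lines are indented, so they are not matched here.
        if not line.startswith("+"):
            continue
        # Assumption: columns are separated by runs of 2+ spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) != 6:
            continue  # not a zone row (e.g. a dataset row)
        _, zone_type, zone_id, image, disposition, ip = fields
        rows.append(
            {
                "zone_type": zone_type,
                "zone_id": zone_id,
                "image_source": image,
                "disposition": disposition,
                "underlay_ip": ip,
            }
        )
    return rows


if __name__ == "__main__":
    for zone in added_zones(SAMPLE):
        print(zone["zone_type"], zone["zone_id"], zone["disposition"])
```

Applied to the rows above, this lists three added zones with an "in service" disposition (boundary_ntp, clickhouse, cockroach_db) and one added internal_ntp zone that is already expunged.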

The executor ended with the following warnings:

root@oxz_switch0:~# omdb nexus background-tasks show blueprint_executor 
task: "blueprint_executor"
  configured period: every 1m
  currently executing: no
  last completed activation: iter 292, triggered by a periodic timer firing
    started at 2025-03-19T05:37:35.210Z (49s ago) and ran for 18866ms
    target blueprint: e7894fab-7808-4e40-b5db-025972456c79
    execution:        enabled
    status:           completed (13 steps)
    warning:          at: Deploy single-node clickhouse cluster: failed to initialize single-node clickhouse database: Communication Error: error sending request for url (http://[fd00:1122:3344:102::23]:8888/init)
...

The complete output is attached here.

blueprint_executor_output.txt

Here are some relevant error messages from the sled-agent on the sled involved:

06:36:35.697Z INFO SledAgent (dropshot (SledAgent)): request completed
    error_message_external = Service Unavailable
    error_message_internal = Time not yet synchronized
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:855
    latency_us = 505909
    local_addr = [fd00:1122:3344:102::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:104::5]:35762
    req_id = 4f9c25ad-4130-4c29-828e-3d6e6f893b97
    response_code = 503
    uri = /omicron-config
06:36:36.942Z WARN SledAgent (ServiceManager): Zone failed to start
    file = sled-agent/src/services.rs:3391
    zone = oxz_cockroachdb_714185f6-3407-4366-a060-a4623c92a962
06:36:36.943Z WARN SledAgent (ServiceManager): Zone failed to start
    file = sled-agent/src/services.rs:3391
    zone = oxz_clickhouse_e0504fdd-5c45-4209-aa03-edafc8d94393
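When scanning a longer sled-agent log for the same symptoms, it can help to group the formatted text into entries and pick out the 503 responses and the zone start failures. The following is a minimal sketch that assumes the layout shown above (a header line with timestamp, level, component, and message, followed by indented `key = value` lines). The names `parse_entries` and `HEADER` are hypothetical and not part of any omicron tooling.

```python
import re

# A trimmed copy of the sled-agent log lines above.
SAMPLE = """\
06:36:35.697Z INFO SledAgent (dropshot (SledAgent)): request completed
    error_message_external = Service Unavailable
    error_message_internal = Time not yet synchronized
    method = PUT
    response_code = 503
    uri = /omicron-config
06:36:36.942Z WARN SledAgent (ServiceManager): Zone failed to start
    file = sled-agent/src/services.rs:3391
    zone = oxz_cockroachdb_714185f6-3407-4366-a060-a4623c92a962
06:36:36.943Z WARN SledAgent (ServiceManager): Zone failed to start
    file = sled-agent/src/services.rs:3391
    zone = oxz_clickhouse_e0504fdd-5c45-4209-aa03-edafc8d94393
"""

# Header layout assumed from the log above: "<time> <LEVEL> <component>: <message>".
HEADER = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+(\w+)\s+(.*?): (.*)$")


def parse_entries(text):
    """Group header lines and their indented 'key = value' lines into dicts."""
    entries = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            time, level, component, message = match.groups()
            entries.append(
                {
                    "time": time,
                    "level": level,
                    "component": component,
                    "message": message,
                    "fields": {},
                }
            )
        elif entries and " = " in line:
            # Attach the field to the most recent header line.
            key, _, value = line.strip().partition(" = ")
            entries[-1]["fields"][key] = value
    return entries


if __name__ == "__main__":
    entries = parse_entries(SAMPLE)
    for entry in entries:
        fields = entry["fields"]
        if fields.get("response_code") == "503":
            print(entry["time"], fields.get("method"), fields.get("uri"),
                  "->", fields.get("error_message_internal"))
        if entry["message"] == "Zone failed to start":
            print(entry["time"], "zone failed:", fields.get("zone"))
```

On the excerpt above, this surfaces the `PUT /omicron-config` request rejected with "Time not yet synchronized", followed by the two zones that failed to start.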

Labels: expunge (expunge sled or disk issues)
