How Omicron manages storage
Zones generally store data in one of two ways:
- "Anywhere within their zone filesystem": this is the case for data such as logs, tmp directories within zones, etc.
- "A durable dataset mounted at a well-known location": this is the case for zones like Crucible, CockroachDB, and ClickHouse, which explicitly manage storage that should be durable across reboots.
How DNS manages storage
The internal and external DNS services are configured to use the "anywhere within their zone filesystem" storage.
- omicron/smf/internal-dns/config.toml, line 14 (at commit 7e8273e): `storage_path = "/var/oxide/dns"`
- omicron/smf/external-dns/config.toml, line 14 (at commit 7e8273e): `storage_path = "/var/oxide/dns"`
In general, it is fine for this data to act as a cache of state owned by Nexus: Nexus periodically updates these values, bumps a generation number, and should ensure that a (redundant!) number of DNS servers store this information.
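A minimal sketch of the generation-number behavior described above, assuming each DNS server keeps a single config snapshot and accepts an update from Nexus only if its generation is strictly newer than the one it holds. The `DnsConfig` and `DnsServer` types, the `apply` method, and the sample records are hypothetical simplifications, not Omicron's actual DNS server API.

```rust
/// Hypothetical, simplified snapshot of the DNS data pushed by Nexus.
#[derive(Debug, Clone, PartialEq)]
struct DnsConfig {
    generation: u64,
    records: Vec<(String, String)>, // (name, address), placeholder format
}

/// Hypothetical DNS server holding at most one config snapshot.
#[derive(Debug, Default)]
struct DnsServer {
    current: Option<DnsConfig>,
}

impl DnsServer {
    /// Apply an update from Nexus. Returns true if it was accepted.
    /// An update whose generation is not newer than the stored one is
    /// ignored, so late or duplicate pushes cannot roll data back.
    fn apply(&mut self, update: DnsConfig) -> bool {
        let is_newer = self
            .current
            .as_ref()
            .map_or(true, |cur| update.generation > cur.generation);
        if is_newer {
            self.current = Some(update);
        }
        is_newer
    }
}

fn main() {
    let mut server = DnsServer::default();
    // Placeholder records; real names and addresses are not taken from Omicron.
    let gen1 = DnsConfig {
        generation: 1,
        records: vec![("cockroachdb".into(), "fd00::1".into())],
    };
    let gen2 = DnsConfig {
        generation: 2,
        records: vec![("cockroachdb".into(), "fd00::2".into())],
    };

    // A fresh server accepts whatever Nexus sends first.
    assert!(server.apply(gen2.clone()));
    // An older generation arriving late does not overwrite newer data.
    assert!(!server.apply(gen1));
    assert_eq!(server.current, Some(gen2));
}
```

Because stale or repeated pushes are ignored, a server that missed an update converges as soon as Nexus sends a newer generation. That property is what makes the cache model reasonable in steady state; it says nothing about what a server holds if its local copy is lost.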
However, in the case of a cold boot, it's important that at least one internal DNS server comes up, because it provides the machinery needed for:
- the CockroachDB instances to boot up and find each other, and
- Nexus to find CockroachDB.
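Why would this be a problem?
@jclulow has pointed out that the "service model" for zones means we should be able to delete zone filesystems arbitrarily and recreate them with their persistent datasets attached, returning to a stable state after a cold boot. Under that model, anything the DNS servers keep in /var/oxide/dns would be discarded along with the zone filesystem, so after a cold boot the internal DNS servers could come up with no records at exactly the moment CockroachDB and Nexus depend on them.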
Personally, I'm on board with this, as it means there's less "reconstruction" logic the Sled Agent needs to perform after reboot (e.g., parsing VNICs, IP interfaces, etc.). We can treat everything as temporary and recreate what we need, relying only on datasets to be durable.
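The concern can be sketched with a toy model of zone recreation, assuming that recreating a zone discards everything in its root filesystem and keeps only files under delegated dataset mountpoints. The `Zone` type and the `/data` mountpoint are hypothetical; the only path taken from this issue is `/var/oxide/dns`.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Toy model of a zone (hypothetical, not Omicron's real types).
struct Zone {
    /// Mountpoints of datasets that survive zone recreation.
    durable_mounts: Vec<PathBuf>,
    /// Every file visible in the zone, keyed by absolute path.
    files: HashMap<PathBuf, String>,
}

impl Zone {
    fn new(durable_mounts: &[&str]) -> Self {
        Zone {
            durable_mounts: durable_mounts.iter().map(|m| PathBuf::from(*m)).collect(),
            files: HashMap::new(),
        }
    }

    fn write(&mut self, path: &str, contents: &str) {
        self.files.insert(PathBuf::from(path), contents.to_string());
    }

    /// Delete the zone filesystem and recreate the zone: only files that
    /// live under a durable dataset mountpoint remain afterwards.
    fn recreate(&mut self) {
        let mounts = &self.durable_mounts;
        self.files
            .retain(|path, _| mounts.iter().any(|m| path.starts_with(m)));
    }
}

fn main() {
    // "/data" stands in for a hypothetical durable dataset mountpoint.
    let mut zone = Zone::new(&["/data"]);
    // Under the current storage_path, DNS data lives in the zone filesystem.
    zone.write("/var/oxide/dns/records", "generation 2");
    // Hypothetical alternative location on the durable dataset.
    zone.write("/data/dns/records", "generation 2");

    zone.recreate();

    assert!(!zone.files.contains_key(Path::new("/var/oxide/dns/records")));
    assert!(zone.files.contains_key(Path::new("/data/dns/records")));
}
```

In this model, only the copy under the dataset mountpoint survives recreation; `Path::starts_with` compares whole path components, so a path like `/database` would not be mistaken for one under `/data`.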