@iximeow iximeow commented Apr 24, 2025

this is probably the more exciting part of the issues outlined in #6944. the changes here get us to the point that for both internal and external DNS, we have:

  • A/AAAA records for the DNS servers in the internal/external group (named ns1.<zone>, ns2.<zone>, ...)
  • NS records for those servers at the zone apex, one for each of the ns*.<zone> described above
  • an SOA record synthesized on-demand for the zone apex for each of oxide.internal (for internal DNS) and $delegated_domain (for external DNS)
  • the SOA's serial is updated whenever the zone is changed. serial numbers are effectively the DNS config generation, so they start from 1 and tick upward with each change. this is different from most SOA serial schemes (in particular the ones that would use YYYYMMDDNN numbering schemes) but so far as i can tell this is consistent with RFC 1035 requirements.
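
As a rough illustration of the records listed above, here is a minimal sketch of naming the nameservers and synthesizing the apex SOA from a serial. The `Soa` struct and both function names are hypothetical and are not Omicron's actual types. The timer values and the `ns1.<zone>` / `admin.<zone>` fields are copied from the `dig` output later in this description.

```rust
/// Hypothetical stand-in for an SOA record; not Omicron's actual type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Soa {
    pub mname: String,
    pub rname: String,
    pub serial: u32,
    pub refresh: u32,
    pub retry: u32,
    pub expire: u32,
    pub minimum: u32,
}

/// Names of the zone's nameservers: ns1.<zone>, ns2.<zone>, ...
/// One NS record at the zone apex would point at each of these.
pub fn ns_names(zone: &str, server_count: usize) -> Vec<String> {
    (1..=server_count).map(|i| format!("ns{i}.{zone}")).collect()
}

/// Build the apex SOA on demand. The serial is passed in by the caller,
/// which in this PR's scheme is effectively the DNS config generation.
pub fn synthesize_soa(zone: &str, serial: u32) -> Soa {
    Soa {
        mname: format!("ns1.{zone}"),
        rname: format!("admin.{zone}"),
        serial,
        // Timer values as observed in the dig output in this description.
        refresh: 3600,
        retry: 600,
        expire: 18000,
        minimum: 150,
    }
}
```

Because the SOA is derived entirely from the zone name and the serial, nothing extra needs to be stored for it: a new generation simply yields a new SOA.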

we do not support zone transfers here. i believe the SOA record here would be reasonable to guide zone transfers if we did, but obviously that's not something i've tested.

SOA fields

the SOA record's RNAME is hardcoded to admin@<zone_name>. this is out of expediency to provide something, but it's probably wrong most of the time. there's no way to get an MX record installed for <zone_name> in the rack's external DNS servers, so barring DNS hijinks in the deployed environment, this will be a dead address. problems here are:

  • we would want to take in an administrative email at rack setup time, so that would be minor plumbing
  • more importantly, what to backfill this with for deployed systems?
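
For context on how a mailbox such as admin@<zone_name> appears in an SOA: RFC 1035 encodes RNAME as a domain name whose first label is the mailbox's local part. A minimal sketch of that conversion follows. The function name is hypothetical, and this is not code from this PR.

```rust
/// Convert `local@domain` into the presentation form of an SOA RNAME.
/// Hypothetical helper for illustration only.
pub fn mailbox_to_rname(mailbox: &str) -> Option<String> {
    let (local, domain) = mailbox.rsplit_once('@')?;
    // Drop a trailing root dot if present; one is added back below.
    let domain = domain.trim_end_matches('.');
    if local.is_empty() || domain.is_empty() {
        return None;
    }
    // Dots inside the local part are escaped so they are not read as
    // label separators.
    let local = local.replace('.', "\\.");
    Some(format!("{local}.{domain}."))
}
```

With this, `admin@oxide.test` becomes `admin.oxide.test.`, which matches the RNAME shown in the `dig` output in this description.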

it seems like the best answer here is to allow configuration of the rack's delegated domain and zone after initial setup, and being able to update an administrative email would fit in pretty naturally there. but we don't have that right now, so admin@ it is. configuration of external DNS is probably more important in the context of zone transfers, where we'd need a list of remote addresses to which we're willing to permit transfers. so it feels like this is in the API's future at some point.

bonus

one minorly interesting observation along the way is that external DNS servers in particular are reachable at a few addresses: whichever address they get in the rack's internal address range, and whichever public address they get in the external address range. the public address is what's used for A/AAAA records. so, if you're looking around from inside an external DNS server's zone you can get odd-looking answers like:

# 172.30.1.5 is the internal address that an external DNS server is bound to.
# oxide.test is the delegated domain for this local Omicron deployment.
root@oxz_external_dns_68c5e255:~# dig +short ns2.oxide.test @172.30.1.5
192.168.0.161
root@oxz_external_dns_68c5e255:~# dig +short soa oxide.test @172.30.1.5
ns1.oxide.test. admin.oxide.test. 2 3600 600 18000 150
root@oxz_external_dns_68c5e255:~# dig +short ns oxide.test @172.30.1.5
ns1.oxide.test.
ns2.oxide.test.
# 192.168.0.160 is an external address for this same server.
# there are no records referencing 172.30.1.5 here.
root@oxz_external_dns_68c5e255:~# dig +short ns oxide.test @192.168.0.160
ns1.oxide.test.
ns2.oxide.test.
root@oxz_external_dns_68c5e255:~# dig +short ns1.oxide.test @192.168.0.160
192.168.0.160

@iximeow iximeow added the "release notes" label (reminder to include this in the release notes) Apr 24, 2025
@iximeow iximeow force-pushed the ixi/dns-ns-and-soa branch 2 times, most recently from 842455b to f349290 Compare April 25, 2025 21:50
@iximeow iximeow force-pushed the ixi/dns-ns-and-soa branch from f349290 to fa47ab1 Compare April 25, 2025 22:08
Comment on lines +174 to +207
impl From<Srv> for DnsRecord {
    fn from(srv: Srv) -> Self {
        DnsRecord::Srv(srv)
    }
}

#[derive(
    Clone,
    Debug,
    Serialize,
    Deserialize,
    JsonSchema,
    PartialEq,
    Eq,
    PartialOrd,
    Ord,
)]
pub struct Srv {
    pub prio: u16,
    pub weight: u16,
    pub port: u16,
    pub target: String,
}

impl From<v1::config::Srv> for Srv {
    fn from(other: v1::config::Srv) -> Self {
        Srv {
            prio: other.prio,
            weight: other.weight,
            port: other.port,
            target: other.target,
        }
    }
}
Member Author

the other option here is to use the v1::config::Srv type directly in v2, because it really has not changed. weaving the V1/V2 types together seems more difficult to think about generally, but i'm very open to the duplication being more confusing if folks feel that way.

Collaborator

I would probably use the v1 types directly but I can see going either way.

@iximeow iximeow marked this pull request as ready for review May 1, 2025 21:58
Collaborator

@davepacheco davepacheco left a comment

Nice -- this is looking pretty good! I don't think anything here is a real blocker but it would be good to clean up if we can.

zones: vec![dns_zone_blueprint],
time_created: chrono::Utc::now(),
generation: blueprint_generation.next(),
serial: new_dns_generation.as_u64().try_into().map_err(|_| {
Collaborator

I see -- it looks like you split the difference here. The configuration distinguishes between "serial" and "generation", but this is the only place that sets them, and it always makes them the same. So we don't have to worry about maintaining a serial in lockstep with the generation when we update the database.

This seems fine.

Member Author

yeah, i really like the status quo that there is not a DnsConfigParams which can result in the DNS server failing to serve records. to maintain that either DnsConfigParams::generation should become a u32 (seems very wrong), or serial ends up a distinct u32.
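
The width mismatch discussed here (a 64-bit generation versus a 32-bit SOA serial) can be handled with a checked conversion when the config is built, so that an out-of-range value is rejected up front instead of reaching the DNS server. A minimal sketch follows. The function name and the `String` error type are hypothetical, and the real code's error handling is not shown in this thread.

```rust
/// Derive a 32-bit SOA serial from a 64-bit DNS generation, failing
/// instead of silently truncating. Hypothetical helper for illustration.
pub fn serial_for_generation(generation: u64) -> Result<u32, String> {
    u32::try_from(generation).map_err(|_| {
        format!(
            "DNS generation {generation} does not fit in a 32-bit SOA serial"
        )
    })
}
```

Doing the conversion where the parameters are constructed means any config that exists already carries a valid serial.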

#[derive(Clone, Debug, Serialize, Deserialize, JsonSchema, PartialEq, Eq)]
pub struct DnsConfigZone {
    pub zone_name: String,
    pub names: HashMap<String, Vec<DnsRecord>>,
Collaborator

super nitty and unimportant, but: I feel like records was more accurate. I guess I expect maps to be named either by what each key-value pair represents or what the value represents, not what the key represents. But now I wonder how universal that is!

Member Author

i suppose i was thinking about this as: a "name" is the pair of a label and a collection of records, and we often happen to call the label a "name". that's not totally accurate, since the key here could be multiple labels anyway. but i agree with your instinct and this is why it didn't strike me as confusing at first :)

this was a simple change, i'll probably revert it and add a few comments on the relevant test asserts instead.

Collaborator

You could be right if people read "DNS name" to refer to the (label, records) pair. I tend to use that interchangeably with "label" but maybe that's wrong.

Anyway, not a big deal either way, though there's something to be said for not having different names for the same thing in two different API versions. Then again, we can probably remove API version 1 in the next release anyway.

Member Author

already reverted it! i expect i'm the outlier here, and either way it ends up ambiguous in some circumstances.

iximeow and others added 4 commits May 30, 2025 12:57
confusing name options abound. "names" is ambiguous with the keys,
"records" is ambiguous with the values, maybe it would be better to call
this "subdomains"???? but for now stick with what we've got and add some
clarifying comments.

This reverts commit ff63ea1.
* incorrect comments around the internal DNS expunge test
* internal DNS config does not need to track external DNS separately
@iximeow iximeow force-pushed the ixi/dns-ns-and-soa branch from 89ea63f to 7de2cc1 Compare May 31, 2025 01:41
Collaborator

@davepacheco davepacheco left a comment

Looks good! I think there were two items going to be done as follow-ups:

  • remove unneeded service_name arguments from some of the DnsConfigBuilder methods
  • use named constants for the API versions where we're defining the dropshot dynamic version policy

@iximeow iximeow merged commit 3e68262 into main Jun 2, 2025
18 checks passed
@iximeow iximeow deleted the ixi/dns-ns-and-soa branch June 2, 2025 23:07
iximeow added a commit that referenced this pull request Jun 4, 2025
This reverts commit 3e68262.

The change in 3e68262 does not handle upgrade sufficiently: when
internal DNS starts it tries to parse a `CurrentConfig` from a json blob
describing the previous server version's config. As Angela found on
dogfood, this means internal DNS will fail to start with an error about
"missing field `serial` at line 1 column ..."

We're reverting this to unblock dogfood, but will add this back in with
additional logic to handle this case.
iximeow added a commit that referenced this pull request Jun 5, 2025
PR #8047 comes with an unfortunate upgrade-only bug: before an upgrade,
a system's DNS servers would write out a configuration without the new
`serial` field added in 8047. When upgraded, the DNS servers would then
try to load that config, see it is missing a `serial` field, and error
for every query.

We expect to replace the previous-format configuration immediately after
upgrade by regenerating a blueprint for the current system and executing
it. But we should be able to use the previous-format configuration
anyway, so that DNS functions enough to get a control plane capable of
planning and executing that blueprint.
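
One way a loader could tolerate the previous-format configuration is sketched below. This is an assumption-laden illustration, not the actual follow-up fix: `StoredConfig` and `effective_serial` are hypothetical names, and falling back to the generation (saturating at `u32::MAX`) is a guess based on serials being effectively the DNS config generation. With serde, the missing field would typically be modeled as an `Option` or a defaulted field. The sketch models only the result of that parsing.

```rust
/// Hypothetical in-memory form of a stored DNS config.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StoredConfig {
    pub generation: u64,
    /// `None` models a config written before the `serial` field existed.
    pub serial: Option<u32>,
}

/// Pick a serial to serve even when the stored config predates `serial`.
pub fn effective_serial(config: &StoredConfig) -> u32 {
    config.serial.unwrap_or_else(|| {
        // Assumption: fall back to the generation, saturating rather
        // than failing, so that DNS keeps answering queries until a new
        // config replaces this one.
        u32::try_from(config.generation).unwrap_or(u32::MAX)
    })
}
```

The intent matches the commit message above: the old config only needs to work long enough for the control plane to plan and execute a blueprint that writes a current-format config.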