-
Notifications
You must be signed in to change notification settings - Fork 62
[omdb] Policy to set which ClickHouse installation oximeter reads from #7522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
plotnick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this, it looks like a great start!
I have two general naming suggestions. First, I think we should use read instead of reads everywhere. "Reads" sounds like a verb to me, and read/write seems like a more familiar pair of terms than reads/writes.
Second, I think (but please correct me if I'm mistaken) that your version field is basically a generation number, so I think it should be called generation (or something like that) and use the provided Generation API types. We seem to be converging on semantic versions everywhere, so I think the term version should probably be reserved for those.
| .authorize(authz::Action::Modify, &authz::BLUEPRINT_CONFIG) | ||
| .await?; | ||
|
|
||
| let num_inserted = if policy.version == 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again completely up to you, but I initially had this same kind of no-rows/some-rows split in #7518, and found that it simplified a lot of things to insert a default row (single_node in this case, probably) as part of the schema migration so that there was always guaranteed to be at least one row.
karencfv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @plotnick!
I have two general naming suggestions. First, I think we should use read instead of reads everywhere. "Reads" sounds like a verb to me, and read/write seems like a more familiar pair of terms than reads/writes.
Yeah, that makes sense. Now that I look at it I don't really know why I used "reads" in the first place 😄
Second, I think (but please correct me if I'm mistaken) that your version field is basically a generation number, so I think it should be called generation (or something like that) and use the provided Generation API types. We seem to be converging on semantic versions everywhere, so I think the term version should probably be reserved for those.
Hm, I used "version" because I saw it being used in other places https://github.com/oxidecomputer/omicron/blob/main/nexus/db-model/src/deployment.rs#L113-L117 https://github.com/oxidecomputer/omicron/blob/main/nexus/db-model/src/deployment.rs#L60-L70 https://github.com/oxidecomputer/omicron/blob/main/nexus/db-model/src/clickhouse_policy.rs#L31-L37 🤔
andrewjstone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Karen, this looks awesome! Great job!
I only had some minor comments.
|
|
||
| #[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)] | ||
| pub struct OximeterReadPolicy { | ||
| pub version: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should be using a Generation here as we do elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it needs to be u32 because we later convert it to Sql32
| assert!(policy.version > 0); | ||
| let prev_version = policy.version - 1; | ||
|
|
||
| sql_query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, but I basically copy/pasted it from yours lol https://github.com/oxidecomputer/omicron/blob/main/nexus/db-queries/src/db/datastore/clickhouse_policy.rs#L126-L134
nexus/internal-api/src/lib.rs
Outdated
| /// Get the current oximeter read policy | ||
| #[endpoint { | ||
| method = GET, | ||
| path = "/oximeter-read/policy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that oximeter-read really means anything without the word policy. I'd make it all one path like oximeter-read-policy. That way it's only a noun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that makes sense. I changed it to "/oximeter/read-policy" bc it seemed nicer though
|
Thanks for taking the time to review this PR @andrewjstone! |

Overview
Implements a new policy that can set whether oximeter reads from a single node, or the replicated cluster.
Usage
Within the switch zone, omdb can be used to switch between policies
Manual testing
After deploying Omicron locally for the first time we can see the oximeter read policy is set to single node and the oximeter-reader SRV record points to the same address clickhouse-native does
Now we deploy a clickhouse cluster, but do not change the oximeter read policy. Internal DNS remains at the same version it was
We verify that the oximeter-reader SRV record still points to the same address as clickhouse-native
Now we verify queries are being directed to the single node only.
I ran several queries through OxQL and we can see then in the clickana dashboard for the single node
In the clickana dashboard for the cluster we see no queries (the initial peak is when the cluster was created)
We now change the oximeter-reader policy to cluster, we can see interal DNS has changed too
We verify that the oximeter-reader SRV record still points to the same address as clickhouse-cluster-native
Now we verify queries are being directed to the cluster only.
I ran several queries through OxQL and we can see then in the clickana dashboard for the cluster
In the clickana dashboard for the single node we see no new queries (the peak in the middle are the previous queries)
Closes: #7420, #5999, #7338