OCPEDGE-2084: Add PacemakerStatus CRD for two-node fencing #2544
base: master
|
@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Hello @jaypoulz! Some important instructions when contributing to openshift/api: |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
58218ce to
96e327f
Compare
2ba442d to
29b9fec
Compare
|
@jaypoulz thank you for the PR, do you mind making the CI happy? |
29b9fec to
26f7821
Compare
|
Hi @saschagrunert :) Working on it! :D |
|
A few open questions I have:
That said, it doesn't work like a normal config: there's no spec, and it shouldn't be created during bootstrap. The CRD just needs to be present when the CEO runs a cronjob to post an update to it.
|
b0ff230 to
1b57b09
Compare
b9b727f to
fdd53e9
Compare
|
Yeah, I'll ignore the CI failures for now, running
I'm new to API review, but my gut feeling tells me that a dedicated
You can also try to run it in a container by
Do you mind elaborating on that? Do you mean generating the code for the unions? API docs ref: https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#writing-a-union-in-go @jaypoulz is there an OpenShift enhancement available for this change? |
etcd/tnf/v1alpha1/tests/pacemakerstatuses.tnf.etcd.openshift.io/DualReplica.yaml
Outdated
c620199 to
6b36b92
Compare
|
What I was asking about was: https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#do-not-use-boolean-fields Do we usually provide constants for the non-boolean fields for easy reference? |
3bfc09e to
b505119
Compare
I saw the kinds of constants I was thinking about in the control plane topology type, so I decided to proceed in that direction. Should be more obvious what I meant now. :) |
|
@saschagrunert CI is happy 🥹 |
b505119 to
6db7ce8
Compare
|
Or at least it was when I wrote that comment - I decided to update the READMEs. 🤞 I didn't break anything |
|
/test okd-scos-e2e-aws-ovn |
Thank you for the links! My review is mostly about docs and naming conventions. Let's improve on those. 👍
etcd/.codegen.yaml
Outdated
| @@ -0,0 +1,3 @@ | |||
| swaggerdocs: | |||
| commentPolicy: Warn | |||
|
|
|||
nit: remove empty line
etcd/v1alpha1/doc.go
Outdated
| // +kubebuilder:validation:Optional | ||
| // +groupName=etcd.openshift.io | ||
| package v1alpha1 | ||
|
|
nit: remove the empty line at the end of the file; tools like gofmt will complain about it.
)

// QuorumStatusType represents the quorum status of a Pacemaker cluster
// +kubebuilder:validation:Enum=Quorate;NoQuorum
I think we need to document the valid values, like:
- // +kubebuilder:validation:Enum=Quorate;NoQuorum
+ // Valid values are Quorate (cluster has quorum) and NoQuorum (cluster does not have quorum).
+ // +kubebuilder:validation:Enum=Quorate;NoQuorum
The same applies to NodeOnlineStatusType, NodeModeType, ResourceActiveStatusType below.
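As a rough illustration of the suggestion above, and of the exported constants discussed earlier in the thread, a documented enum type might look like the following sketch. The constant names and the `IsValidQuorumStatus` helper are hypothetical and are not taken from the PR; only the type name and its two values come from the diff.

```go
package main

import "fmt"

// QuorumStatusType represents the quorum status of a Pacemaker cluster.
// Valid values are Quorate (cluster has quorum) and NoQuorum (cluster does not have quorum).
// +kubebuilder:validation:Enum=Quorate;NoQuorum
type QuorumStatusType string

const (
	// QuorumStatusQuorate means the cluster has quorum (hypothetical constant name).
	QuorumStatusQuorate QuorumStatusType = "Quorate"
	// QuorumStatusNoQuorum means the cluster does not have quorum (hypothetical constant name).
	QuorumStatusNoQuorum QuorumStatusType = "NoQuorum"
)

// IsValidQuorumStatus is a hypothetical helper that mirrors the Enum marker in
// plain Go, so callers can reference the constants instead of string literals.
func IsValidQuorumStatus(s QuorumStatusType) bool {
	switch s {
	case QuorumStatusQuorate, QuorumStatusNoQuorum:
		return true
	}
	return false
}

func main() {
	fmt.Println(IsValidQuorumStatus(QuorumStatusQuorate)) // prints "true"
	fmt.Println(IsValidQuorumStatus("Unknown"))           // prints "false"
}
```

The same shape would presumably carry over to NodeOnlineStatusType, NodeModeType, and ResourceActiveStatusType, whose values are not shown in this excerpt.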
// status contains the actual pacemaker cluster status information collected from the cluster.
// +optional
Status *PacemakerStatusStatus `json:"status,omitempty"`
Please document what happens when it's not present.
- // status contains the actual pacemaker cluster status information collected from the cluster.
- // +optional
- Status *PacemakerStatusStatus `json:"status,omitempty"`
+ // status contains the actual pacemaker cluster status information collected from the cluster.
+ // When not present, …
+ // +optional
+ Status *PacemakerStatusStatus `json:"status,omitempty"`
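The review leaves the "When not present" wording open, so the following is only a sketch of how a consumer could handle a nil status pointer. It assumes a missing status means nothing has been posted yet. The `describe` helper and the `RawSummary` field are hypothetical and do not appear in the PR.

```go
package main

import "fmt"

// PacemakerStatusStatus is trimmed to a single illustrative field.
type PacemakerStatusStatus struct {
	// RawSummary is a hypothetical field used only for this sketch.
	RawSummary string `json:"rawSummary,omitempty"`
}

// PacemakerStatus keeps only the optional status pointer under discussion.
type PacemakerStatus struct {
	Status *PacemakerStatusStatus `json:"status,omitempty"`
}

// describe is a hypothetical consumer-side helper. It assumes that a nil
// status means the collector has not posted an update yet.
func describe(p PacemakerStatus) string {
	if p.Status == nil {
		return "status not reported yet"
	}
	return "status reported: " + p.Status.RawSummary
}

func main() {
	fmt.Println(describe(PacemakerStatus{}))
	fmt.Println(describe(PacemakerStatus{
		Status: &PacemakerStatusStatus{RawSummary: "2 nodes online"},
	}))
}
```

Whatever behavior is chosen, stating it in the field comment lets readers of the generated docs know how to interpret an absent status.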
// lastUpdated is the timestamp when this status was last updated
// +optional
LastUpdated metav1.Time `json:"lastUpdated,omitempty"`
Same here, in which case can this be not present? Please document it.
this one convinced me that last updated should be required :)
Question - this is the only required field, but the linter is unhappy if I don't include omitempty on it. Am I doing something wrong?
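As background for the omitempty question, and not a statement about what the linter requires, here is a small sketch of how Go's standard `encoding/json` treats `omitempty` on a struct-typed field versus a pointer. It uses `time.Time` as a stand-in for `metav1.Time`, which has its own marshaling, so the exact output for the real type may differ. The `byValue`, `byPointer`, and `encode` names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// byValue holds the timestamp by value. encoding/json never treats a struct
// value as "empty", so omitempty has no effect on this field.
type byValue struct {
	LastUpdated time.Time `json:"lastUpdated,omitempty"`
}

// byPointer holds the timestamp by pointer. A nil pointer counts as empty,
// so omitempty drops the field from the output.
type byPointer struct {
	LastUpdated *time.Time `json:"lastUpdated,omitempty"`
}

// encode marshals v to JSON and returns it as a string.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(byValue{}))   // prints {"lastUpdated":"0001-01-01T00:00:00Z"}
	fmt.Println(encode(byPointer{})) // prints {}
}
```

In other words, the tag alone does not keep a zero timestamp off the wire for a value-typed field. Whether a required field is present is enforced by the CRD schema validation rather than by the JSON tag.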
// pacemakerdState indicates if pacemaker is running
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=16
// +optional
PacemakerdState string `json:"pacemakerdState,omitempty"`
Consider an enum type here instead of just a string.
The same suggestion would apply to ResourceStatus.Role, NodeHistoryEntry.Operation, FencingEvent.Action and FencingEvent.Status.
NodeHistoryEntry.Operation is not a great fit for this, because resource agents can define custom operations. I don't want to validate our way out of potentially helpful information. The others I think I can nail down.
}

// PacemakerStatusStatus contains the actual pacemaker cluster status information
type PacemakerStatusStatus struct {
We could rename PacemakerStatus to Pacemaker and PacemakerStatusStatus to PacemakerStatus to avoid the doubled status status.
I decided to go with PacemakerCluster for the top-level object since pacemaker just felt wrong.
I prefixed the other possible conflict fields with pacemaker.
}

// NodeStatus represents the status of a single node in the Pacemaker cluster
type NodeStatus struct {
Consider naming that PacemakerNodeStatus to avoid (future) conflicts.
}

// ResourceStatus represents the status of a single resource in the Pacemaker cluster
type ResourceStatus struct {
Consider naming that PacemakerResourceStatus to avoid (future) conflicts.
// PacemakerSummary provides a high-level summary of cluster state
type PacemakerSummary struct {
	// pacemakerdState indicates if pacemaker is running
Does Pacemakerd refer to a daemon? If so, we should probably name it PacemakerDaemonState to make the name clearer.
f6f91ba to
4fb527a
Compare
3f45017 to
2fb0282
Compare
4d4fea5 to
6c69f2a
Compare
Introduces etcd.openshift.io/v1alpha1 API group with a PacemakerCluster custom resource. This provides visibility into Pacemaker cluster health for Two Node Fencing etcd deployments. The status-only resource is populated by a privileged controller and consumed by the cluster-etcd-operator healthcheck controller. This API is not gated because it's only created by CEO once the transition to an ExternalEtcd has occurred.
6c69f2a to
cbe66c9
Compare
|
@jaypoulz: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Introduces tnf.etcd.openshift.io/v1alpha1 API group with PacemakerStatus custom resource. This provides visibility into Pacemaker cluster health for dual-replica etcd deployments. The status-only resource is populated by a privileged controller and consumed by the cluster-etcd-operator healthcheck controller. Not gated because it's only used by CEO when two-node has transitioned.
Works in conjunction with openshift/cluster-etcd-operator#1487