2 changes: 2 additions & 0 deletions etcd/.codegen.yaml
@@ -0,0 +1,2 @@
swaggerdocs:
  commentPolicy: Warn
46 changes: 46 additions & 0 deletions etcd/README.md
@@ -0,0 +1,46 @@
# etcd.openshift.io API Group

This API group contains CRDs related to etcd cluster management. It is currently used only for TNF (Two Node Fencing) clusters,
where it gathers status updates from the nodes so that the cluster admin is warned about unhealthy setups.

## API Versions

### v1alpha1

Contains the `PacemakerCluster` custom resource for monitoring Pacemaker cluster health in TNF (Two Node Fencing) deployments.

#### PacemakerCluster

- **Feature Gate**: None - this CRD is gated by cluster-etcd-operator start-up. It will only be created once a TNF cluster has transitioned to external etcd.
- **Component**: `two-node-fencing`
- **Scope**: Cluster-scoped singleton resource named "cluster"
- **Resource Path**: `pacemakerclusters.etcd.openshift.io`

The `PacemakerCluster` resource provides visibility into the health and status of a Pacemaker-managed cluster. It is periodically updated by the cluster-etcd-operator's status collector running as a privileged CronJob.

**Status Fields:**
- `lastUpdated` (required): Timestamp when status was last collected - used to detect stale data
- `summary`: High-level cluster health metrics
  - `pacemakerDaemonState`: Running state (enum: `Running`, `KnownNotRunning`)
  - `quorumStatus`: Quorum state (enum: `Quorate`, `NoQuorum`)
  - `nodesOnline`, `nodesTotal`: Node counts (0-2)
  - `resourcesStarted`, `resourcesTotal`: Resource counts (0-16)
- `nodes`: Detailed status of each node (1-2 nodes)
  - Name, IPv4/IPv6 addresses, online status (enum), mode (enum: `Active`, `Standby`)
- `resources`: Detailed status of each resource (1-16 resources)
  - Name, resource agent, role (enum: `Started`, `Stopped`), active status (enum), node assignment
- `nodeHistory`: Recent operation failures for troubleshooting (up to 16 entries, last 5 minutes)
- `fencingHistory`: Recent fencing events (up to 16 events, last 24 hours)
  - Target node, action (enum: `reboot`, `off`, `on`), status (enum: `success`, `failed`, `pending`), completion timestamp
- `collectionError`: Any errors encountered during status collection (max 2KB)
- `rawXML`: Full XML output from `pcs status xml` for debugging (max 256KB)

**Design Principles:**
The API follows "Act on Deterministic Information":
- All fields except `lastUpdated` are optional
- Missing data indicates unknown state, not error
- Operator only acts on definitive information
- Unknown state preserves the last known health condition

**Usage:**
The cluster-etcd-operator healthcheck controller watches this resource and updates operator conditions based on the cluster state.
26 changes: 26 additions & 0 deletions etcd/install.go
@@ -0,0 +1,26 @@
package etcd

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"

	v1alpha1 "github.com/openshift/api/etcd/v1alpha1"
)

const (
	GroupName = "etcd.openshift.io"
)

var (
	schemeBuilder = runtime.NewSchemeBuilder(v1alpha1.Install)
	// Install is a function which adds every version of this group to a scheme
	Install = schemeBuilder.AddToScheme
)

func Resource(resource string) schema.GroupResource {
	return schema.GroupResource{Group: GroupName, Resource: resource}
}

func Kind(kind string) schema.GroupKind {
	return schema.GroupKind{Group: GroupName, Kind: kind}
}
7 changes: 7 additions & 0 deletions etcd/v1alpha1/Makefile
@@ -0,0 +1,7 @@
.PHONY: verify-with-container
verify-with-container:
	$(MAKE) -f ../../Makefile $@

.PHONY: update-with-container
update-with-container:
	$(MAKE) -f ../../Makefile $@
43 changes: 43 additions & 0 deletions etcd/v1alpha1/README.md
@@ -0,0 +1,43 @@
# etcd.openshift.io/v1alpha1

This API version contains types related to two-node fencing for etcd cluster management.

## PacemakerCluster

The `PacemakerCluster` CRD provides visibility into the health and status of Pacemaker-managed clusters in dual-replica (two-node) OpenShift deployments.

### Feature Gate

- **Feature Gate**: None - this CRD is gated by cluster-etcd-operator start-up. It will only be created once a TNF cluster has transitioned to external etcd.
- **Component**: `two-node-fencing`

### Usage

The PacemakerCluster resource is a cluster-scoped, status-only singleton named "cluster". It is periodically updated by a privileged controller that runs `pcs status xml` and parses the output into structured fields for health checking.

### Status Fields

- **LastUpdated** (required): Timestamp when status was last collected
- **Summary**: High-level cluster state including:
  - `pacemakerDaemonState`: Running state of the pacemaker daemon (enum: `Running`, `KnownNotRunning`)
  - `quorumStatus`: Whether cluster has quorum (enum: `Quorate`, `NoQuorum`)
  - `nodesOnline`, `nodesTotal`: Node counts
  - `resourcesStarted`, `resourcesTotal`: Resource counts
- **Nodes**: Detailed per-node status (name, IPv4/IPv6 addresses, online status, mode)
- **Resources**: Detailed per-resource status (name, resource agent type, role enum, active status, node assignment)
- **NodeHistory**: Recent operation history for troubleshooting (operation failures within last 5 minutes)
- **FencingHistory**: Recent fencing events (events within last 24 hours)
- **RawXML**: Complete XML output from `pcs status xml` (for debugging only, max 256KB)
- **CollectionError**: Any errors encountered during status collection

### Design Principles

The API follows the "Act on Deterministic Information" principle:
- All fields except `lastUpdated` are optional
- Missing data means "unknown", not "error"
- The operator only transitions between PacemakerHealthy and PacemakerDegraded states based on deterministic information
- When information is unavailable, the last known state is preserved
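
The principle above can be sketched as a small state function. The enum values and the `PacemakerHealthy`/`PacemakerDegraded` names come from this README; everything else (the `summary` and `Observation` types, the `observe` and `nextCondition` functions, and the specific rule for what counts as healthy or degraded) is a hypothetical illustration, not the operator's actual logic.

```go
package main

import "fmt"

// Observation is the result of evaluating one status snapshot.
type Observation int

const (
	ObservedUnknown Observation = iota // data missing or inconclusive
	ObservedHealthy
	ObservedDegraded
)

const (
	PacemakerHealthy  = "PacemakerHealthy"
	PacemakerDegraded = "PacemakerDegraded"
)

// summary is a hypothetical stand-in for the status summary.
// An empty string models an optional field that was not reported.
type summary struct {
	PacemakerDaemonState string // "", "Running", or "KnownNotRunning"
	QuorumStatus         string // "", "Quorate", or "NoQuorum"
}

// observe classifies a snapshot. The rule used here is an assumption:
// any definitive bad value is degraded, all-good values are healthy,
// and anything else is unknown.
func observe(s *summary) Observation {
	if s == nil {
		return ObservedUnknown
	}
	if s.PacemakerDaemonState == "KnownNotRunning" || s.QuorumStatus == "NoQuorum" {
		return ObservedDegraded
	}
	if s.PacemakerDaemonState == "Running" && s.QuorumStatus == "Quorate" {
		return ObservedHealthy
	}
	return ObservedUnknown
}

// nextCondition changes state only on a definitive observation and
// otherwise preserves the last known state.
func nextCondition(last string, obs Observation) string {
	switch obs {
	case ObservedHealthy:
		return PacemakerHealthy
	case ObservedDegraded:
		return PacemakerDegraded
	default:
		return last
	}
}

func main() {
	state := PacemakerHealthy
	state = nextCondition(state, observe(&summary{QuorumStatus: "NoQuorum"}))
	fmt.Println(state)
	state = nextCondition(state, observe(nil)) // no data: state is preserved
	fmt.Println(state)
}
```

Modeling "unknown" as its own outcome, rather than folding it into the degraded case, is what lets missing data leave the previous condition untouched.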

### Notes

The spec field is reserved but unused - all meaningful data is in the status subresource.
8 changes: 8 additions & 0 deletions etcd/v1alpha1/doc.go
@@ -0,0 +1,8 @@
// +k8s:deepcopy-gen=package,register
// +k8s:defaulter-gen=TypeMeta
// +k8s:openapi-gen=true
// +openshift:featuregated-schema-gen=true

// +kubebuilder:validation:Optional
// +groupName=etcd.openshift.io
package v1alpha1
39 changes: 39 additions & 0 deletions etcd/v1alpha1/register.go
@@ -0,0 +1,39 @@
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

var (
	GroupName     = "etcd.openshift.io"
	GroupVersion  = schema.GroupVersion{Group: GroupName, Version: "v1alpha1"}
	schemeBuilder = runtime.NewSchemeBuilder(addKnownTypes)
	// Install is a function which adds this version to a scheme
	Install = schemeBuilder.AddToScheme

	// SchemeGroupVersion generated code relies on this name
	// Deprecated
	SchemeGroupVersion = GroupVersion
	// AddToScheme exists solely to keep the old generators creating valid code
	// DEPRECATED
	AddToScheme = schemeBuilder.AddToScheme
)

// Resource generated code relies on this being here, but it logically belongs to the group
// DEPRECATED
func Resource(resource string) schema.GroupResource {
	return schema.GroupResource{Group: GroupName, Resource: resource}
}

func addKnownTypes(scheme *runtime.Scheme) error {
	metav1.AddToGroupVersion(scheme, GroupVersion)

	scheme.AddKnownTypes(GroupVersion,
		&PacemakerCluster{},
		&PacemakerClusterList{},
	)

	return nil
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
apiVersion: apiextensions.k8s.io/v1 # Hack because controller-gen complains if we don't have this
name: "PacemakerCluster"
crdName: pacemakerclusters.etcd.openshift.io
tests:
  onCreate:
    - name: Should be able to create a minimal PacemakerCluster
      initial: |
        apiVersion: etcd.openshift.io/v1alpha1
        kind: PacemakerCluster
        metadata:
          name: cluster
        spec: {}
      expected: |
        apiVersion: etcd.openshift.io/v1alpha1
        kind: PacemakerCluster
        metadata:
          name: cluster
        spec: {}