Log failing calls to health indicators #22632

@ttddyy

This is similar to #22509: the goal is to improve root cause analysis when probe endpoints return a non-200 response.

I am migrating k8s HTTP probes to use the readiness and liveness health group endpoints (/actuator/health/[readiness|liveness]).
When these endpoints return a non-UP status (a response other than 200), k8s stops routing traffic to the pod or shuts it down. When that happens, the k8s HTTP probe records only the returned HTTP status as the reason for the probe failure.
This makes it hard to find out later WHY the readiness/liveness probe returned a non-200 response. Even if k8s could record the body of the probe response, it would be nicer to have this information in the application log.

I wrote the following implementation for our services to log information when a health endpoint returns a non-UP response.

@Slf4j
public class LoggingHealthEndpointWebExtension extends HealthEndpointWebExtension {

	public LoggingHealthEndpointWebExtension(HealthContributorRegistry registry, HealthEndpointGroups groups) {
		super(registry, groups);
	}

	@Override
	public WebEndpointResponse<HealthComponent> health(ApiVersion apiVersion, SecurityContext securityContext,
			boolean showAll, String... path) {
		WebEndpointResponse<HealthComponent> response = super.health(apiVersion, securityContext, showAll, path);
		HealthComponent health = response.getBody();
		if (health == null) {
			return response;
		}

		Status status = health.getStatus();
		// Status is a class, not an enum, so compare with equals() rather than by reference.
		if (!Status.UP.equals(status)) {
			// Collect the child components (sorted by name) so the log shows which contributor failed.
			Map<String, HealthComponent> components = new TreeMap<>();
			if (health instanceof CompositeHealth) {
				Map<String, HealthComponent> details = ((CompositeHealth) health).getComponents();
				if (details != null) {
					components.putAll(details);
				}
			}
			log.warn("Health endpoints {} returned {}. components={}", path, status, components);
		}

		return response;
	}

}

If HealthEndpointSupport had logging capability (or, for web only, HealthEndpointWebExtension and ReactiveHealthEndpointWebExtension), we would not need this custom implementation.

Something like:

boolean enableLogging;

if(this.enableLogging && health.getStatus() != Status.UP) {
  log.warn(...);
}
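
To make the idea concrete without depending on Spring Boot internals, here is a minimal plain-Java sketch of the opt-in check. Everything in it is hypothetical and only for illustration: HealthLoggingSketch, HealthResult, and afterHealthCall are stand-ins, not Spring Boot types, and an in-memory list takes the place of log.warn(...). Only the enableLogging flag and the "not UP" condition come from the proposal above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java stand-in for the proposed opt-in logging; none of these types are Spring Boot's. */
final class HealthLoggingSketch {

    /** Hypothetical result of a health call: an overall status plus per-component statuses. */
    static final class HealthResult {
        final String status;
        final Map<String, String> components;

        HealthResult(String status, Map<String, String> components) {
            this.status = status;
            this.components = new TreeMap<>(components); // sorted for stable log output
        }
    }

    private final boolean enableLogging;                      // the flag proposed in the issue
    private final List<String> warnings = new ArrayList<>();  // stands in for log.warn(...)

    HealthLoggingSketch(boolean enableLogging) {
        this.enableLogging = enableLogging;
    }

    /** Records a warning when logging is enabled and the status is not UP; returns the result unchanged. */
    HealthResult afterHealthCall(String path, HealthResult result) {
        if (this.enableLogging && !"UP".equals(result.status)) {
            this.warnings.add(String.format("Health endpoint %s returned %s. components=%s",
                    path, result.status, result.components));
        }
        return result;
    }

    List<String> warnings() {
        return this.warnings;
    }

    public static void main(String[] args) {
        HealthLoggingSketch sketch = new HealthLoggingSketch(true);
        sketch.afterHealthCall("liveness", new HealthResult("UP", Map.of("ping", "UP")));
        sketch.afterHealthCall("readiness", new HealthResult("DOWN", Map.of("db", "DOWN", "ping", "UP")));
        // Only the non-UP call produces a warning.
        sketch.warnings().forEach(System.out::println);
    }
}
```

With the flag off, afterHealthCall records nothing, so existing deployments would see no change in log volume unless they opt in.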
