Skip to content

Conversation

@AurelienRichez
Copy link
Contributor

@AurelienRichez AurelienRichez commented May 11, 2021

Description

This PR rewrite the healthcheck endpoint to have more precise info :

  • it checks that we have at least one peer (before it only checked that the service responded without an error, but not the content of the response itself)
  • it returns block numbers for best stored block, best known block and best block we are fetching
  • it returns a 500 when the state did not change change for too long
  • it returns the sync status (STARTING, SYNCING, SYNCED)

It can still be improved by returning the number of healthy peers instead of the raw number of peers.

Important Changes Introduced

The healthcheck endpoint is not backward compatible (for instance I renamed the field description to name because it seems more accurate)

two new parameters :

  • mantis.network.rpc.http.health.no-update-duration-threshold
  • mantis.network.rpc.http.health.syncing-status-threshold

The values of some parameters are given as a generic parameter "info" which is a string. I think it is alright for a healthcheck endpoint but we might want to consider if we want something more strongly typed.

Testing

You can call <node_url>/healthcheck to see the endpoint, and check that it moves correctly

The result should look like this :

{
  "checks": [
    {
      "name": "peerCount",
      "status": "OK",
      "info": "3"
    },
    {
      "name": "bestStoredBlock",
      "status": "OK",
      "info": "80000"
    },
    {
      "name": "bestKnownBlock",
      "status": "OK",
      "info": "12721559"
    },
    {
      "name": "bestFetchingBlock",
      "status": "OK",
      "info": "80000"
    },
    {
      "name": "updateStatus",
      "status": "OK"
    },
    {
      "name": "syncStatus",
      "status": "OK",
      "info": "SYNCING"
    }
  ]
}

Copy link
Contributor

@dzajkowski dzajkowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, a few small comments

@AurelienRichez AurelienRichez force-pushed the feature/ETCM-829-rewrite-healthcheck branch from 94c824a to 7e61707 Compare May 19, 2021 12:15
@AurelienRichez AurelienRichez force-pushed the feature/ETCM-829-rewrite-healthcheck branch from e26a38b to 05be6c8 Compare May 20, 2021 07:20
Aurélien Richez added 4 commits May 20, 2021 10:37
 - adds predicate to existing health checks
 - Adds a way to set additional info in the healthcheck
 - Adds updateStatus
 - Check if the server seems stuck for too long
 - Adds syncStatus
@AurelienRichez AurelienRichez force-pushed the feature/ETCM-829-rewrite-healthcheck branch from 05be6c8 to 4382059 Compare May 20, 2021 08:37
@AurelienRichez AurelienRichez merged commit aafb637 into develop May 20, 2021
@AurelienRichez AurelienRichez deleted the feature/ETCM-829-rewrite-healthcheck branch May 20, 2021 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants