Introduce Maintenance Mode to ILM #31164
This PR introduces a concept of a maintenance mode for the
lifecycle service. During maintenance mode, no policies are
executed.
To be placed into maintenance mode, users must first issue a
request to be placed in maintenance mode. Once the service
is assured that no policies are in actions that are not to be
interrupted (like ShrinkAction), the service will place itself
in maintenance mode.
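The hand-off from "requested" to "in maintenance" described above could look roughly like the sketch below. It is only an illustration: `Mode`, `MaintenanceGate`, `nextMode`, and the `"shrink"` action name are hypothetical stand-ins, not the classes in this PR.

```java
import java.util.Collection;
import java.util.Set;

// Hypothetical stand-in for the lifecycle service's operation modes.
enum Mode { NORMAL, MAINTENANCE_REQUESTED, MAINTENANCE }

final class MaintenanceGate {
    // Assumption: names of actions that must not be interrupted.
    // The PR description gives ShrinkAction as one example.
    private static final Set<String> UNINTERRUPTIBLE = Set.of("shrink");

    // Given the current mode and the action each managed index is currently in,
    // decide which mode the service should be in next.
    static Mode nextMode(Mode current, Collection<String> currentActions) {
        if (current != Mode.MAINTENANCE_REQUESTED) {
            // Only a pending request can be promoted to maintenance mode.
            return current;
        }
        boolean safe = currentActions.stream().noneMatch(UNINTERRUPTIBLE::contains);
        return safe ? Mode.MAINTENANCE : Mode.MAINTENANCE_REQUESTED;
    }
}
```

With this shape, a request stays pending for as long as any index is still inside an uninterruptible action, and is promoted on the first pass where none is.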
APIs introduced:
- POST _xpack/index_lifecycle/maintenance/_request
- issues a request to be placed into maintenance mode.
This is not immediate, since we must first verify that
it is safe to go from REQUESTED -> IN maintenance mode.
- POST _xpack/index_lifecycle/maintenance/_stop
- issues a request to be taken out (this is immediate)
- GET _xpack/index_lifecycle/maintenance
- get back the current mode our lifecycle management is in
Pinging @elastic/es-core-infra
Hi @colings86, I'd appreciate a review (even though it is WIP)! Things I have not implemented yet, that may or may not make sense in this PR:
Things that I found contentious:
Any early feedback would be much appreciated, thanks!
@colings86 for the sake of moving things forward (and since this is already a large enough PR), I will move the API work to follow-up PRs. A review for what has been done so far would be great!
colings86
left a comment
I left some comments but I think it looks pretty good so far.
import org.elasticsearch.xpack.core.indexlifecycle.ErrorStep;
import org.elasticsearch.xpack.core.indexlifecycle.InitializePolicyContextStep;
import org.elasticsearch.xpack.core.indexlifecycle.LifecycleSettings;
import org.elasticsearch.xpack.core.indexlifecycle.ShrinkAction;
I don't think this import is needed?
boolean maintenanceModeToChange = currentMetadata.getMaintenanceMode().equals(mode) == false;
boolean maintenanceModeRequested = OperationMode.MAINTENANCE_REQUESTED.equals(mode);
boolean inMaintenanceMode = OperationMode.MAINTENANCE.equals(currentMetadata.getMaintenanceMode());
We should do == checks on enums rather than .equals checks
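As a small illustration of this suggestion (a sketch only; the `OperationMode` enum here is a local stand-in and `EnumComparison` is a hypothetical helper): enum constants are singletons, so `==` compares them correctly, cannot throw a NullPointerException whichever side is null, and is rejected by the compiler when the two sides are unrelated types.

```java
// Local stand-in for the enum discussed in this PR.
enum OperationMode { NORMAL, MAINTENANCE_REQUESTED, MAINTENANCE }

final class EnumComparison {
    // Identity comparison: safe even when 'mode' is null.
    static boolean isMaintenanceRequested(OperationMode mode) {
        return mode == OperationMode.MAINTENANCE_REQUESTED;
    }

    // Replaces 'current.equals(requested) == false', which would throw
    // a NullPointerException if 'current' were null.
    static boolean isChange(OperationMode current, OperationMode requested) {
        return current != requested;
    }
}
```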
boolean inMaintenanceMode = OperationMode.MAINTENANCE.equals(currentMetadata.getMaintenanceMode());
if ((inMaintenanceMode && maintenanceModeRequested) || maintenanceModeToChange == false) {
    return currentState;
}
I think we also need to check that if mode == OperationMode.MAINTENANCE then currentMetadata.getMaintenanceMode() == OperationMode.MAINTENANCE_REQUESTED is also true. Otherwise we can hit a race condition: the user requests NORMAL mode while we are iterating through the indices in IndexLifecycleService, and when the iteration finishes the service suddenly goes into MAINTENANCE mode, ignoring the fact that the user cancelled the request.
in other words, you can only enter maintenance mode if you were previously in maintenance-requested mode
yes
and in other words, the state transition for operation modes looks like this
<current_mode> → [<next_mode>, ...]
NORMAL → [NORMAL, MAINTENANCE_REQUESTED]
MAINTENANCE_REQUESTED → [NORMAL, MAINTENANCE]
MAINTENANCE → [NORMAL]
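The transition list above can be written down as a lookup table. The following is a minimal sketch, assuming a local `OperationMode` enum and a hypothetical `ModeTransitions` helper (neither is the PR's actual code); it encodes exactly the three rows listed.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Local stand-in for the enum discussed in this PR.
enum OperationMode { NORMAL, MAINTENANCE_REQUESTED, MAINTENANCE }

final class ModeTransitions {
    // current mode -> set of modes it may move to
    private static final Map<OperationMode, Set<OperationMode>> VALID =
        new EnumMap<>(OperationMode.class);

    static {
        VALID.put(OperationMode.NORMAL,
            EnumSet.of(OperationMode.NORMAL, OperationMode.MAINTENANCE_REQUESTED));
        VALID.put(OperationMode.MAINTENANCE_REQUESTED,
            EnumSet.of(OperationMode.NORMAL, OperationMode.MAINTENANCE));
        VALID.put(OperationMode.MAINTENANCE,
            EnumSet.of(OperationMode.NORMAL));
    }

    static boolean isAllowed(OperationMode current, OperationMode next) {
        return VALID.get(current).contains(next);
    }
}
```

Under this table the race described earlier cannot complete: if the user has already moved the service back to NORMAL, a late attempt to enter MAINTENANCE is rejected because NORMAL → MAINTENANCE is not listed.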
Thanks @colings86, I've made the valid mode transitions clearer and more restrictive.
colings86
left a comment
I left a comment about the valid mode changes but otherwise I think this is very close.
static {
    VALID_MODE_CHANGES.put(OperationMode.NORMAL, Sets.newHashSet(OperationMode.MAINTENANCE_REQUESTED));
    VALID_MODE_CHANGES.put(OperationMode.MAINTENANCE_REQUESTED, Sets.newHashSet(OperationMode.NORMAL, OperationMode.MAINTENANCE));
    VALID_MODE_CHANGES.put(OperationMode.MAINTENANCE, Sets.newHashSet(OperationMode.NORMAL));
Instead of having these here, we could use the OperationMode enum and pass the valid next modes into each enum value via a constructor. Then we could have an isValidChange(OperationMode) method on it which checks that the mode in the parameter exists in the list of valid modes for that enum value. This would mean that the OperationMode enum completely owns the state transitions.
Hmm, I tried this but it got just as complicated because the constructor arguments cannot be forward references, and I preferred not to use strings.
Instead, I'll move all the logic into this isValidChange method, which will encapsulate the allowed state transitions using switch statements and conditionals.
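For illustration, the switch-based approach could look something like the sketch below. This is a guess at the shape, not the code that was merged; the allowed transitions mirror the VALID_MODE_CHANGES map quoted above. It sidesteps the forward-reference problem because the constants are only referenced inside a method body, after all of them have been declared, rather than in constructor arguments.

```java
// Sketch of an enum that owns its own state transitions.
enum OperationMode {
    NORMAL,
    MAINTENANCE_REQUESTED,
    MAINTENANCE;

    // Returns true if moving from this mode to 'next' is permitted.
    boolean isValidChange(OperationMode next) {
        switch (this) {
            case NORMAL:
                return next == MAINTENANCE_REQUESTED;
            case MAINTENANCE_REQUESTED:
                return next == NORMAL || next == MAINTENANCE;
            case MAINTENANCE:
                return next == NORMAL;
            default:
                return false;
        }
    }
}
```

A caller would then guard the cluster-state update with something like `currentMode.isValidChange(requestedMode)` and return the unchanged state when the check fails.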
colings86
left a comment
LGTM
- POST _xpack/index_lifecycle/maintenance/_request
- issues a request to be placed into maintenance mode.
This is not immediate, since we must first verify that
it is safe to go from REQUESTED -> IN maintenance mode.
- POST _xpack/index_lifecycle/maintenance/_stop
- issues a request to be taken out (this is immediate)
- GET _xpack/index_lifecycle/maintenance
- get back the current mode our lifecycle management is in
- maintenance mode update task was hardened to support uninstalled metadata
- if no metadata is installed, the request/stop actions will install metadata
and proceed to try and change it (default start mode is NORMAL)
follow-up to elastic#31164.
- POST _xpack/index_lifecycle/_stop
- issues a request to be placed into STOPPED mode (maintenance mode).
This is not immediate, since we must first verify that
it is safe to go from STOPPING -> STOPPED.
- POST _xpack/index_lifecycle/_start
- issues a request to be placed back into RUNNING mode (immediately)
- GET _xpack/index_lifecycle/_status
- get back the current mode our lifecycle management is in
- update task was hardened to support uninstalled metadata
- if no metadata is installed, the start/stop actions will install metadata
and proceed to try and change it (default start mode is RUNNING)
- rename MAINTENANCE -> STOPPED, MAINTENANCE_REQUESTED -> STOPPING, NORMAL -> RUNNING
follow-up to #31164.
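The "uninstalled metadata" hardening mentioned in this commit message could be sketched as below, using the renamed modes. Everything here is hypothetical (`LifecycleMetadataSketch`, `OperationModeUpdateSketch`, and `apply` are illustrative names, and the transition rules are carried over from the earlier review discussion): when no metadata exists yet, a default in RUNNING mode is installed first and the requested change is then attempted against it.

```java
// Renamed modes from this commit: NORMAL -> RUNNING,
// MAINTENANCE_REQUESTED -> STOPPING, MAINTENANCE -> STOPPED.
enum OperationMode { RUNNING, STOPPING, STOPPED }

// Hypothetical, minimal stand-in for the lifecycle metadata in cluster state.
final class LifecycleMetadataSketch {
    final OperationMode mode;

    LifecycleMetadataSketch(OperationMode mode) {
        this.mode = mode;
    }
}

final class OperationModeUpdateSketch {
    static boolean isValidChange(OperationMode current, OperationMode next) {
        switch (current) {
            case RUNNING:
                return next == OperationMode.STOPPING;
            case STOPPING:
                return next == OperationMode.RUNNING || next == OperationMode.STOPPED;
            case STOPPED:
                return next == OperationMode.RUNNING;
            default:
                return false;
        }
    }

    // 'installed' may be null when no lifecycle metadata exists yet.
    static LifecycleMetadataSketch apply(LifecycleMetadataSketch installed, OperationMode requested) {
        // Install default metadata (RUNNING) if none is present.
        LifecycleMetadataSketch base =
            installed != null ? installed : new LifecycleMetadataSketch(OperationMode.RUNNING);
        if (isValidChange(base.mode, requested) == false) {
            return base; // invalid transition: keep the (possibly freshly installed) metadata
        }
        return new LifecycleMetadataSketch(requested);
    }
}
```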