Skip to content

Conversation

@talevy
Copy link
Contributor

@talevy talevy commented Jun 7, 2018

This PR introduces a concept of a maintenance mode for the
lifecycle service. During maintenance mode, no policies are
executed.

To be placed into maintenance mode, users must first issue a
request to be placed in maintenance mode. Once the service
is assured that no policies are in actions that are not to be
interrupted (like ShrinkAction), the service will place itself
in maintenance mode.

APIs introduced in follow-up PR

  • POST _xpack/index_lifecycle/maintenance/_request
    • issues a request to be placed into maintenenance mode.
      This is not immediate, since we must first verify that
      it is safe to go from REQUESTED -> IN maintenance mode.
  • POST _xpack/index_lifecycle/maintenance/_stop
    • issues a request to be taken out (this is immediate)
  • GET _xpack/index_lifecycle/maintenance
    • get back the current mode our lifecycle management is in

This PR introduces a concept of a maintenance mode for the
lifecycle service. During maintenance mode, no policies are
executed.

To be placed into maintenance mode, users must first issue a
request to be placed in maintenance mode. Once the service
is assured that no policies are in actions that are not to be
interrupted (like ShrinkAction), the service will place itself
in maintenance mode.

APIs introduced:

- POST _xpack/index_lifecycle/maintenance/_request
   - issues a request to be placed into maintenenance mode.
     This is not immediate, since we must first verify that
     it is safe to go from REQUESTED -> IN maintenance mode.
- POST _xpack/index_lifecycle/maintenance/_stop
   - issues a request to be taken out (this is immediate)
- GET _xpack/index_lifecycle/maintenance
   - get back the current mode our lifecycle management is in
@talevy talevy added WIP :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Jun 7, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@talevy talevy requested a review from colings86 June 7, 2018 00:27
@talevy
Copy link
Contributor Author

talevy commented Jun 7, 2018

Hi @colings86, I'd appreciate a review (even though it is WIP)!

Things I have not implemented yet, that may or may not make sense in this PR:

  • Actions for the endpoints
  • more tests

Things that I found contentious:

  • I chose to go with IndexMetaData for storing the MaintenanceMode since it felt wrong to be
    a setting that changes under the hood from the user (and we do not want people to edit it directly)
    • I notice that I am missing something from Diff logic, those cluster-state diff tests are failing, help there would be appreciated!

any early feedback would be much appreciated, thanks!

@talevy talevy force-pushed the ilm-maintenance branch from dda2da9 to fa46f10 Compare June 12, 2018 06:25
@talevy
Copy link
Contributor Author

talevy commented Jun 12, 2018

@colings86 for the sake of moving things forward (and since this is already a large enough PR), I will move the API work to follow-up PRs. A review for what has been done so far would be great!

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments but I think it looks pretty good so far.

import org.elasticsearch.xpack.core.indexlifecycle.ErrorStep;
import org.elasticsearch.xpack.core.indexlifecycle.InitializePolicyContextStep;
import org.elasticsearch.xpack.core.indexlifecycle.LifecycleSettings;
import org.elasticsearch.xpack.core.indexlifecycle.ShrinkAction;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this import is needed?


boolean maintenanceModeToChange = currentMetadata.getMaintenanceMode().equals(mode) == false;
boolean maintenanceModeRequested = OperationMode.MAINTENANCE_REQUESTED.equals(mode);
boolean inMaintenanceMode = OperationMode.MAINTENANCE.equals(currentMetadata.getMaintenanceMode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do == checks on enums rather than .equals checks

boolean inMaintenanceMode = OperationMode.MAINTENANCE.equals(currentMetadata.getMaintenanceMode());
if ((inMaintenanceMode && maintenanceModeRequested) || maintenanceModeToChange == false) {
return currentState;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to check that if mode == OperationMode.MAINTENANCE then currentMetadata.getMaintenanceMode() == OperationMode.MAINTENANCE_REQUESTED is also true so we don't end up with a race condition where the user requests NORMAL mode while we are iterating through the indices in IndexLifecycleService and when it finishes iterating it suddenly goes into MAINTENANCE mode ignoring the fact that the user cancelled the request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in other words, you can only enter maintenance mode if you were previously in maintenance-requested mode

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and in other words, the state transition for operation modes looks like this

<current_mode> → [<next_mode>, ...]

NORMAL → [NORMAL, MAINTENANCE_REQUESTED]
MAINTENANCE_REQUESTED → [NORMAL, MAINTENANCE]
MAINTENANCE → [NORMAL]

@talevy talevy removed the WIP label Jun 12, 2018
@talevy
Copy link
Contributor Author

talevy commented Jun 12, 2018

thanks @colings86, I've made the valid mode transitions clearer and more restrictive.

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment about the valid mode changes but otherwise I think this is very close.

static {
VALID_MODE_CHANGES.put(OperationMode.NORMAL, Sets.newHashSet(OperationMode.MAINTENANCE_REQUESTED));
VALID_MODE_CHANGES.put(OperationMode.MAINTENANCE_REQUESTED, Sets.newHashSet(OperationMode.NORMAL, OperationMode.MAINTENANCE));
VALID_MODE_CHANGES.put(OperationMode.MAINTENANCE, Sets.newHashSet(OperationMode.NORMAL));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having these here, we could utilise the OperationMode enum and pass the valid next modes into each enum via a constructor then we could have a isValidChange(OperationMode) method on it which check the mode in the parameter exists in the list of valid modes on that enum value. This will mean that the OperationMode enum completely owns the state transitions

Copy link
Contributor Author

@talevy talevy Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I tried this but it got just as complicated because the constructor elements cannot be forward references. and I preferred not to use strings.

Instead, I'll move all the logic to this isValidChange method. which will encapsulate the state transitions allowed using switch statements and conditionals

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@talevy talevy merged commit 0193979 into elastic:index-lifecycle Jun 15, 2018
@talevy talevy deleted the ilm-maintenance branch June 15, 2018 18:38
talevy added a commit to talevy/elasticsearch that referenced this pull request Jun 18, 2018
- POST _xpack/index_lifecycle/maintenance/_request
   - issues a request to be placed into maintenenance mode.
     This is not immediate, since we must first verify that
     it is safe to go from REQUESTED -> IN maintenance mode.
- POST _xpack/index_lifecycle/maintenance/_stop
   - issues a request to be taken out (this is immediate)
- GET _xpack/index_lifecycle/maintenance
   - get back the current mode our lifecycle management is in

- maintenance mode update task was hardened to support uninstalled metadata
- if no metadata is installed, the request/stop actions will install metadata
  and proceed to try and change it (default start mode is NORMAL)

follow-up to elastic#31164.
talevy added a commit that referenced this pull request Jun 21, 2018
- POST _xpack/index_lifecycle/_stop
   - issues a request to be placed into STOPPED mode (maintenance mode).
     This is not immediate, since we must first verify that
     it is safe to go from STOPPING -> STOPPED.
- POST _xpack/index_lifecycle/_start
   - issues a request to be placed back into RUNNING mode (immediately)
- GET _xpack/index_lifecycle/_status
   - get back the current mode our lifecycle management is in

- update task was hardened to support uninstalled metadata
- if no metadata is installed, the start/stop actions will install metadata
  and proceed to try and change it (default start mode is RUNNING)
- rename MAINTENANCE -> STOPPED, MAINTENANCE_REQUESTED -> STOPPING, NORMAL -> RUNNING

follow-up to #31164.
jasontedor pushed a commit that referenced this pull request Aug 17, 2018
This PR introduces a concept of a maintenance mode for the
lifecycle service. During maintenance mode, no policies are
executed.

To be placed into maintenance mode, users must first issue a
request to be placed in maintenance mode. Once the service
is assured that no policies are in actions that are not to be
interrupted (like ShrinkAction), the service will place itself
in maintenance mode.

APIs to-be introduced:

- POST _xpack/index_lifecycle/maintenance/_request
   - issues a request to be placed into maintenenance mode.
     This is not immediate, since we must first verify that
     it is safe to go from REQUESTED -> IN maintenance mode.
- POST _xpack/index_lifecycle/maintenance/_stop
   - issues a request to be taken out (this is immediate)
- GET _xpack/index_lifecycle/maintenance
   - get back the current mode our lifecycle management is in
jasontedor pushed a commit that referenced this pull request Aug 17, 2018
- POST _xpack/index_lifecycle/_stop
   - issues a request to be placed into STOPPED mode (maintenance mode).
     This is not immediate, since we must first verify that
     it is safe to go from STOPPING -> STOPPED.
- POST _xpack/index_lifecycle/_start
   - issues a request to be placed back into RUNNING mode (immediately)
- GET _xpack/index_lifecycle/_status
   - get back the current mode our lifecycle management is in

- update task was hardened to support uninstalled metadata
- if no metadata is installed, the start/stop actions will install metadata
  and proceed to try and change it (default start mode is RUNNING)
- rename MAINTENANCE -> STOPPED, MAINTENANCE_REQUESTED -> STOPPING, NORMAL -> RUNNING

follow-up to #31164.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/ILM+SLM Index and Snapshot lifecycle management

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants