@talevy
Contributor

@talevy talevy commented Jul 31, 2018

This was originally set to a few seconds while prototyping things.
This interval is for the scheduled trigger of policies. Policies
have this extra trigger beyond just on cluster-state changes because
cluster-state changes may not be happening in a cluster for
whatever reason, and we need to continue making progress. Updating
this value to be larger is reasonable since not all operations
are expected to be completed in the span of seconds, but instead in
minutes and hours.

@talevy talevy added >non-issue :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Jul 31, 2018
@talevy talevy requested review from colings86 and dakrone July 31, 2018 22:31
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@dakrone
Member

dakrone commented Jul 31, 2018

I take it this is the issue to kick off the discussion on what the interval should be? For what it's worth, I vote 5 minutes :)

@talevy
Contributor Author

talevy commented Jul 31, 2018

@dakrone yup! Let the voting begin, because I sure do not have a clue what the "right" interval is. Anywhere between 5 and 15 minutes sounds reasonable to me. The 15 minutes seeded in the PR came from one of my conversations with @colings86.

@colings86
Contributor

colings86 commented Aug 1, 2018

@dakrone could you explain why you think 5 minutes is a good value here?

The poll interval serves two purposes: 1) as a fallback to make sure we make progress if a cluster-state change does not happen or a client async listener does not fire, and 2) to trigger a check on the rollover conditions, since an index crossing these conditions does not cause a cluster-state change.
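
As a rough illustration of the fallback role described here, the following is a minimal sketch, not the actual ILM scheduling code. It assumes the scheduled trigger simply fires every `poll_interval_s` seconds, independently of cluster-state changes; `trigger_times` and its parameters are hypothetical names chosen for this example.

```python
def trigger_times(cluster_state_changes_s, poll_interval_s, horizon_s):
    """Return sorted (time_s, reason) pairs for every policy trigger up to horizon_s.

    Hypothetical model: a policy check runs on every cluster-state change
    and, independently, on every tick of the poll interval.
    """
    # Scheduled fallback: fires whether or not the cluster state changes.
    polls = [(t, "poll") for t in range(poll_interval_s, horizon_s + 1, poll_interval_s)]
    # Event-driven trigger: fires only when a cluster-state change arrives.
    changes = [(t, "cluster-state") for t in cluster_state_changes_s if t <= horizon_s]
    return sorted(polls + changes)


# A quiet cluster (no cluster-state changes) still gets checked every 10 minutes.
quiet = trigger_times([], poll_interval_s=600, horizon_s=1800)
# A single cluster-state change at t=100s adds one extra check.
busy = trigger_times([100], poll_interval_s=600, horizon_s=1200)
```

In the quiet case every trigger comes from the poll, which is the situation the interval exists to cover, including rollover checks that never produce a cluster-state change on their own.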

Reasons I think 15 minutes is a good value:

  • We expect typical deployments to take some time to perform actions, so we don't want to fire too often since that's wasteful.
  • 15 minutes is short compared to the time we are likely to be in a phase anyway (we generally expect to be in a phase for days, although the hot phase might be shorter in some cases).
  • If we expect indexes to roll over on the order of every few days, then checking the rollover conditions every 15 minutes feels responsive without checking too much, since the index will not have exceeded the criteria by much in 15 minutes.
  • Users with very high throughput (mainly those currently on hourly indexes) can easily change the poll interval to fit their use case by updating the cluster setting.
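
To put rough numbers on the last two points, here is a sketch under a stated assumption: the index grows at a constant rate toward its rollover condition, so the worst-case overshoot is the poll interval divided by the time it takes to reach the condition. The function names are hypothetical. The setting name `indices.lifecycle.poll_interval` is not given in this thread and is an assumption about what the cluster setting is called; the JSON has the shape of a body for the cluster update settings API (`PUT _cluster/settings`).

```python
import json


def worst_case_overshoot(poll_interval_min, rollover_period_min):
    """Fraction by which an index may exceed a rollover condition before the
    next poll, assuming constant growth toward the threshold."""
    return poll_interval_min / rollover_period_min


def poll_interval_settings_body(interval):
    """Build a request body for a persistent cluster settings update.

    The setting key is an assumption; it does not appear in this discussion.
    """
    return json.dumps({"persistent": {"indices.lifecycle.poll_interval": interval}})


# Index that rolls over roughly every 3 days, checked every 15 minutes.
slow = worst_case_overshoot(15, 3 * 24 * 60)
# Hourly index checked every 15 minutes: the overshoot is far larger.
hourly = worst_case_overshoot(15, 60)
# A high-throughput user might lower the interval, for example to one minute.
body = poll_interval_settings_body("1m")
```

Under this model the multi-day index overshoots by about 0.35% at worst, while the hourly index can overshoot by 25%, which is why users on hourly indexes are the ones likely to lower the interval.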

@dakrone
Member

dakrone commented Aug 2, 2018

@colings86 it's purely a gut feeling based on what I think a "medium but not long period of time" is.

It's like the definition of "a few": to me, "a few" is 3 to 8, and I'd like ILM to check every few minutes, so that led to the 5 minute interval.

This also includes a big caveat that I'll be completely happy if we go with 15 minutes, just wanted to explain my reasoning :)

@talevy
Contributor Author

talevy commented Aug 2, 2018

Since many of these arguments feel like they could work with a variety of other minute values, I will play in the middle: two minutes above "a few", and the average of both suggestions... 10 minutes?

@colings86
Contributor

@dakrone thanks for explaining your reasoning. Personally I would be comfortable with anything from 5 minutes to 15 minutes. I think anything shorter than 5 minutes would be too often and anything longer than 15 minutes might not be responsive enough for rollover. So I'm happy with the 10 minutes that @talevy proposed 😄

@dakrone
Member

dakrone commented Aug 3, 2018

10 minutes it is!

@talevy talevy force-pushed the ilm-poll-interval branch from 18d8cf4 to d3a5e89 Compare August 3, 2018 18:57
This was originally set to a few seconds while prototyping things.
This interval is for the scheduled trigger of policies. Policies
have this extra trigger beyond just on cluster-state changes because
cluster-state changes may not be happening in a cluster for
whatever reason, and we need to continue making progress. Updating
this value to be larger is reasonable since not all operations
are expected to be completed in the span of seconds, but instead in
minutes and hours. 10 minutes is sane.
@talevy talevy force-pushed the ilm-poll-interval branch from d3a5e89 to 22e2bb7 Compare August 6, 2018 16:35
@talevy talevy requested review from colings86 and dakrone and removed request for colings86 and dakrone August 6, 2018 21:22
Member

@dakrone dakrone left a comment

LGTM

@talevy
Contributor Author

talevy commented Aug 6, 2018

thanks @dakrone!

@talevy talevy merged commit 0ad252d into elastic:index-lifecycle Aug 6, 2018
@talevy talevy deleted the ilm-poll-interval branch August 6, 2018 21:41
jasontedor pushed a commit that referenced this pull request Aug 17, 2018