Skip to content

Conversation

@talevy
Copy link
Contributor

@talevy talevy commented Nov 30, 2020

this commit introduces a new Rollup ILM Action that allows indices
to be rolled up according to a specific rollup config. The
action also allows for the new rolled up index to be associated with
a different policy than the original/source index.

Relates #42720.

Closes #48003.

@talevy talevy added >non-issue WIP :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v8.0.0 Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.11.0 labels Nov 30, 2020
@talevy talevy force-pushed the rollup-ilm branch 5 times, most recently from 05169e9 to 33d61ad Compare November 30, 2020 23:15
@talevy talevy mentioned this pull request Nov 30, 2020
21 tasks
@talevy talevy force-pushed the rollup-ilm branch 2 times, most recently from 7a73bd4 to c09d9a7 Compare December 7, 2020 23:33
@talevy talevy removed the WIP label Dec 7, 2020
@talevy talevy marked this pull request as ready for review December 7, 2020 23:34
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

this commit introduces a new Rollup ILM Action that allows indices
to be rolled up according to a specific rollup config. The
action also allows for the new rolled up index to be associated with
a different policy than the original/source index. Optionally,
the original index can be deleted.

Relates elastic#42720.

Closes elastic#48003.
Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but probably worth having someone from ILM review as well.

@talevy
Copy link
Contributor Author

talevy commented Dec 9, 2020

run elasticsearch-ci/default-distro

Copy link
Contributor

@csoulios csoulios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work and it looks a lot simpler than I had expected.
I left some comments wrt to rollup specific points for discussion.
Also, I am no expert in ILM. So, perhaps someone with more experience in this area could help

Copy link
Contributor

@jrodewig jrodewig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs LGTM. Thanks @talevy!

@talevy talevy requested a review from dakrone January 27, 2021 20:35
@talevy
Copy link
Contributor Author

talevy commented Jan 27, 2021

thanks all! ready for another review!

@jrodewig jrodewig mentioned this pull request Jan 27, 2021
26 tasks
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, thanks for iterating on this Tal!

I left one more comment, but it can be addressed either here or in a subsequent PR (if we decide a change is desired), up to you.

static {
if (RollupV2.isEnabled()) {
ORDERED_VALID_HOT_ACTIONS = Arrays.asList(SetPriorityAction.NAME, UnfollowAction.NAME, RolloverAction.NAME,
ReadOnlyAction.NAME, ShrinkAction.NAME, ForceMergeAction.NAME, RollupILMAction.NAME, SearchableSnapshotAction.NAME);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this before, but if we're going to rollup right after a shrink and force merge, what's the point of shrinking and force merging? Should we just rollup first and then do the shrink/forcemerge?

(I think this is okay to leave as is for now, but we might want to investigate making them mutually exclusive in the future so we don't waste resources doing a shrink and force merge right before we rollup and create a new index anyway. At the very least I think maybe force merge should move to after the Rollup?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are fair points. I figured these things would occur anyways, so order should not matter at the end of the day. That being said, waiting to do these operations before rolling up is not necessary. it might be interesting to benchmark and see if fewer shards ends up resulting in a faster rollup or not. but this should not be of concern to the user anyways.

I will move Rollup to happen before Shrink, but will bring this up with the team to see if there are any other opinions

@talevy talevy merged commit 5811deb into elastic:master Jan 29, 2021
@talevy talevy deleted the rollup-ilm branch January 29, 2021 01:52
talevy added a commit to talevy/elasticsearch that referenced this pull request Jan 29, 2021
this commit introduces a new Rollup ILM Action that allows indices
to be rolled up according to a specific rollup config. The
action also allows for the new rolled up index to be associated with
a different policy than the original/source index.

Relates elastic#42720.

Closes elastic#48003.

For more information about rollup, see the <<rollup-api, rollup action documentation>>

The name of the rolled up index will be the original index name of the managed index prefixed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talevy How does this work in the context of data streams? Does it work in the context of data streams? What are the index names there?

Copy link
Contributor

@jrodewig jrodewig Jan 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ILM works on the index level, not directly on data streams. You can roll up a backing index for a data stream but not a data stream itself.

If the index is a backing index for a data stream, the rollup index is a backing index for the same stream. The name of the new rollup index is based on the name of the backing index. For example, the name of a rollup index for .ds-my-data-stream-2021.01.29-000001 would be rollup-.ds-my-data-stream-2021.01.29-000001.

I hope that helps. I'll open up a follow-up PR to better clarify this in the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ruflin and @jrodewig,

this is exactly how it works today, but due to some failure-scenarios we discussed recently, we're going to change this behavior.

In the context of datastreams, we will likely randomize the name of the rollup index to prevent collision and conflict with any past or rogue rollup actions. The belief is that the underlying name of the index should not be of concern to users since they will be added into the datastream and managed in ILM. Do you see any problems with this on your end @ruflin?

For this reason, it might make sense to hold off just a little bit on your docs change PR @jrodewig, but much appreciated!

I've opened #68225 to track it. Will likely change next week

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, @talevy!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The belief is that the underlying name of the index should not be of concern to users since they will be added into the datastream and managed in ILM

If this is the case, it is great. I was under the assumption all indices belong to a data stream must use the .ds-* prefix and be hidden. So I expect index names like .ds-rollup-* or something similar. This would also ensure there are no conflicts and makes it clear that these rollups belong to the data streams.

The rollup- prefix is what put me on the wrong track here and made me assume it does not belong to the data stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is actually common for indices managed by ILM to be renamed with a new prefix. For example, shrunken indices start with the shrunk- prefix.

independent of the name, the indices will be hidden and added to the backing datastream

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talevy Interesting. Why are these not prefixed by a . and not inside .ds-*-shrunk-* or similar to indicate they belong together? I think I'm derailing but could you share a link where I can read up on this? The part I'm concerned is that this not seem to fully fit into our naming scheme: https://www.elastic.co/blog/an-introduction-to-the-elastic-data-stream-naming-scheme

talevy added a commit that referenced this pull request Jan 29, 2021
this commit introduces a new Rollup ILM Action that allows indices
to be rolled up according to a specific rollup config. The
action also allows for the new rolled up index to be associated with
a different policy than the original/source index.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.12.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add an ILM action to rollup an index

9 participants