-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Better extension point for custom cluster state parts #21243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better extension point for custom cluster state parts #21243
Conversation
fab869c to
aaae041
Compare
aaae041 to
5e70b64
Compare
nik9000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a quick scan. I left a few "why" style things but I think this is the right approach here.
I like the idea of making a NamedWriteableRegistry-like thing for xcontent but it isn't clear to me that we know how to get it to apply well across the board. I don't want us to make some nice general abstraction and only use it for cluster state. If we made such a thing it'd be an empty victory if we didn't move queries and aggs to it.
So I think the right thing to do is to get this in and then use the lessons learned to open a followup for the more generalized abstraction. A migration of that size deserves to be done in many PRs with more time for review. Something like this we want sooner I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks properly BWC to me. ++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should be a parameter to the source method. That way it isn't so hidden. It can be @Nullable and we can have javadoc explaining what it is for and when it is needed. I'm not sure if this source method is part of the public Java API or not because we really don't mark things so both everything is and nothing is. Anyway, I think it'd be fine to add a nullable parameter there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the reason you are considering a "named writeable to xcontent" so we don't have to have these registry objects? I think that is a fine idea! I think some things won't work as they stand now but could be fixed. See things like QueryParseContext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A fine idea that we can work on later, just to clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should be a parameter to the source method.
Initially I didn't want change any method signatures, but it isn't a big deal after all as like you said just null can be provided. To make it even better we can add an overloaded version of the method that just accepts source.
Is the reason you are considering a "named writeable to xcontent" so we don't have to have these registry objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should throw an exception if the proto doesn't resolve to anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You made a Safe method for this already I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, we should do that here. I'll use the Safe variants for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as class above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add javadoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the author of the parameter list meant to do one per line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove this note. We expect that only custom cluster state named writeables are used here but we support any named writeables. While we don't use this now we don't actively prevent using it and a plugin might use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant to remove the note but didn't? I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs javadoc eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/cRegister/pRegistry/`?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
|
And, yes, this looks like it maintains wire format and cluster state backwards compatibility to me. I wonder if we should add more stuff to |
I'm not sure if this requires its own qa module or can be included in the |
|
I think that using |
5e70b64 to
96ad8b8
Compare
|
@nik9000 I've updated the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of supporting a null registry we can provide an empty one instead? That feels like it'd be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of this null support either. I'd prefer sending an empty registry over null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant to remove the note but didn't? I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you fix this javadoc? Those @param annotations are fairly useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some javadoc to explain what this does and some docs to explain why we need @ESIntegTestCase.ClusterScope and @LuceneTestCase.SuppressFileSystems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the file come from? I think we need to commit the script that generates it so we can make it when we release the next version, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how to automate this in a script, since it involves as plugin that specifies custom index metadata.
This file contains the code that creates the data directory and contains the plugin written on the 5.0.0 codebase: https://gist.github.com/martijnvg/ff48df064ad60ee983ec4fd7c7471a4c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just chatted with Nik and I'll remove the custom plugin in test that is used the test custom index metadata. Then the test is simpler and it can be inlined in OldIndexBackwardsCompatibilityIT class and allows scripting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This'll fail the build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ouch, silly IDE
|
I think we're getting a longer and longer list of things that we need to run on upgrade. Maybe we need to start a running list somewhere.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a test starting a node that doesn't have the plugin installed? The node should start, I think. Though I don't know what it should do with the data that it no longer understands. I think we have a setting like required_plugins to guard against accidental removal of a plugin so we can assume that any removal of the plugin is intentional. I think....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll add that test.
I agree that the required_plugins setting is an extra guard, but if that setting is forgotten there is nothing stopping the node from starting up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! I think it is right to start the node though, right? You have to be able to uninstall plugins if you want, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, but that wasn't your intention, you can lose the custom part in the cluster state.
…by only allowing adding custom cluster, metadata and index metadata via the ClusterPlugin interface. * By piggybacking on the named writeable infrastructure when can serialize custom cluster metadata for the internal protocol. This seems to workout nice as the serialization code doesn't need to know about custom cluster metadata implementations. * For the xcontent serialization there is no such thing as named writaebles, so I passed down a registry holding prototypes for custom metadate on all the places we need to de-serialize custom metadata. We can maybe consider adding something like named writable on top of XContentParser? However passing down the registry doesn't seem to be too bad. * Custom index metadata doesn't work yet. I think we can remove it as it isn't used in any place. Not sure what others think about that, but that can be done in another PR. * This change needs tests.
96ad8b8 to
e7c31be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is existing behavior but it's bad... forgetting to install a plugin immediately implies the deletion of it's state (and solving it will be tricky). Can we add this warn log and open an issue to deal with it? (maybe the plugin manager to should remove custom meta when uninstalling. not sure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #21471
| DiscoverySettings discoverySettings, | ||
| ClusterName clusterName) { | ||
| ClusterName clusterName, | ||
| CustomPrototypeRegistry registry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should pass these registries all over the place? maybe it's better to give a cluster diff and cluster state deserializers ? It doesn't feel like PublishClusterStateAction should care about the registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I needed to provide CustomPrototypeRegistry here is because in the handleIncomingClusterStateRequest(...) method (line 380-385) it creates a new StreamInput instance and have to wrap that with a NamedWriteableAwareStreamInput instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something - but do we need both the NamedWriteable in the stream and the explicit registry when reading a diff? Diff<ClusterState> diff = lastSeenClusterState.readDiffFrom(in, registry);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but do we need both the NamedWriteable in the stream
Yes, otherwise nodes that receive a new cluster state don't know how the read the custom cluster state parts. (in the synchronized block line 388 till 407)
and the explicit registry when reading a diff?
Ideally we wouldn't need to pass the registry to the diff, but with the way it reads the custom diff map, I didn't see another of doing that without breaking bwc. (see here, here and here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martijnvg and I discussed this and Martijn is work on centralizing this code so all deserialization logic will be in one place. He's also looking into how it should be done if bwc was not a concern so we know how this should look going forward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bleskes Like we discussed last friday I tried to stop using named writeables for cluster state related stuff and use the registry directly (so that all code will follow the same deserialization logic), but I ran into an issue that I don't know how to solve without creating a big mess. There are a couple of places where we serialize cluster related stuff in the action framework:
ClusterRerouteResponse#readFrom(...)ClusterStateResponse#readFrom(...)CreateIndexRequest#readFrom(...)PutIndexTemplateRequest#readFrom(...)
In these readFrom() methods I don't have access to the registry. I don't see a way of passing down the registry without changing the Streamable interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| // sometimes the request comes in before the local node processed that cluster state | ||
| // in such cases we can load it from disk | ||
| metaData = IndexMetaData.FORMAT.loadLatestState(logger, nodeEnv.indexPaths(shardId.getIndex())); | ||
| metaData = metaStateService.getIndexMetadatFormat().loadLatestState(logger, nodeEnv.indexPaths(shardId.getIndex())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just make this a public method on metaStateService? (i.e., loadIndexLatestState) it already has nodeEnv and everything it needs. And now that I check it - it already has one :D org.elasticsearch.gateway.MetaStateService#loadIndexState
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 good point :)
| * @return Custom cluster state parts used to store temporal information in the cluster state that doesn't need to | ||
| * be persisted | ||
| */ | ||
| default Collection<ClusterState.Custom> getCustomClusterState() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nite: this needs a better names - getCustomParts? or getClusterStateCostums?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for getClusterStateCostums()
…x metadata in create index request from `getIndexMetadataPrototypeSafe()` back to `getIndexMetadataPrototype()` as there is some logic in the request that demotes custom index metadata to index settings.
|
Superseded by #22336 |
This adds a better extension point for custom cluster state parts by only allowing adding custom cluster state parts via the ClusterPlugin interface.
This is a WIP and would be great to get feedback.
PR for #20888