-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Implement basic CcrRepository restore
#36287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-distributed |
|
This pulls over the basic restore functionality from #35719. This is so that we can start implementing the restore process without worrying about the concurrent restore issue. |
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of having incrementIndexVersion = false?
| @Override | ||
| public ClusterState execute(ClusterState currentState) { | ||
| public ClusterState execute(ClusterState currentState) throws Exception { | ||
| // TODO: Evaluate if we want to keep once restore at a time behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment. What do you mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. That was just an earlier comment about concurrent restores.
| /** | ||
| * Returns an Iterable to all instances for the given class >T< for the cluster's master node. | ||
| */ | ||
| public synchronized <T> T getMasterNodeInstance(Class<T> clazz) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you merge latest master, you will notice that there is already a method with same name, but different semantics. Maybe call this one getCurrentMasterNodeInstance(...)
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| throw new UnsupportedOperationException("Unsupported for repository of type: " + TYPE); | ||
| assert SNAPSHOT_UUID.equals(snapshotId.getUUID()) : "RemoteClusterRepository only supports the _latest_ as the UUID"; | ||
| Client remoteClient = client.getRemoteClusterClient(remoteClusterAlias); | ||
| ClusterStateResponse response = remoteClient.admin().cluster().prepareState().clear().setMetaData(true).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a way to filter out the index metadata here? We just want the global metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like if you set indices to an empty array it returns all of the indices. I could like put a single random string for an index name in there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a little ugly but might work, and save on transferring a lot of index metadata over the wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added:
.setIndices("dummy_index_name") // We set a single dummy index name to avoid fetching all the index data
| public SnapshotInfo getSnapshotInfo(SnapshotId snapshotId) { | ||
| throw new UnsupportedOperationException("Unsupported for repository of type: " + TYPE); | ||
| assert SNAPSHOT_UUID.equals(snapshotId.getUUID()) : "RemoteClusterRepository only supports the _latest_ as the UUID"; | ||
| return new SnapshotInfo(snapshotId, Collections.singletonList(snapshotId.getName()), SnapshotState.SUCCESS, version); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should return all indices of the remote cluster here. That would more naturally map to the notion of the state of the remote cluster representing a snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRepository.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRepository.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRepository.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRepository.java
Outdated
Show resolved
Hide resolved
| // The snapshot will be compatible because remote cluster connections must be compatible with our cluster | ||
| // version. So the snapshot version is not too important. | ||
| // TODO: Evaluate where we eventually want to pull the version from | ||
| private final Version version = Version.CURRENT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the version here matters. As an initial approximation, we can take the maximum version of data nodes in the leader cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private void validateSnapshotRestorable(final String repository, final SnapshotInfo snapshotInfo) {
if (!snapshotInfo.state().restorable()) {
throw new SnapshotRestoreException(new Snapshot(repository, snapshotInfo.snapshotId()), "unsupported snapshot state ["
+ snapshotInfo.state() + "]");
}
if (Version.CURRENT.before(snapshotInfo.version())) {
throw new SnapshotRestoreException(new Snapshot(repository, snapshotInfo.snapshotId()), "the snapshot was created with Elasticsearch version [" + snapshotInfo.version()
+ "] which is higher than the version of this node [" + Version.CURRENT + "]");
}
}
Here is the validation performed based on this version. So if the leader cluster is a newer version than the follower cluster, it will be marked as incompatible. So I will take the approach of the maximum version from the data nodes in the leader cluster if you are happy with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I will take the approach of the maximum version from the data nodes in the leader cluster if you are happy with that.
The validation logic you mentioned above (validateSnapshotRestorable) will not guarantee us that there isn't a data node in the follower cluster that is older than the master that runs the validation logic there, but it's at least better than no validation at all. I think it's the best check we can do for now. Can you add an item to the meta-issue that we need to think a bit more about BWC constraints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a note.
|
|
||
| ImmutableOpenMap<String, IndexMetaData> remoteIndices = remoteMetaData.getIndices(); | ||
| for (String indexName : remoteMetaData.getConcreteAllIndices()) { | ||
| SnapshotId snapshotId = new SnapshotId(indexName, SNAPSHOT_UUID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of one snapshot per index, I think it's more natural to have a fixed SnapshotID(latest, latest) and all the indices of the cluster then as indices that are part of the snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| PlainActionFuture<RestoreService.RestoreCompletionResponse> future = PlainActionFuture.newFuture(); | ||
| restoreService.restoreSnapshot(restoreRequest, future); | ||
| future.actionGet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check that response says that all shards successfully restored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRepository.java
Show resolved
Hide resolved
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits, looks good o.w.
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRepository.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRepository.java
Show resolved
Hide resolved
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/repository/CcrRepository.java
Outdated
Show resolved
Hide resolved
This is related to elastic#35975. It implements a basic restore functionality for the CcrRepository. When the restore process is kicked off, it configures the new index as expected for a follower index. This means that the index has a different uuid, the version is not incremented, and the Ccr metadata is installed. When the restore shard method is called, an empty shard is initialized.
This is related to #35975. It implements a basic restore functionality for the CcrRepository. When the restore process is kicked off, it configures the new index as expected for a follower index. This means that the index has a different uuid, the version is not incremented, and the Ccr metadata is installed. When the restore shard method is called, an empty shard is initialized.
This is related to #35975. It implements a basic
restorefunctionalityfor the
CcrRepository. When the restore process is kicked off, itconfigures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.
When the restore shard method is called, an empty shard is initialized.