Skip to content

Conversation

@Tim-Brooks
Copy link
Contributor

This is related to #35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.

When the restore shard method is called, an empty shard is initialized.

@Tim-Brooks Tim-Brooks added >non-issue v7.0.0 :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features v6.6.0 labels Dec 6, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@Tim-Brooks
Copy link
Contributor Author

This pulls over the basic restore functionality from #35719. This is so that we can start implementing the restore process without worrying about the concurrent restore issue.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of having incrementIndexVersion = false?

@Override
public ClusterState execute(ClusterState currentState) {
public ClusterState execute(ClusterState currentState) throws Exception {
// TODO: Evaluate if we want to keep once restore at a time behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. What do you mean here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. That was just an earlier comment about concurrent restores.

/**
* Returns an Iterable to all instances for the given class >T< for the cluster's master node.
*/
public synchronized <T> T getMasterNodeInstance(Class<T> clazz) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you merge latest master, you will notice that there is already a method with same name, but different semantics. Maybe call this one getCurrentMasterNodeInstance(...)

@Tim-Brooks Tim-Brooks requested a review from ywelsch December 6, 2018 17:03
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left more comments, especially on what SnapshotInfo should represent. Can you please also reply to my comments when pushing changes, so that it's easier for me to see how you addressed them? Look for example at #35678 or #35488

throw new UnsupportedOperationException("Unsupported for repository of type: " + TYPE);
assert SNAPSHOT_UUID.equals(snapshotId.getUUID()) : "RemoteClusterRepository only supports the _latest_ as the UUID";
Client remoteClient = client.getRemoteClusterClient(remoteClusterAlias);
ClusterStateResponse response = remoteClient.admin().cluster().prepareState().clear().setMetaData(true).get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to filter out the index metadata here? We just want the global metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like if you set indices to an empty array it returns all of the indices. I could like put a single random string for an index name in there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a little ugly but might work, and save on transferring a lot of index metadata over the wire.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added:

.setIndices("dummy_index_name") // We set a single dummy index name to avoid fetching all the index data

public SnapshotInfo getSnapshotInfo(SnapshotId snapshotId) {
throw new UnsupportedOperationException("Unsupported for repository of type: " + TYPE);
assert SNAPSHOT_UUID.equals(snapshotId.getUUID()) : "RemoteClusterRepository only supports the _latest_ as the UUID";
return new SnapshotInfo(snapshotId, Collections.singletonList(snapshotId.getName()), SnapshotState.SUCCESS, version);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should return all indices of the remote cluster here. That would more naturally map to the notion of the state of the remote cluster representing a snapshot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// The snapshot will be compatible because remote cluster connections must be compatible with our cluster
// version. So the snapshot version is not too important.
// TODO: Evaluate where we eventually want to pull the version from
private final Version version = Version.CURRENT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the version here matters. As an initial approximation, we can take the maximum version of data nodes in the leader cluster.

Copy link
Contributor Author

@Tim-Brooks Tim-Brooks Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private void validateSnapshotRestorable(final String repository, final SnapshotInfo snapshotInfo) {
if (!snapshotInfo.state().restorable()) {
    throw new SnapshotRestoreException(new Snapshot(repository, snapshotInfo.snapshotId()), "unsupported snapshot state [" 
        + snapshotInfo.state() + "]");
}
if (Version.CURRENT.before(snapshotInfo.version())) {
    throw new SnapshotRestoreException(new Snapshot(repository, snapshotInfo.snapshotId()), "the snapshot was created with Elasticsearch version [" + snapshotInfo.version() 
        + "] which is higher than the version of this node [" + Version.CURRENT + "]");
    }
}

Here is the validation performed based on this version. So if the leader cluster is a newer version than the follower cluster, it will be marked as incompatible. So I will take the approach of the maximum version from the data nodes in the leader cluster if you are happy with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I will take the approach of the maximum version from the data nodes in the leader cluster if you are happy with that.

The validation logic you mentioned above (validateSnapshotRestorable) will not guarantee us that there isn't a data node in the follower cluster that is older than the master that runs the validation logic there, but it's at least better than no validation at all. I think it's the best check we can do for now. Can you add an item to the meta-issue that we need to think a bit more about BWC constraints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a note.


ImmutableOpenMap<String, IndexMetaData> remoteIndices = remoteMetaData.getIndices();
for (String indexName : remoteMetaData.getConcreteAllIndices()) {
SnapshotId snapshotId = new SnapshotId(indexName, SNAPSHOT_UUID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of one snapshot per index, I think it's more natural to have a fixed SnapshotID(latest, latest) and all the indices of the cluster then as indices that are part of the snapshot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


PlainActionFuture<RestoreService.RestoreCompletionResponse> future = PlainActionFuture.newFuture();
restoreService.restoreSnapshot(restoreRequest, future);
future.actionGet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check that response says that all shards successfully restored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Tim-Brooks Tim-Brooks requested a review from ywelsch December 7, 2018 16:25
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, looks good o.w.

@Tim-Brooks Tim-Brooks merged commit 8a53f2b into elastic:master Dec 7, 2018
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Dec 12, 2018
This is related to elastic#35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.

When the restore shard method is called, an empty shard is initialized.
Tim-Brooks added a commit that referenced this pull request Dec 12, 2018
This is related to #35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.

When the restore shard method is called, an empty shard is initialized.
@Tim-Brooks Tim-Brooks deleted the ccr_repo_work branch December 18, 2019 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/CCR Issues around the Cross Cluster State Replication features >non-issue v6.6.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants