Skip to content

Conversation

@Tim-Brooks
Copy link
Contributor

@Tim-Brooks Tim-Brooks commented Dec 3, 2018

Currently our handshake requests do not include a version. This is
unfortunate as we cannot rely on the stream version since it is not the
sending node's version. Instead it is the minimum compatibility version.
The handshake request is currently empty and we do nothing with it. This
should allow us to add data to the request without breaking backwards
compatibility.

This commit adds the version to the handshake request. Additionally, it
allows "future data" to be added to the request. This allows nodes to craft
a version compatible response. And will properly handle additional data in
future handshake requests. The proper handling of "future data" is useful
as this is the only request where we do not know the other node's version.

Finally, it renames the TcpTransportHandshaker to
TransportHandshaker.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@Tim-Brooks
Copy link
Contributor Author

I'm not completely sure if this will work as I think that we validate the entire message is consumed. But I wanted to push it up here to see what happens on the backwards compatibility tests.

@s1monw
Copy link
Contributor

s1monw commented Dec 3, 2018

I'm not completely sure if this will work as I think that we validate the entire message is consumed. But I wanted to push it up here to see what happens on the backwards compatibility tests.

I don't think this will fly. This handshake is only here to negotiate the version I am not sure what we are trying to achieve here?

@Tim-Brooks
Copy link
Contributor Author

Alright I had been doubting whether this would work due to an initial test failure. But that was due to a mistake where I did not make master (7.0.0) able to read prior response versions.

I don't think this will fly.

I think that it works. I have tested these changes locally against 5.6 and 6.2, even with the additional data in the request, the nodes successfully handshake. The reason is that on handshake requests we do not require that the entire stream has been consumed (like we do for normal requests).

Additionally, all of CI is passing now.

This handshake is only here to negotiate the version I am not sure what we are trying to achieve here?

There two motivations:

  1. We have two handshakes internal:tcp/handshake and internal:transport/handshake. They both validate versions. internal:transport/handshake also validates discovery node and cluster name. The downside of two handshakes are multiple roundtrips and increased complexity for the async-connection work. If the receiver of internal:tcp/handshake were aware of the true node version, it would allow a versioned response that sends the additional data (discovery node and cluster alias). If 6.7 supported this, then we could remove internal:transport/handshake in 7.0.

Additionally, if the handshake was version-aware it would allow for work in the future to use this (if we ever wanted to negotiate anything else).

  1. Currently the internal:tcp/handshake is kind of the wild west. We treat a message as a handshake request if the correct bit is set in the status byte. And it appears to me that we do not at all care what the actual message is because we throw it all away. We don't validate the action name either. It can be any action name. I have tested this locally with a random action name pointed at a 5.6 node.

The request type currently is TransportRequest.Empty. But that is not actually empty, as there is a variable int for the TaskId. But we do not even consume that, which is why I thought (and tests support) that adding additional data does not harm prior versions.

By making this change we bring the handshake into accordance with our other messages. The request stream will be consumed and treated more strictly (we can also validate the action name). As you see I added a field indicating how large the message is to allow future versions to add additional information if there is value in that. And prior versions would just blindly consume that data but send back their normal response (which includes the version so future version can handle it).

One other supporting piece of data:

This PR #31020 actually made 7.0 questionably backwards compatible for handshakes with versions prior to 6.3. It looks to me that we want these version to be able to handshake even if 7.0 will immediately close the connection after reading the response. My knowledge of what we want here is based on this comment:

    static void ensureVersionCompatibility(Version version, Version currentVersion, boolean isHandshake) {
        // for handshakes we are compatible with N-2 since otherwise we can't figure out our initial version
        // since we are compatible with N-1 and N+1 so we always send our minCompatVersion as the initial version in the
        // handshake. This looks odd but it's required to establish the connection correctly we check for real compatibility
        // once the connection is established
        final Version compatibilityVersion = isHandshake ? currentVersion.minimumCompatibilityVersion() : currentVersion;
        if (version.isCompatible(compatibilityVersion) == false) {
            final Version minCompatibilityVersion = isHandshake ? compatibilityVersion : compatibilityVersion.minimumCompatibilityVersion();
            String msg = "Received " + (isHandshake ? "handshake " : "") + "message from unsupported version: [";
            throw new IllegalStateException(msg + version + "] minimal compatible version is: [" + minCompatibilityVersion + "]");
        }
    }

Based on that work it looks to me that 7.0 will send a handshake request with a stream version of 6.7. And 6.7 should be compatible with 6.2. Although once 7.0 receives the response (indicating 6.2) it would close the stream.

The issue with #31020 is that 7.0 will have a minimal compatibility version of 6.7. That means in the handshake request it will include features. 6.2 will not know how to read these features. Instead those bytes will probably corrupt the parsing of the next bytes (action name). However, since it is a handshake request, at this point we are just not doing anything with the stream. Which is why the backwards compatbility tests never broke when that functionality was added.

Maybe we should fix that in 7.0? Not send the features in the handshake request? However, I guess my point is that this is an example of how we currently allow sending extra data in handshake requests. And this PR would close this weird edge case, while introducing some support for future (unexpected) changes.

@Tim-Brooks
Copy link
Contributor Author

I just kind of threw in the max of 2kb for future handshake requests. I'm not sure if there is actually any value in that. But we can decide on that validation, once you evaluate what you think of the reasoning for the other changes.

@s1monw
Copy link
Contributor

s1monw commented Dec 7, 2018

I think that it works. I have tested these changes locally against 5.6 and 6.2, even with the additional data in the request, the nodes successfully handshake. The reason is that on handshake requests we do not require that the entire stream has been consumed (like we do for normal requests).

uh sneaky. I think we should from now on?

Currently the internal:tcp/handshake is kind of the wild west. We treat a message as a handshake request if the correct bit is set in the status byte. And it appears to me that we do not at all care what the actual message is because we throw it all away. We don't validate the action name either. It can be any action name. I have tested this locally with a random action name pointed at a 5.6 node.

fair enough. lets fix that!

Maybe we should fix that in 7.0? Not send the features in the handshake request? However, I guess my point is that this is an example of how we currently allow sending extra data in handshake requests. And this PR would close this weird edge case, while introducing some support for future (unexpected) changes.

++

I am all for having just one handshake request! I still think we should also validate that we consume everything in the handshake going forward?

static final class HandshakeResponse extends TransportResponse {

private final Version responseVersion;
private Version requestVersion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this is unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it.

try {
return buffer.readByte();
} catch (IndexOutOfBoundsException ex) {
throw new EOFException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put the ex as a cause in there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it. The ctor does not take it. But I added it use the init method.

@Tim-Brooks
Copy link
Contributor Author

@s1monw - I updated this based on your comments. I removed the 2KB limit as I'm not sure that is something that actually makes sense. And verified that the stream is fully consumed. I did not validate the action name yet. I think that will be a follow-up.

@Tim-Brooks Tim-Brooks requested a review from s1monw December 9, 2018 18:35
Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left one comment LGTM. I assume we add stuff based on top of this in followups?

@Tim-Brooks
Copy link
Contributor Author

left one comment LGTM. I assume we add stuff based on top of this in followups?

Yeah.


static final class HandshakeResponse extends TransportResponse {

private final Version responseVersion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the plan is to replace internal:transport/handshake with this message, does it also need the ClusterName? Today we rely on the fact that the transport handshake validates that the cluster names match during discovery - I think that we don't otherwise verify that we're talking to nodes from the right cluster.

Copy link
Contributor Author

@Tim-Brooks Tim-Brooks Dec 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is follow-up work. We need the version the the request to bootstrap the follow-up work. Once we have the version in the request, we can serialize arbitrary responses that are compatible with the remote node's version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

@Tim-Brooks Tim-Brooks merged commit 797f985 into elastic:master Dec 11, 2018
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Dec 11, 2018
Currently our handshake requests do not include a version. This is
unfortunate as we cannot rely on the stream version since it is not the
sending node's version. Instead it is the minimum compatibility version.
The handshake request is currently empty and we do nothing with it. This
should allow us to add data to the request without breaking backwards
compatibility.

This commit adds the version to the handshake request. Additionally, it
allows "future data" to be added to the request. This allows nodes to craft
a version compatible response. And will properly handle additional data in
future handshake requests. The proper handling of "future data" is useful
as this is the only request where we do not know the other node's version.

Finally, it renames the TcpTransportHandshaker to
TransportHandshaker.
Tim-Brooks added a commit that referenced this pull request Dec 12, 2018
Currently our handshake requests do not include a version. This is
unfortunate as we cannot rely on the stream version since it is not the
sending node's version. Instead it is the minimum compatibility version.
The handshake request is currently empty and we do nothing with it. This
should allow us to add data to the request without breaking backwards
compatibility.

This commit adds the version to the handshake request. Additionally, it
allows "future data" to be added to the request. This allows nodes to craft
a version compatible response. And will properly handle additional data in
future handshake requests. The proper handling of "future data" is useful
as this is the only request where we do not know the other node's version.

Finally, it renames the TcpTransportHandshaker to
TransportHandshaker.
@Tim-Brooks Tim-Brooks deleted the combine_handshakes branch December 18, 2019 14:47
@Tim-Brooks Tim-Brooks restored the combine_handshakes branch December 18, 2019 14:47
@Tim-Brooks Tim-Brooks deleted the combine_handshakes branch April 30, 2020 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Network Http and internode communication implementations >non-issue v6.6.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants