Disc: Move AbstractDisruptionTC to filebased D. #34461

original-brownbear · 2018-10-15T14:19:01Z

Move to file-based discovery in AbstractDisruptionTestCase
- Don't use the ordinals of the unicast hosts at config time; use them when actually starting the cluster, and only write the nodes whose id is in the unicast host array to the discovery file
- Remove all code paths in ClusterDiscoveryConfiguration that become unused now that AbstractDisruptionTestCase no longer uses it for configuration
- Relates to #33675 (Fix port assignment and discovery in tests)

elasticmachine · 2018-10-15T14:19:03Z

Pinging @elastic/es-distributed

original-brownbear · 2018-10-15T14:19:36Z

@DaveCTurner take a look when you have a minute, hope this is in the direction of what we discussed last week :)

DaveCTurner · 2018-10-15T14:57:47Z

Yes, this is the right sort of direction to take. However, the crucial change that we're looking for is to make it so that ClusterDiscoveryConfiguration no longer selects a free port by binding to it, releasing it, then fixing it in the config file of the corresponding node, because this is what leads to our test failures.

I expect that if you make this change then a lot more of this machinery will turn out no longer to be needed, because ESIntegTestCase already allows for nodes to bind to any port and we don't find out which one until much later.
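As an illustrative aside (not code from this PR), the difference between the two port strategies described above can be sketched with plain `java.net.ServerSocket`. The class and method names are hypothetical; the sketch only assumes that binding to port 0 lets the OS pick a free port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical names; only java.net.ServerSocket is a real API here.
final class PortSelectionSketch {

    // Racy pattern: probe for a free port, release it, and hand the number to
    // something that binds later. Another process may take the port in between.
    static int probeThenRelease() throws IOException {
        try (ServerSocket probe = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return probe.getLocalPort();
        } // the socket is closed here, so nothing reserves the port for us
    }

    // Safer pattern: bind to port 0 and keep the socket open; ask the socket
    // which port the OS assigned only after the bind has succeeded.
    static ServerSocket bindAndKeep() throws IOException {
        return new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
    }

    public static void main(String[] args) throws IOException {
        int guessed = probeThenRelease();
        System.out.println("probed port (no longer reserved): " + guessed);
        try (ServerSocket bound = bindAndKeep()) {
            System.out.println("bound port (still held): " + bound.getLocalPort());
        }
    }
}
```

With the second pattern the port number is only known after the node is up, which is why the address list has to be written out late rather than fixed in the config up front.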

Another possible improvement would be for InternalTestCluster to become responsible for unicastHostsOrdinals, since it's responsible for writing the discovery file, and avoid passing it around when starting the cluster as we do today. In all existing tests we either set it to null or new int[]{0} so I think we could just use a boolean field, say hostsListContainsOnlyFirstNode, to reflect more clearly that we're not covering many cases here.
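A minimal sketch of what the suggested boolean could control, assuming the cluster already knows every node's transport address when it writes the discovery file. `HostsFileSketch` and `hostsToWrite` are hypothetical names, not the InternalTestCluster API, and the real logic may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; it only illustrates the two cases the tests cover.
final class HostsFileSketch {

    // Returns the addresses that would be written to each node's discovery file.
    static List<String> hostsToWrite(List<String> allNodeAddresses, boolean hostsListContainsOnlyFirstNode) {
        if (allNodeAddresses.isEmpty()) {
            return Collections.emptyList();
        }
        if (hostsListContainsOnlyFirstNode) {
            // Corresponds to the old new int[]{0} case: only the first node is listed.
            return Collections.singletonList(allNodeAddresses.get(0));
        }
        // Corresponds to the old null case: every node is listed.
        return new ArrayList<>(allNodeAddresses);
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("127.0.0.1:9300", "127.0.0.1:9301", "127.0.0.1:9302");
        System.out.println(hostsToWrite(all, true));
        System.out.println(hostsToWrite(all, false));
    }
}
```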

original-brownbear · 2018-10-15T15:19:00Z

@DaveCTurner

> Yes, this is the right sort of direction to take. However, the crucial change that we're looking for is to make it so that ClusterDiscoveryConfiguration no longer selects a free port by binding to it, releasing it, then fixing it in the config file of the corresponding node, because this is what leads to our test failures.

See below, this fixes the situation for AbstractDisruptionTestCase, but you're right that it leaves one use of this port binding approach.

> I expect that if you make this change then a lot more of this machinery will turn out no longer to be needed, because ESIntegTestCase already allows for nodes to bind to any port and we don't find out which one until much later.

I deleted what is not used anymore in here already. The only remaining use of ClusterDiscoveryConfiguration is org.elasticsearch.test.SecuritySettingsSource, which extends it.
Should I try to remove that in this PR as well?

> Another possible improvement would be for InternalTestCluster to become responsible for unicastHostsOrdinals, since it's responsible for writing the discovery file, and avoid passing it around when starting the cluster as we do today. In all existing tests we either set it to null or new int[]{0} so I think we could just use a boolean field, say hostsListContainsOnlyFirstNode, to reflect more clearly that we're not covering many cases here.

Sounds good, will do :)

original-brownbear · 2018-10-15T15:41:04Z

@DaveCTurner hmm, with the changes in here the only remaining users of the old port assignment are the calls from SecuritySettingsSource, but they never made use of the facilities to set only a subset of hosts as unicast hosts anyway.
Can't I just remove all the code for finding the port and setting the unicast host list setting, set the discovery to file based in those tests and it's fine because the internal cluster will just automatically configure all the hosts in the discovery file?

DaveCTurner · 2018-10-15T15:43:45Z

> Can't I just remove all the code for finding the port and setting the unicast host list setting, set the discovery to file based in those tests and it's fine because the internal cluster will just automatically configure all the hosts in the discovery file?

Yes, except for DiscoveryDisruptionIT which needs each discovery file to only refer to the first node.

> The only remaining use of ClusterDiscoveryConfiguration is org.elasticsearch.test.SecuritySettingsSource which extends it.

I wonder if SecuritySettingsSource should just extend NodeConfigurationSource directly. Needs a bit of study to look at how it's being used, but that's what I'd try.

original-brownbear · 2018-10-15T15:45:54Z

> Yes, except for DiscoveryDisruptionIT which needs each discovery file to only refer to the first node.

Yea but that's not using this code anymore anyway with this change :)

> I wonder if SecuritySettingsSource should just extend NodeConfigurationSource directly. Needs a bit of study to look at how it's being used, but that's what I'd try.

It's the only thing that extends ClusterDiscoveryConfiguration so we could just flatten that I think yes.

DaveCTurner · 2018-10-15T16:00:20Z

> Yea but that's not using this code anymore anyway with this change :)

Good point, I hadn't spotted that.

NB I think hostsListContainsOnlyFirstNode can be a field on InternalTestCluster that you set to true only in DiscoveryDisruptionIT, avoiding all the overloaded methods that are needed just to pass it in.

original-brownbear · 2018-10-15T16:15:40Z

> NB I think hostsListContainsOnlyFirstNode can be a field on InternalTestCluster that you set to true only in DiscoveryDisruptionIT, avoiding all the overloaded constructors that are needed just to pass it in.

Sure that seems much nicer :) => on it

original-brownbear · 2018-10-15T16:36:34Z

@DaveCTurner alright done.

Port assignments don't happen in ClusterDiscoveryConfiguration anymore and I made the hostsListContainsOnlyFirstNode a field in the internal test cluster :)

DaveCTurner

I suggested a few more simplifications that I think are worth investigating.

DaveCTurner · 2018-10-15T17:54:45Z

server/src/test/java/org/elasticsearch/discovery/AbstractDisruptionTestCase.java

+    List<String> startCluster(int numberOfNodes, int minimumMasterNode, boolean hostsListContainsOnlyFirstNode) {
+        configureCluster(numberOfNodes, minimumMasterNode);
+        InternalTestCluster internalCluster = internalCluster();
+        internalCluster.setHostsListContainsOnlyFirstNode(hostsListContainsOnlyFirstNode);


I think this call can completely move to DiscoveryDisruptionIT, removing another overload:

    public void testUnicastSinglePingResponseContainsMaster() throws Exception {
        internalCluster().setHostsListContainsOnlyFirstNode(true);
        List<String> nodes = startCluster(4, -1);

etc...

DaveCTurner · 2018-10-15T18:04:00Z

server/src/test/java/org/elasticsearch/discovery/AbstractDisruptionTestCase.java

+    private NodeConfigurationSource discoveryConfig;

    @Override
    protected Settings nodeSettings(int nodeOrdinal) {


I would like to see if it's possible to pass in all the settings by implementing this method directly, rather than using the NodeConfigurationSource etc.

For instance it looks like these settings could be here:

    .put(FaultDetection.PING_TIMEOUT_SETTING.getKey(), "1s") // for hitting simulated network failures quickly
    .put(FaultDetection.PING_RETRIES_SETTING.getKey(), "1") // for hitting simulated network failures quickly
    .put("discovery.zen.join_timeout", "10s") // still long to induce failures but to long so test won't time out
    .put(DiscoverySettings.PUBLISH_TIMEOUT_SETTING.getKey(), "1s") // <-- for hitting simulated network failures quickly
    .put(TransportService.TCP_CONNECT_TIMEOUT.getKey(), "10s") // Network delay disruption waits for the min between this
    //and
    .put(TestZenDiscovery.USE_MOCK_PINGS.getKey(), false).build();

I also think these aren't needed and we can just make use of the machinery in ESIntegTestCase to set them appropriately:

    .put(NodeEnvironment.MAX_LOCAL_STORAGE_NODES_SETTING.getKey(), numberOfNodes)
    .put(ElectMasterService.DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), minimumMasterNode)
    .putList(DISCOVERY_HOSTS_PROVIDER_SETTING.getKey(), "file")

We use non-default settings in very few places; moving the affected tests into their own fixtures so they can override nodeSettings themselves seems worth investigating.

@DaveCTurner Let me try :) What makes this a little tricky (requiring more changes maybe?) is that we have this code in org.elasticsearch.discovery.ClusterDisruptionIT#testSearchWithRelocationAndSlowClusterStateProcessing

    /**
     * This test creates a scenario where a primary shard (0 replicas) relocates and is in POST_RECOVERY on the target
     * node but already deleted on the source node. Search request should still work.
     */
    public void testSearchWithRelocationAndSlowClusterStateProcessing() throws Exception {
        // don't use DEFAULT settings (which can cause node disconnects on a slow CI machine)
        configureCluster(Settings.EMPTY, 3, 1);

to override these defaults (though this could just be me not being so fluent in this code yet :)). Maybe do this in a follow up PR?

Right, that's one of the tests that I think could be in its own fixture since it needs different cluster settings.

@DaveCTurner wanna handle this here or can we move that to the next PR? (this one is already doing quite a few things)

Sure, a follow-up is ok.

DaveCTurner · 2018-10-15T18:14:57Z

server/src/test/java/org/elasticsearch/discovery/AbstractDisruptionTestCase.java

-    ) throws ExecutionException, InterruptedException {
+    void configureCluster(Settings settings, int numberOfNodes, int minimumMasterNode) {
        if (minimumMasterNode < 0) {
            minimumMasterNode = numberOfNodes / 2 + 1;


As far as I can tell, we allow tests to set minimumMasterNode to the "wrong" value only to allow dynamically adding nodes after the cluster has started. If so, it'd be better to allow ESIntegTestCase to manage this number for us.

@DaveCTurner Aren't we only ever passing -1 here from org.elasticsearch.discovery.AbstractDisruptionTestCase#startCluster(int, int) when we start the cluster? It seems like this is just a convenience method for not having to set a value by hand here?

Sorry, let me clarify. I do not think this method should receive an explicit minimumMasterNode value at all, whether -1 or otherwise. In almost all cases, ESIntegTestCase now does a better job of automatically managing this setting than is done here. It was, however, possible that some of the tests were using this feature to set this value to something other than numberOfNodes / 2 + 1, for instance:

elasticsearch/server/src/test/java/org/elasticsearch/discovery/ClusterDisruptionIT.java

Line 366 in d7ef985

configureCluster(Settings.EMPTY, 3, 1);

elasticsearch/server/src/test/java/org/elasticsearch/discovery/SnapshotDisruptionIT.java

Line 62 in d7ef985

configureCluster(settings, 4, 2);

elasticsearch/server/src/test/java/org/elasticsearch/discovery/DiscoveryDisruptionIT.java

Line 141 in 8fe964d

List<String> nodes = startCluster(2, 1);

In the first two cases this is actually correct because numberOfNodes also includes some data-only nodes. The last case is kinda strange - I think it really can form two clusters, and I'm not sure what it's actually testing.

@DaveCTurner given the short exchange in Slack, wanna keep this around for now maybe?

Yes, let's do this in a follow-up. NB the action to take here is to make this a cluster with 1 master node and 1 data node, which'd then mean that we can fall back on ESIntegTestCase's management of the cluster configuration.
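To make the arithmetic in this thread concrete, here is a small sketch, assuming the count passed in is the number of master-eligible nodes (data-only nodes excluded). `QuorumSketch` and its methods are hypothetical names for illustration, not part of ESIntegTestCase.

```java
// Hypothetical helper illustrating the minimum_master_nodes arithmetic discussed above.
final class QuorumSketch {

    // The default used by configureCluster when a negative value is passed:
    // a strict majority of the nodes being counted.
    static int defaultMinimumMasterNodes(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    // Two disjoint groups of master-eligible nodes can each elect a master
    // only if both groups can reach the configured minimum.
    static boolean canFormTwoClusters(int masterEligibleNodes, int minimumMasterNodes) {
        return 2 * minimumMasterNodes <= masterEligibleNodes;
    }

    public static void main(String[] args) {
        // startCluster(2, 1), if both nodes are master-eligible.
        System.out.println(canFormTwoClusters(2, 1));
        // The same two nodes with the majority default.
        System.out.println(canFormTwoClusters(2, defaultMinimumMasterNodes(2)));
        // One master node plus one data node, as proposed for the follow-up.
        System.out.println(canFormTwoClusters(1, defaultMinimumMasterNodes(1)));
    }
}
```

Under that assumption, two master-eligible nodes with a minimum of 1 can split, whereas a single master node with a data node cannot, which matches the follow-up action described above.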

DaveCTurner · 2018-10-15T18:15:38Z

test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java

    private ServiceDisruptionScheme activeDisruptionScheme;
    private Function<Client, Client> clientWrapper;

+    // If set to tru only the first node in the cluster will be made a unicast node


nit: s/tru/true/

DaveCTurner · 2018-10-15T18:18:49Z

.../framework/src/main/java/org/elasticsearch/test/discovery/ClusterDiscoveryConfiguration.java

        private static int nextPort = calcBasePort();

        private final int[] unicastHostOrdinals;
        private final int[] unicastHostPorts;


Is this used any more? I think it can go away.

original-brownbear · 2018-10-16T07:37:57Z

@DaveCTurner all comments addressed I hope :) can you take another look? :)

DaveCTurner · 2018-10-16T08:22:48Z

x-pack/plugin/security/src/test/java/org/elasticsearch/test/SecuritySettingsSource.java

     */
    public SecuritySettingsSource(int numOfNodes, boolean sslEnabled, Path parentFolder, Scope scope) {
-        super(numOfNodes, DEFAULT_SETTINGS);
+        this.nodeSettings = Settings.builder().put(Settings.EMPTY).put(DEFAULT_SETTINGS).build();


.put(Settings.EMPTY) is a no-op. Also, public static final Settings DEFAULT_SETTINGS = Settings.EMPTY; means that .put(DEFAULT_SETTINGS) is also a no-op. I think this means that both nodeSettings and transportClientSettings are just Settings.EMPTY (and this means there are more no-ops elsewhere).

Right ... sorry for missing that :) Removed fields and their noop use.

DaveCTurner

I asked for three minor changes, then this is good to go.

DaveCTurner · 2018-10-16T10:59:39Z

x-pack/plugin/security/src/test/java/org/elasticsearch/test/SecuritySettingsSource.java

    public Settings transportClientSettings() {
-        Settings superSettings = super.transportClientSettings();
+        Settings superSettings = Settings.EMPTY;
        Settings.Builder builder = Settings.builder().put(superSettings);


I would inline superSettings and now there's another put(Settings.EMPTY) to nuke.

DaveCTurner · 2018-10-16T11:07:20Z

server/src/test/java/org/elasticsearch/discovery/AbstractDisruptionTestCase.java

-    ) throws ExecutionException, InterruptedException {
+    void configureCluster(Settings settings, int numberOfNodes, int minimumMasterNode) {
        if (minimumMasterNode < 0) {
            minimumMasterNode = numberOfNodes / 2 + 1;


Yes, let's do this in a follow-up. NB the action to take here is to make this a cluster with 1 master node and 1 data node, which'd then mean that we can fall back on ESIntegTestCase's management of the cluster configuration.

DaveCTurner · 2018-10-16T11:07:46Z

server/src/test/java/org/elasticsearch/discovery/AbstractDisruptionTestCase.java

+    private NodeConfigurationSource discoveryConfig;

    @Override
    protected Settings nodeSettings(int nodeOrdinal) {


Sure, a follow-up is ok.

DaveCTurner · 2018-10-16T11:09:54Z

server/src/test/java/org/elasticsearch/discovery/DiscoveryDisruptionIT.java

    public void testUnicastSinglePingResponseContainsMaster() throws Exception {
-        List<String> nodes = startCluster(4, -1, new int[]{0});
+        List<String> nodes = startCluster(4, -1);
+        internalCluster().setHostsListContainsOnlyFirstNode(true);


I think this has to happen before you start the cluster, or else the cluster will start with full knowledge.

Right fixing
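The ordering problem can be shown with a toy model, assuming the flag is read once, at the moment the cluster writes its discovery file on start. `FakeCluster` is a hypothetical stand-in, not InternalTestCluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model: the flag only matters if it is set before start() runs.
final class StartOrderSketch {

    static final class FakeCluster {
        private boolean hostsListContainsOnlyFirstNode;
        private List<String> writtenHosts = new ArrayList<>();

        void setHostsListContainsOnlyFirstNode(boolean value) {
            this.hostsListContainsOnlyFirstNode = value;
        }

        // Simulates writing the discovery file while the nodes start up.
        void start(List<String> addresses) {
            int count = hostsListContainsOnlyFirstNode ? Math.min(1, addresses.size()) : addresses.size();
            writtenHosts = new ArrayList<>(addresses.subList(0, count));
        }

        List<String> writtenHosts() {
            return writtenHosts;
        }
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("node0:9300", "node1:9301", "node2:9302");

        FakeCluster setBeforeStart = new FakeCluster();
        setBeforeStart.setHostsListContainsOnlyFirstNode(true);
        setBeforeStart.start(addresses);
        System.out.println("flag set first: " + setBeforeStart.writtenHosts());

        FakeCluster setAfterStart = new FakeCluster();
        setAfterStart.start(addresses);
        setAfterStart.setHostsListContainsOnlyFirstNode(true); // too late in this model
        System.out.println("flag set late:  " + setAfterStart.writtenHosts());
    }
}
```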

DaveCTurner · 2018-10-16T11:10:02Z

server/src/test/java/org/elasticsearch/discovery/DiscoveryDisruptionIT.java

    public void testIsolatedUnicastNodes() throws Exception {
-        List<String> nodes = startCluster(4, -1, new int[]{0});
+        List<String> nodes = startCluster(4, -1);
+        internalCluster().setHostsListContainsOnlyFirstNode(true);


I think this has to happen before you start the cluster, or else the cluster will start with full knowledge.

Right fixing

original-brownbear · 2018-10-16T11:25:16Z

@DaveCTurner all addressed in 5503164 I think:)

DaveCTurner

LGTM, thanks @original-brownbear. I added this PR and the followup we discussed to the list in #33675.

original-brownbear · 2018-10-16T14:28:50Z

@DaveCTurner thanks for the review!

* Discovery: Move AbstractDisruptionTestCase to file-based discovery.
* Relates #33675
* Simplify away ClusterDiscoveryConfiguration

Disc: Move AbstractDisruptionTC to filebased D.

d7ef985

* Relates elastic#33675

original-brownbear added the >test, v7.0.0, :Distributed Coordination/Cluster Coordination, and v6.4.0 labels Oct 15, 2018

original-brownbear requested a review from DaveCTurner October 15, 2018 14:19

jasontedor added v6.5.0 and removed v6.4.0 labels Oct 15, 2018

CR: Pass boolean instead of ordinal list

427b7c3

CR: hostsListContainsOnlyFirstNode is a field in InternalTestCluster …

72f34e5

…now, ClusterDiscoveryConfiguration doesn't do port assignments anymore

DaveCTurner reviewed Oct 15, 2018

original-brownbear added 2 commits October 16, 2018 08:21

CR: Remove now dead code

199c41e

Simplify away ClusterDiscoveryConfiguration

8fe964d

DaveCTurner reviewed Oct 16, 2018

CR: Remove noop settings actions in tests

7b4072d

DaveCTurner reviewed Oct 16, 2018

DaveCTurner mentioned this pull request Oct 16, 2018

Fix port assignment and discovery in tests #33675

Closed

6 tasks

original-brownbear added 2 commits October 16, 2018 13:14

Merge remote-tracking branch 'elastic/master' into 33675

7f9b849

CR: Fix call order + remove Settings.EMPTY noop

5503164

DaveCTurner approved these changes Oct 16, 2018

original-brownbear merged commit ea576a8 into elastic:master Oct 16, 2018

original-brownbear deleted the 33675 branch October 16, 2018 14:28

kcm pushed a commit that referenced this pull request Oct 30, 2018

Disc: Move AbstractDisruptionTC to filebased D. (#34461)

a098151

* Discovery: Move AbstractDisruptionTestCase to file-based discovery.
* Relates #33675
* Simplify away ClusterDiscoveryConfiguration

original-brownbear added a commit that referenced this pull request Nov 16, 2018

Disc: Move AbstractDisruptionTC to filebased D. (#34461)

fc2c5cf

* Discovery: Move AbstractDisruptionTestCase to file-based discovery.
* Relates #33675
* Simplify away ClusterDiscoveryConfiguration

jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
