Disc: Move AbstractDisruptionTC to filebased D. #34461
Pinging @elastic/es-distributed
@DaveCTurner take a look when you have a minute, hope this is in the direction of what we discussed last week :)
Yes, this is the right sort of direction to take. However, the crucial change that we're looking for is to make it so that I expect that if you make this change then a lot more of this machinery will turn out no longer to be needed, because Another possible improvement would be for |
See below, this fixes the situation for the
I deleted what is not used anymore in here already. The only remaining use of
Sounds good, will do :)
@DaveCTurner hmm with the changes in here the only user of the old port assignment is the calls from |
Yes, except for
I wonder if |
Yea but that's not using this code anymore anyway with this change :)
It's the only thing that extends |
Good point, I hadn't spotted that. NB I think |
Sure that seems much nicer :) => on it |
…now, ClusterDiscoveryConfiguration doesn't do port assignments anymore
|
@DaveCTurner alright done. Port assignments don't happen in |
DaveCTurner
left a comment
I suggested a few more simplifications that I think are worth investigating.
| List<String> startCluster(int numberOfNodes, int minimumMasterNode, boolean hostsListContainsOnlyFirstNode) { | ||
| configureCluster(numberOfNodes, minimumMasterNode); | ||
| InternalTestCluster internalCluster = internalCluster(); | ||
| internalCluster.setHostsListContainsOnlyFirstNode(hostsListContainsOnlyFirstNode); |
I think this call can completely move to DiscoveryDisruptionIT, removing another overload:
public void testUnicastSinglePingResponseContainsMaster() throws Exception {
internalCluster().setHostsListContainsOnlyFirstNode(true);
List<String> nodes = startCluster(4, -1);
etc...
| private NodeConfigurationSource discoveryConfig; | ||
|
|
||
| @Override | ||
| protected Settings nodeSettings(int nodeOrdinal) { |
I would like to see if it's possible to pass in all the settings by implementing this method directly, rather than using the NodeConfigurationSource etc.
For instance it looks like these settings could be here:
.put(FaultDetection.PING_TIMEOUT_SETTING.getKey(), "1s") // for hitting simulated network failures quickly
.put(FaultDetection.PING_RETRIES_SETTING.getKey(), "1") // for hitting simulated network failures quickly
.put("discovery.zen.join_timeout", "10s") // still long to induce failures but not too long so test won't time out
.put(DiscoverySettings.PUBLISH_TIMEOUT_SETTING.getKey(), "1s") // <-- for hitting simulated network failures quickly
.put(TransportService.TCP_CONNECT_TIMEOUT.getKey(), "10s") // Network delay disruption waits for the min between this
//and
.put(TestZenDiscovery.USE_MOCK_PINGS.getKey(), false).build();
I also think these aren't needed and we can just make use of the machinery in ESIntegTestCase to set them appropriately:
.put(NodeEnvironment.MAX_LOCAL_STORAGE_NODES_SETTING.getKey(), numberOfNodes)
.put(ElectMasterService.DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), minimumMasterNode)
.putList(DISCOVERY_HOSTS_PROVIDER_SETTING.getKey(), "file")
We use non-default settings in very few places; moving the affected tests into their own fixtures so they can override nodeSettings themselves seems worth investigating.
@DaveCTurner Let me try :) What makes this a little tricky (requiring more changes maybe?) is that we have this code in org.elasticsearch.discovery.ClusterDisruptionIT#testSearchWithRelocationAndSlowClusterStateProcessing
/**
* This test creates a scenario where a primary shard (0 replicas) relocates and is in POST_RECOVERY on the target
* node but already deleted on the source node. Search request should still work.
*/
public void testSearchWithRelocationAndSlowClusterStateProcessing() throws Exception {
// don't use DEFAULT settings (which can cause node disconnects on a slow CI machine)
configureCluster(Settings.EMPTY, 3, 1);
to override these defaults (though this could just be me not being so fluent in this code yet :)). Maybe do this in a follow-up PR?
Right, that's one of the tests that I think could be in its own fixture since it needs different cluster settings.
@DaveCTurner wanna handle this here or can we move that to the next PR? (this one is already doing quite a few things)
Sure, a follow-up is ok.
| ) throws ExecutionException, InterruptedException { | ||
| void configureCluster(Settings settings, int numberOfNodes, int minimumMasterNode) { | ||
| if (minimumMasterNode < 0) { | ||
| minimumMasterNode = numberOfNodes / 2 + 1; |
As far as I can tell, we allow tests to set minimumMasterNode to the "wrong" value only to allow dynamically adding nodes after the cluster has started. If so, it'd be better to allow ESIntegTestCase to manage this number for us.
@DaveCTurner Aren't we only ever passing -1 here from org.elasticsearch.discovery.AbstractDisruptionTestCase#startCluster(int, int) when we start the cluster? It seems like this is just a convenience method for not having to set a value by hand here?
Sorry, let me clarify. I do not think this method should receive an explicit minimumMasterNode value at all, whether -1 or otherwise. In almost all cases, ESIntegTestCase now does a better job of automatically managing this setting than is done here. It was, however, possible that some of the tests were using this feature to set this value to something other than numberOfNodes / 2 + 1, for instance:
elasticsearch/server/src/test/java/org/elasticsearch/discovery/ClusterDisruptionIT.java
Line 366 in d7ef985
| configureCluster(Settings.EMPTY, 3, 1); |
elasticsearch/server/src/test/java/org/elasticsearch/discovery/SnapshotDisruptionIT.java
Line 62 in d7ef985
| configureCluster(settings, 4, 2); |
elasticsearch/server/src/test/java/org/elasticsearch/discovery/DiscoveryDisruptionIT.java
Line 141 in 8fe964d
| List<String> nodes = startCluster(2, 1); |
In the first two cases this is actually correct because numberOfNodes also includes some data-only nodes. The last case is kinda strange - I think it really can form two clusters, and I'm not sure what it's actually testing.
@DaveCTurner given the short exchange in Slack, wanna keep this around for now maybe?
Yes, let's do this in a follow-up. NB the action to take here is to make this a cluster with 1 master node and 1 data node, which'd then mean that we can fall back on ESIntegTestCase's management of the cluster configuration.
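To make the arithmetic behind this thread concrete, here is a minimal sketch of the `numberOfNodes / 2 + 1` default and of why `startCluster(2, 1)` can end up as two clusters. It assumes that both nodes in that test are master-eligible, which the thread suggests but does not confirm. `QuorumSketch`, `majority` and `canFormTwoClusters` are hypothetical names for illustration and are not part of Elasticsearch.

```java
/** Illustrative arithmetic only; these helpers are hypothetical and not part of Elasticsearch. */
final class QuorumSketch {

    /** Majority of the master-eligible nodes: the n / 2 + 1 default seen in configureCluster. */
    static int majority(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    /**
     * Returns true if the master-eligible nodes can be partitioned into two sides
     * that each still satisfy minimumMasterNodes, so both sides could elect a master.
     */
    static boolean canFormTwoClusters(int masterEligibleNodes, int minimumMasterNodes) {
        // Two disjoint sides of size >= m exist exactly when 2 * m <= n.
        return minimumMasterNodes >= 1 && 2 * minimumMasterNodes <= masterEligibleNodes;
    }

    public static void main(String[] args) {
        System.out.println(majority(4));                        // 3
        System.out.println(canFormTwoClusters(2, 1));           // true: assumed shape of startCluster(2, 1)
        System.out.println(canFormTwoClusters(1, 1));           // false: 1 master node plus 1 data node
        System.out.println(canFormTwoClusters(4, majority(4))); // false
    }
}
```

Under that assumption, the suggested follow-up (1 master node and 1 data node) leaves a single master-eligible node, so a minimum of 1 is already the majority and no explicit override is needed.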
| private ServiceDisruptionScheme activeDisruptionScheme; | ||
| private Function<Client, Client> clientWrapper; | ||
|
|
||
| // If set to tru only the first node in the cluster will be made a unicast node |
nit: s/tru/true/
| private static int nextPort = calcBasePort(); | ||
|
|
||
| private final int[] unicastHostOrdinals; | ||
| private final int[] unicastHostPorts; |
Is this used any more? I think it can go away.
@DaveCTurner all comments addressed I hope :) can you take another look? :)
| */ | ||
| public SecuritySettingsSource(int numOfNodes, boolean sslEnabled, Path parentFolder, Scope scope) { | ||
| super(numOfNodes, DEFAULT_SETTINGS); | ||
| this.nodeSettings = Settings.builder().put(Settings.EMPTY).put(DEFAULT_SETTINGS).build(); |
.put(Settings.EMPTY) is a no-op. Also, public static final Settings DEFAULT_SETTINGS = Settings.EMPTY; means that .put(DEFAULT_SETTINGS) is also a no-op. I think this means that both nodeSettings and transportClientSettings are just Settings.EMPTY (and this means there are more no-ops elsewhere).
Right ... sorry for missing that :) Removed fields and their noop use.
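The no-op reasoning above can be sketched with a tiny stand-in builder. This is not the Elasticsearch `Settings` API: `MiniSettingsBuilder`, `EMPTY` and `DEFAULT_SETTINGS` below are hypothetical and backed by a plain `Map`, on the assumption that `put(Settings)` simply copies every entry of its argument.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a settings builder; backed by a plain Map, not Elasticsearch's Settings. */
final class NoOpPutSketch {

    static final Map<String, String> EMPTY = Map.of();
    // Mirrors the review comment: DEFAULT_SETTINGS is just another name for the empty settings.
    static final Map<String, String> DEFAULT_SETTINGS = EMPTY;

    static final class MiniSettingsBuilder {
        private final Map<String, String> values = new TreeMap<>();

        /** Copies every entry of other into this builder; copying an empty map changes nothing. */
        MiniSettingsBuilder put(Map<String, String> other) {
            values.putAll(other);
            return this;
        }

        MiniSettingsBuilder put(String key, String value) {
            values.put(key, value);
            return this;
        }

        Map<String, String> build() {
            return Map.copyOf(values);
        }
    }

    public static void main(String[] args) {
        Map<String, String> withNoOps =
            new MiniSettingsBuilder().put(EMPTY).put(DEFAULT_SETTINGS).put("a", "b").build();
        Map<String, String> withoutNoOps = new MiniSettingsBuilder().put("a", "b").build();
        System.out.println(withNoOps.equals(withoutNoOps)); // true
    }
}
```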
DaveCTurner
left a comment
I asked for three minor changes, then this is good to go.
| public Settings transportClientSettings() { | ||
| Settings superSettings = super.transportClientSettings(); | ||
| Settings superSettings = Settings.EMPTY; | ||
| Settings.Builder builder = Settings.builder().put(superSettings); |
I would inline superSettings and now there's another put(Settings.EMPTY) to nuke.
| public void testUnicastSinglePingResponseContainsMaster() throws Exception { | ||
| List<String> nodes = startCluster(4, -1, new int[]{0}); | ||
| List<String> nodes = startCluster(4, -1); | ||
| internalCluster().setHostsListContainsOnlyFirstNode(true); |
I think this has to happen before you start the cluster, or else the cluster will start with full knowledge.
Right fixing
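The ordering issue raised here can be shown with a small stand-in. `FakeCluster` below is hypothetical and is not `InternalTestCluster`; it only assumes that the hosts list handed to each node is decided when the node starts, so a flag set afterwards has no effect on nodes that are already running.

```java
import java.util.ArrayList;
import java.util.List;

final class HostsListOrderSketch {

    /** Hypothetical stand-in for a test cluster; the real InternalTestCluster is not modelled here. */
    static final class FakeCluster {
        private boolean hostsListContainsOnlyFirstNode;
        // For each started node, the hosts list it was given at startup.
        private final List<List<String>> hostsSeenAtStart = new ArrayList<>();

        void setHostsListContainsOnlyFirstNode(boolean value) {
            this.hostsListContainsOnlyFirstNode = value;
        }

        /** Starts the nodes; the flag is read here, once, while the hosts lists are built. */
        List<String> startCluster(int numberOfNodes) {
            List<String> names = new ArrayList<>();
            for (int i = 0; i < numberOfNodes; i++) {
                names.add("node_" + i);
            }
            for (int i = 0; i < numberOfNodes; i++) {
                List<String> hosts = hostsListContainsOnlyFirstNode ? names.subList(0, 1) : names;
                hostsSeenAtStart.add(new ArrayList<>(hosts));
            }
            return names;
        }

        List<String> hostsSeenBy(int nodeOrdinal) {
            return hostsSeenAtStart.get(nodeOrdinal);
        }
    }

    public static void main(String[] args) {
        FakeCluster tooLate = new FakeCluster();
        tooLate.startCluster(4);
        tooLate.setHostsListContainsOnlyFirstNode(true); // set after start: nodes already know everyone
        System.out.println(tooLate.hostsSeenBy(3).size()); // 4

        FakeCluster inTime = new FakeCluster();
        inTime.setHostsListContainsOnlyFirstNode(true); // set before start, as the review asks
        inTime.startCluster(4);
        System.out.println(inTime.hostsSeenBy(3)); // [node_0]
    }
}
```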
| public void testIsolatedUnicastNodes() throws Exception { | ||
| List<String> nodes = startCluster(4, -1, new int[]{0}); | ||
| List<String> nodes = startCluster(4, -1); | ||
| internalCluster().setHostsListContainsOnlyFirstNode(true); |
I think this has to happen before you start the cluster, or else the cluster will start with full knowledge.
Right fixing
@DaveCTurner all addressed in 5503164 I think :)
DaveCTurner
left a comment
LGTM, thanks @original-brownbear. I added this PR and the followup we discussed to the list in #33675.
@DaveCTurner thanks for the review!
* Discovery: Move AbstractDisruptionTestCase to file-based discovery.
* Relates #33675
* Simplify away ClusterDiscoveryConfiguration
AbstractDisruptionTestCase ClusterDiscoveryConfiguration that become unused due to not using it for configuration in AbstractDisruptionTestCase anymore