@tlrx
Member

@tlrx tlrx commented Nov 14, 2018

This pull request exposes two new methods in the IndexShard and TransportReplicationAction classes in order to allow transport replication actions to acquire all index shard operation permits for their execution.

It first adds the acquirePrimaryAllOperationsPermits() and the acquireReplicaAllOperationsPermits() methods to the IndexShard class, which allow acquiring all operation permits on a shard while exposing a Releasable. It also refactors the TransportReplicationAction class to expose two protected methods (acquirePrimaryOperationPermit() and acquireReplicaOperationPermit()) that can be overridden when a transport replication action requires the acquisition of all permits on the primary and/or replica shard during execution.
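As a rough illustration of the idea (not the Elasticsearch implementation), the sketch below models operation permits with a plain `java.util.concurrent.Semaphore`. The class name `ShardPermitsSketch`, the `TOTAL_PERMITS` constant, the nested `Releasable` interface and both method names are hypothetical; only the notion of acquiring one permit versus all permits and getting back a releasable handle comes from the description above.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a shard's operation permits; not the real IndexShard API.
class ShardPermitsSketch {
    static final int TOTAL_PERMITS = Integer.MAX_VALUE;
    private final Semaphore semaphore = new Semaphore(TOTAL_PERMITS, true);

    /** Handle that returns its permits when closed. */
    interface Releasable extends AutoCloseable {
        @Override
        void close();
    }

    // Wraps a number of held permits so that they are given back at most once.
    private Releasable releasable(int permits) {
        AtomicBoolean released = new AtomicBoolean(false);
        return () -> {
            if (released.compareAndSet(false, true)) {
                semaphore.release(permits);
            }
        };
    }

    /** Regular operation: takes a single permit or fails immediately. */
    Releasable acquireOne() {
        if (!semaphore.tryAcquire()) {
            throw new IllegalStateException("shard operations are blocked");
        }
        return releasable(1);
    }

    /** Exclusive operation: waits until every permit is free, then holds them all. */
    Releasable acquireAll(long timeout, TimeUnit unit) {
        try {
            if (!semaphore.tryAcquire(TOTAL_PERMITS, timeout, unit)) {
                throw new IllegalStateException("timed out waiting for in-flight operations");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for permits", e);
        }
        return releasable(TOTAL_PERMITS);
    }
}
```

In this simplified model a single-permit operation fails fast while all permits are held; a production implementation may instead queue or delay such operations.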

Finally, it adds a TransportReplicationAllPermitsAcquisitionTests which illustrates how a transport replication action can grab all permits before adding a cluster block in the cluster state, causing subsequent operations that require a single permit to fail (such a test was discussed in #35332 (comment)).
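The sequence that the test exercises can be pictured with a small, self-contained model. This is only a sketch under simplifying assumptions: the `BlockUnderAllPermitsSketch` class, its `index` and `addBlock` methods and the boolean `writeBlock` flag are invented for illustration, and the flag merely stands in for a cluster block published through the cluster state.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model: an exclusive action drains in-flight operations, then sets a block.
class BlockUnderAllPermitsSketch {
    private static final int TOTAL_PERMITS = Integer.MAX_VALUE;
    private final Semaphore permits = new Semaphore(TOTAL_PERMITS);
    // Stand-in for a cluster block; the real block lives in the cluster state.
    private final AtomicBoolean writeBlock = new AtomicBoolean(false);

    /** Single-permit operation: rejected once the block is set. */
    String index(String doc) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("permits unavailable");
        }
        try {
            if (writeBlock.get()) {
                throw new IllegalStateException("index is blocked for writes");
            }
            return "indexed " + doc;
        } finally {
            permits.release();
        }
    }

    /** All-permits operation: waits for in-flight operations, then publishes the block. */
    void addBlock() {
        permits.acquireUninterruptibly(TOTAL_PERMITS);
        try {
            writeBlock.set(true);
        } finally {
            permits.release(TOTAL_PERMITS);
        }
    }
}
```

Because the block is set while every permit is held, no single-permit operation can be halfway through when the block becomes visible; every later operation observes it and fails.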

Related to elastic #33888

@tlrx tlrx added >enhancement v7.0.0 :Distributed Indexing/Distributed v6.6.0 labels Nov 14, 2018
@tlrx tlrx requested review from bleskes, s1monw and ywelsch November 14, 2018 12:54
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@bleskes bleskes left a comment

Thx @tlrx. I left initial feedback on the production call.

Contributor

Why is the targetAllocationID passed here? I guess it should be validated, but it isn't?

Member Author

This is a leftover that was helpful in TransportReplicationAllPermitsAcquisitionTests to distinguish between the primary and replica shards when executing a transport replication test action.

I reverted this and changed how the action in the test retrieves the index shard.

Contributor

nit: acquireAllPrimaryOperationPermits?

Member Author

Done

Contributor

I'm not so comfortable with separating this code from the one in updatePrimaryTermIfNeeded - they are tightly connected. Instead of sharing code this way, how about creating a callback that will run:

indexShardOperationPermits.acquire(listener, executorOnDelay, true, debugInfo);

or

indexShardOperationPermits.asyncBlockOperations(listener, timeout.duration(), timeout.timeUnit());
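One way to read this suggestion is sketched below: the term handling stays in a single shared method, and the caller passes in a callback that performs either the single-permit or the all-permits acquisition. Everything here is hypothetical and simplified, including the `PermitCallbackSketch` class, the `PermitListener` interface, the placeholder term check and the log entries; the real logic of updatePrimaryTermIfNeeded is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "shared term handling + pluggable acquisition callback".
class PermitCallbackSketch {
    /** Minimal listener, standing in for a listener that receives a releasable. */
    interface PermitListener {
        void onAcquired(Runnable release);
        void onFailure(Exception e);
    }

    final List<String> log = new ArrayList<>();
    private long currentTerm = 1;

    // Shared path: the term check lives in one place, the acquisition strategy is passed in.
    private void acquireReplicaPermit(long opTerm, PermitListener listener,
                                      Consumer<PermitListener> acquisition) {
        if (opTerm < currentTerm) {
            listener.onFailure(new IllegalStateException("operation term " + opTerm + " is too old"));
            return;
        }
        if (opTerm > currentTerm) {
            currentTerm = opTerm; // placeholder for the real term update
            log.add("term bumped to " + opTerm);
        }
        acquisition.accept(listener);
    }

    /** Variant that would acquire a single permit. */
    void singlePermit(long opTerm, PermitListener listener) {
        acquireReplicaPermit(opTerm, listener, l -> {
            log.add("single");
            l.onAcquired(() -> log.add("released single"));
        });
    }

    /** Variant that would block all operations and acquire every permit. */
    void allPermits(long opTerm, PermitListener listener) {
        acquireReplicaPermit(opTerm, listener, l -> {
            log.add("all");
            l.onAcquired(() -> log.add("released all"));
        });
    }
}
```

The point of the design is that both variants go through the same term check, so the two code paths cannot drift apart.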

Member Author

I share your concerns and I updated the code, let me know what you think.

@tlrx
Member Author

tlrx commented Nov 15, 2018

Thanks @bleskes, I addressed your first bunch of comments. I also added comments to the TransportReplicationAllPermitsAcquisitionTests and reworked it a bit so that it is easier to understand.

Contributor

@bleskes bleskes left a comment

LGTM. I left one suggestion for a potential follow up PR.

Contributor

This indicates a funky property of getActiveOperationsCount. To date, all the blocks were internal to the shard, and thus 0 was a good return value. I don't really have a good solution in mind. At the very least we should document this on getActiveOperationsCount. Maybe 1 (we know there is only one exclusive op) is a better return value? /cc @ywelsch

Member Author

I take the point, thanks.

Contributor

sorry, I missed the ping here. I think that 1 is ok too, but then we should in general change the terminology in IndexShardOperationPermits to move from the notion of blocking operations to running exclusive operations (similar to a read/write lock).
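To make the option under discussion concrete, here is a small sketch of a counter that reports 1 while a single exclusive operation holds every permit. It is a hypothetical model built on `java.util.concurrent.Semaphore`, not the IndexShardOperationPermits code, and it does not claim to show what was eventually merged; the class and helper method names are invented.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model of counting active operations, including an exclusive one.
class ActiveOperationsSketch {
    static final int TOTAL_PERMITS = Integer.MAX_VALUE;
    private final Semaphore semaphore = new Semaphore(TOTAL_PERMITS);

    boolean tryAcquireOne() { return semaphore.tryAcquire(); }
    void releaseOne() { semaphore.release(); }

    boolean tryAcquireAll() { return semaphore.tryAcquire(TOTAL_PERMITS); }
    void releaseAll() { semaphore.release(TOTAL_PERMITS); }

    int getActiveOperationsCount() {
        int available = semaphore.availablePermits();
        if (available == 0) {
            // Assumption: all permits gone means one exclusive operation holds them,
            // so report it as a single active operation rather than 0.
            return 1;
        }
        return TOTAL_PERMITS - available;
    }
}
```

This mirrors the read/write lock analogy: many regular operations count individually, while an exclusive operation counts once.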

Contributor

agreed.

Contributor

🙈

Member Author

I know, not really proud of this... I'll take a look at how to improve this test.

tlrx added 4 commits November 22, 2018 12:20
…cationAction

This commit adds the acquirePrimaryAllOperationsPermits() and the
acquireReplicaAllOperationsPermits() methods to the IndexShard class.
These methods allow acquiring all operation permits on a primary or
a replica shard and can be used in future transport replication actions
to acquire all permits instead of a single one.

Related to elastic elastic#33888
@tlrx tlrx force-pushed the add-indexshard-acquire-all-permits branch from 4865718 to ef255a2 Compare November 22, 2018 12:11
@tlrx tlrx merged commit 2e37f17 into elastic:master Nov 23, 2018
@tlrx tlrx deleted the add-indexshard-acquire-all-permits branch November 23, 2018 08:27
tlrx added a commit that referenced this pull request Nov 23, 2018
…Action (#35540)

This pull request exposes two new methods in the IndexShard and
TransportReplicationAction classes in order to allow transport replication
actions to acquire all index shard operation permits for their execution.

It first adds the acquireAllPrimaryOperationPermits() and the
acquireAllReplicaOperationsPermits() methods to the IndexShard class,
which allow acquiring all operation permits on a shard while exposing
a Releasable. It also refactors the TransportReplicationAction class to
expose two protected methods (acquirePrimaryOperationPermit() and
acquireReplicaOperationPermit()) that can be overridden when a transport
replication action requires the acquisition of all permits on the primary
and/or replica shard during execution.

Finally, it adds a TransportReplicationAllPermitsAcquisitionTests which
illustrates how a transport replication action can grab all permits before
adding a cluster block in the cluster state, causing subsequent operations
that require a single permit to fail.

Related to elastic #33888
original-brownbear pushed a commit that referenced this pull request Nov 23, 2018
@tlrx tlrx mentioned this pull request Dec 5, 2018
50 tasks
Labels

:Distributed Indexing/Distributed >enhancement v6.6.0 v7.0.0-beta1
