Fix race condition in RemoteClusterConnection node supplier #25432
This commit fixes a race condition in the node supplier used by the RemoteClusterConnection. The node supplier stores an iterator over a set backed by a ConcurrentHashMap, but the get operation of the supplier calls multiple methods of the iterator and is susceptible to a race between the calls to hasNext() and next(). The test in this commit fails under the old implementation with a NoSuchElementException. This commit adds a wrapper object over a set and a list, with all methods synchronized to avoid races. Additionally, iterators are no longer used; they are replaced by a counter and index-based access into an array list, which preserves the round-robin behavior of the previous node supplier implementation.
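For readers who want to see the shape of the approach described above, here is a minimal sketch of a synchronized wrapper over a set and a list with counter-based round-robin access. It is not the class from this PR: the name `RoundRobinNodes`, the generic element type, the plain `int` counter, and the `IllegalStateException` thrown on an empty holder are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the actual Elasticsearch class. Every method is
// synchronized, so no caller can modify the collection between the emptiness
// check and the indexed read inside get().
final class RoundRobinNodes<T> {
    private final Set<T> nodeSet = new HashSet<>();      // O(1) insert and contains
    private final List<T> nodeList = new ArrayList<>();  // O(1) access by index
    private int counter = 0;                             // guarded by the object monitor

    synchronized boolean add(T node) {
        boolean added = nodeSet.add(node);
        if (added) {
            nodeList.add(node); // keep the list in sync with the set
        }
        return added;
    }

    synchronized boolean remove(T node) {
        boolean removed = nodeSet.remove(node);
        if (removed) {
            nodeList.remove(node); // O(n) scan of the array list
        }
        return removed;
    }

    synchronized boolean contains(T node) {
        return nodeSet.contains(node);
    }

    synchronized int size() {
        return nodeList.size();
    }

    // Returns the next node in round-robin order.
    // Assumption: an empty holder is reported with IllegalStateException.
    synchronized T get() {
        if (nodeList.isEmpty()) {
            throw new IllegalStateException("no nodes available");
        }
        // floorMod keeps the index non-negative even if the counter overflows.
        T node = nodeList.get(Math.floorMod(counter, nodeList.size()));
        counter++;
        return node;
    }
}
```

Because the size check and the read happen under one lock, a concurrent `remove` can no longer slip in between them, which is the window that produced the NoSuchElementException with `hasNext()` and `next()`.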
@pickypg observed the following errors that were caused by this race condition:
s1monw
left a comment
left a comment, thanks for the fix @jaymode
// nodes, this class uses a counter and retrieves by index from the list. The arraylist enables us to do this in O(1).
private final Set<DiscoveryNode> nodeSet = new HashSet<>();
private final List<DiscoveryNode> nodeList = new ArrayList<>();
private final AtomicInteger counter = new AtomicInteger(0);
this can be a simple integer now?
// this classes uses both a set and a list to support faster operations. Insertion and contains for this class are O(1) thanks
// to the use of the set and removal is O(n) due to the arraylist. In order to support a round-robin scheme through the connected
// nodes, this class uses a counter and retrieves by index from the list. The arraylist enables us to do this in O(1).
private final Set<DiscoveryNode> nodeSet = new HashSet<>();
I don't like having two data structures that we have to keep in sync. If you really want to optimize for removal / insertion, we can just use an array or an array list and sort it? I doubt we should, to be honest. Maybe we just go with a list and a linear scan, or we use only a set with the same iterator approach; with the synchronization in place, we can just set the iterator to null whenever the set changes. That might be the easiest.
s1monw
left a comment
LGTM
This commit fixes a race condition in the node supplier used by the RemoteClusterConnection. The node supplier stores an iterator over a set backed by a ConcurrentHashMap, but the get operation of the supplier calls multiple methods of the iterator and is susceptible to a race between the calls to hasNext() and next(). The test in this commit fails under the old implementation with a NoSuchElementException. This commit adds a wrapper object over a set and an iterator, with all methods synchronized to avoid races. Any modification to the set resets the iterator to null, and the next retrieval creates a new iterator.
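The revised design in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: the class name `IteratingNodes`, the generic element type, and the `IllegalStateException` for an empty set are hypothetical.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical sketch, not the actual Elasticsearch class. A single set is
// the only data structure; the cached iterator is discarded on every
// modification, so a stale iterator is never advanced.
final class IteratingNodes<T> {
    private final Set<T> nodes = new HashSet<>();
    private Iterator<T> current; // null means "create a fresh iterator on the next get()"

    synchronized boolean add(T node) {
        boolean added = nodes.add(node);
        if (added) {
            current = null; // the set changed, so invalidate the iterator
        }
        return added;
    }

    synchronized boolean remove(T node) {
        boolean removed = nodes.remove(node);
        if (removed) {
            current = null; // the set changed, so invalidate the iterator
        }
        return removed;
    }

    synchronized int size() {
        return nodes.size();
    }

    // hasNext() and next() run under the same lock as add() and remove(),
    // so nothing can change the set between the two calls.
    // Assumption: an empty set is reported with IllegalStateException.
    synchronized T get() {
        if (nodes.isEmpty()) {
            throw new IllegalStateException("no nodes available");
        }
        if (current == null || !current.hasNext()) {
            current = nodes.iterator(); // start a new pass over the set
        }
        return current.next();
    }
}
```

One trade-off of this variant is that the round-robin position restarts whenever the set is modified, in exchange for keeping only one collection to maintain.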