-
Notifications
You must be signed in to change notification settings - Fork 1.8k
perf(NODE-4727): Improve performance of buffering and cursors #3447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
baa95be to
805a972
Compare
| break; | ||
| } | ||
| } | ||
| this.sessions.prune(session => session.hasTimedOut(sessionTimeoutMinutes)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is technically not the same logic as the removed logic, though arguably more correct than the removed logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea we would end early, however the tests we have accept this, we can add another to demonstrate the change, from the perspective of the goal here we are cleaning up unusable sessions, we're just doing more cleaning in one go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be nice to have a proper test for this, if it's not too much trouble
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified an existing test, if you look at what the test used to be it nearly should've broke with this change, it just was written to put all the timed out sessions at the end of the list so the early break was never used
src/utils.ts
Outdated
| const kBuffers = Symbol('buffers'); | ||
| /** @internal */ | ||
| const kLength = Symbol('length'); | ||
| type ListNode<T> = { value: T; next: ListNode<T>; prev: ListNode<T> }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of all the TS gymnastics for the purpose of initialization, can we instead define the types better to account for the null node since that actually is a legitimate instance in an empty list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Neal's approach is correct. With the exception of during initialization, a Node will never have null next or prevs. If we adjust the ListNode type to account for null pointers, then we need to check for null everywhere we use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, so the null value is intended to be an indicator of nothing, the List class is supposed to always have a truthy value of your choosing or return null indicating it is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a fine argument if the ListNode was returned to the users, but it's actually just used for the internal implementation of the list, and saying that the things that are currently typed as ListNode<T> always have something in them when the list methods interact with them is simply not true. The point of the TS type here is to help make sure that the internal implementation of the list accounts for all possible values of ListNode. My suggestion is to explicitly define head: HeadNode<T> | NullNode where HeadNode<T> = { value: null; next: ListNode<T>; prev: ListNode<T> } and NullNode = { value: null; next: NullNode; prev: NullNode }. It would make it more clear that the head never contains an actual value of its own. Furthermore, setting prev: and next: of ListNode<T> to be ListNode<T> | HeadNode<T> would likewise account for those scenarios of the node in question being at the start or end of the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some better typing, so now that our head node is special it moves some of the casting but I think it improves the understanding here that head is special and holds null and the rest are the items in the list.
NullNode = { value: null; next: NullNode; prev: NullNode }
Trying this doesn't get type value. In theory node !== this head should narrow away the HeadNode to make node only a ListNode, thus making it's value the correct T, but this reference check isn't something the language supports. (Unless I'm missing something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From slack: some more improvements were just pushed to rid us of some casting and the casting that is left is clear where the edge case is with narrowing the type
src/utils.ts
Outdated
| const kBuffers = Symbol('buffers'); | ||
| /** @internal */ | ||
| const kLength = Symbol('length'); | ||
| type ListNode<T> = { value: T; next: ListNode<T>; prev: ListNode<T> }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Neal's approach is correct. With the exception of during initialization, a Node will never have null next or prevs. If we adjust the ListNode type to account for null pointers, then we need to check for null everywhere we use them.
|
|
||
| const iterations = 100; | ||
| const defaultItemsSize = 100000; | ||
| const makeBigArray = (length = defaultItemsSize) => Array.from({ length }).fill(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You use arrow functions here and functions below. Can we choose one and be consistent throughout the file?
My preference is always functions but it's up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, made them consistent
| console.log('\n' + '-'.repeat(155)); | ||
| }; | ||
|
|
||
| void (function testArrayShift() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for the IIFEs? Why not just throw them inside a main and call the main function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a main
| @@ -0,0 +1,109 @@ | |||
| /* eslint-disable no-restricted-modules */ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question - is this something we should consider adding to our benchmarks that show up in CI now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but to some extent we are testing nodejs / v8 and another library, I'm in favor of testing lets file something that captures the array of variables there.
src/utils.ts
Outdated
| * If BufferPool contains 4 bytes or more construct an int32 from the leading bytes, | ||
| * otherwise return null. Size can be negative, caller should error check. | ||
| */ | ||
| readSize(): number | null { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this belongs in the buffer pool since it's specific to reading the length of an OpMsg message. Can we put this logic in a free-function on the MessageStream instead? Or consider making our generic BufferPool specific to OpMsgs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's specific to reading an int32 from the front of a buffer not OpMsg (the same logic goes for OpQuery). Grabbing a leading int32 is a common enough operation in our line of work that I don't think it adds bloat to keep it in this class, we only ever call read with a value based on the int32 that is at the leading 4 bytes of the current buffers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, if we keep it let's name it readInt32, not readSize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/utils.ts
Outdated
| if (size === this.length) { | ||
| result = Buffer.concat(this[kBuffers]); | ||
| // We know we have enough, we just don't know how it is spread across chunks | ||
| // TODO(NODE-XXXX): alloc API should change based on raw option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jira ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added!
| first(): T | null { | ||
| // If the list is empty, value will be the sentinel's null | ||
| return this.head.next.value; | ||
| } | ||
|
|
||
| last(): T | null { | ||
| // If the list is empty, value will be the sentinel's null | ||
| return this.head.prev.value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for consistency with other stack/queues I've worked with, could we consider using peek in the method names to make it clear that these methods do not modify the list? peekFront and peekBack or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/first.html
https://ruby-doc.org/core-2.4.1/Array.html#method-i-first
https://doc.rust-lang.org/std/vec/struct.Vec.html#method.first
I've used plenty of examples of first/last in my past so I'll leave it as it, I've added a tsdoc to help clarify the methods.
src/utils.ts
Outdated
| this.head.next = newNode; | ||
| } | ||
|
|
||
| private remove(node?: ListNode<T> | null): T | null { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for all methods that return values - could we consider returning an object { value: T } instead of just T to allow for nullish types to be stored in the list? We may as well just properly do this now to avoid issues down the line like we're facing with cursors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think its necessary for the cases we use the list for, nullish values work inside this list, just not null and that is intentional
732b889 to
0f679d1
Compare
Co-authored-by: Daria Pardue <[email protected]>
Description
What is changing?
Anywhere we used arrays or an external dependency I've migrated us to our own linked list implementation.
What is the motivation for this change?
In v4 the cursor implementation was changed to shift documents from the front of a list which has a negative impact of moving every item in the list up by one index. I considered tracking an offset on the cursor, but after wrestling with a number of tests that reveled the many locations the index would need to be incremented and decremented to correctly track the next document, it seemed worth using a data structure fitting the job.
LinkedList
I offer our own implementation of linked list which is relatively simple to follow and for us to make future adjustments to if needed. It is circularly linked with an always defined sentinel head node to remove the need for null checks on next/prev. I've added unit test benchmarks, I'll post in a follow up comment.
BufferPool
The linked list now backs our BufferPool implementation so that buffers can be removed from the head of list as needed for reading. I removed the logic for consuming / not consuming buffers as it added unneeded complexity to the read implementation. The non consume read was being used to read an int32 of the top of the current buffers, instead I've added a function specialized for that job, and read will always consume from the pool.
AbstractCursor
We now pay an upfront cost of constructing a list from the array returned to us by BSON (ideally BSON could be asked to build this List), one iteration through the documents to make a List so we can remove documents from the front. This improves the time it takes to
shiftand keeps the memory usage of a cursor consistent with what we had prior to this change. (Keeping an offset meant also maintaining the entire array of documents in the cursor until the nextBatch was received)SessionPool & ConnectionPool & ServerSelection
The above all do not require random access to the data they manage, migrating to a LL seems like an easy win, but I can easily roll these changes back.
Double check the following
npm run check:lintscript<type>(NODE-xxxx)<!>: <description>