chore(topk): improvements to reduce alloc bytes and alloc space #19660
base: main
```diff
 // This record contains a nil sort key to test the behaviour of
 // NullsFirst.
-{"ts": nil, "table": "D", "line": "line A"},
+{"table": "D", "line": "line A"},
```
I had to remove `"ts": nil`, as it was creating the record with an incorrect schema (the timestamp type set to null), which resulted in two timestamp fields in the compacted schema.
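To illustrate how a nil sort key can surface as a duplicate column, here is a minimal sketch that assumes the compactor merges record schemas by (name, type) identity. The `field` type and `unionSchemas` function are hypothetical stand-ins, not the `arrowagg` API. Under that assumption, a record whose `ts` column was inferred as null-typed contributes a second `ts` field:

```go
package main

import "fmt"

// field is a hypothetical stand-in for an Arrow schema field: a column name
// plus its data type. It is an illustration, not the arrowagg API.
type field struct {
	Name string
	Type string
}

// unionSchemas merges the fields of several record schemas, keeping one entry
// per distinct (name, type) pair in first-seen order. This identity rule is an
// assumption made for the sketch.
func unionSchemas(schemas ...[]field) []field {
	seen := make(map[field]bool)
	var out []field
	for _, s := range schemas {
		for _, f := range s {
			if !seen[f] {
				seen[f] = true
				out = append(out, f)
			}
		}
	}
	return out
}

func main() {
	// Records built with a real timestamp value get a timestamp-typed "ts" column.
	good := []field{{"ts", "timestamp"}, {"table", "utf8"}, {"line", "utf8"}}
	// A record built from {"ts": nil, ...} ends up with a null-typed "ts" column.
	nilTS := []field{{"ts", "null"}, {"table", "utf8"}, {"line", "utf8"}}

	merged := unionSchemas(good, nilTS)
	fmt.Println(len(merged)) // 4: "ts" appears twice, once per type
	for _, f := range merged {
		fmt.Printf("%s:%s\n", f.Name, f.Type)
	}
}
```

Omitting the key entirely, as the updated test record does, means that record contributes no `ts` field at all, so only the timestamp-typed one survives the merge.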
Nice, LGTM, though you may want a review from someone closer to the engine code.
I like the approach of sorting the output in the stream builder, and the performance benefits are great, so it's a win-win, I think.
```go
compactor := arrowagg.NewRecords(memory.DefaultAllocator)
for rec, rows := range recordRows {
	slices.Sort(rows)
	// ...
}
```
Is it even possible that rows is not sorted?
Yeah, I think so, at least in the case of global TopK, as the records returned by local TopK are no longer guaranteed to be sorted.
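A minimal sketch of why the per-record sort matters: if selected row indices arrive in rank order rather than row order, they must be sorted before runs of consecutive rows can be found, and each run can then be sliced from its record once instead of row by row. It assumes selected rows are tracked as plain `int` indices per record; `contiguousRanges` and `rowRange` are hypothetical names, not the PR's code (Go 1.21+ for the `slices` package).

```go
package main

import (
	"fmt"
	"slices"
)

// rowRange is a half-open [Start, End) span of rows within one record.
type rowRange struct {
	Start, End int
}

// contiguousRanges sorts the selected row indices of a single record in place
// and merges runs of consecutive indices into ranges. Each range could then be
// sliced from the record once instead of slicing every row on its own.
func contiguousRanges(rows []int) []rowRange {
	slices.Sort(rows)
	var out []rowRange
	for _, r := range rows {
		if n := len(out); n > 0 && out[n-1].End == r {
			out[n-1].End++ // r continues the current run
			continue
		}
		out = append(out, rowRange{Start: r, End: r + 1})
	}
	return out
}

func main() {
	// Hypothetical top-K selection: row indices per record in the order a heap
	// might emit them (by rank), so they are not ascending.
	recordRows := map[string][]int{
		"rec0": {7, 2, 3, 8, 4},
		"rec1": {0, 1},
	}
	for _, name := range []string{"rec0", "rec1"} {
		fmt.Println(name, contiguousRanges(recordRows[name]))
	}
	// rec0 [{2 5} {7 9}]
	// rec1 [{0 2}]
}
```

Five selected rows in `rec0` collapse into two slices here; without sorting first, the same input would yield five single-row ranges.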
```go
case *array.Float16:
	right := right.(*array.Float16)
	return left.Value(leftIdx).Cmp(right.Value(rightIdx)), nil

case *array.Float32:
	right := right.(*array.Float32)
	return cmp.Compare(left.Value(leftIdx), right.Value(rightIdx)), nil
```
I don't think we need to support all array types, only the ones used by the engine.
Removed most of the irrelevant ones in 7f5c378.
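A minimal, self-contained sketch of the array-comparison idea in this thread: switch on the concrete column type and read values in place with `cmp.Compare`, instead of materializing a scalar per value. It assumes hypothetical `int64Column` and `stringColumn` types in place of the real Arrow arrays (such as `*array.Float32`), handles only those two types in the spirit of supporting just what the engine needs, and models null ordering with a `nullsFirst` flag loosely based on the NullsFirst behaviour mentioned in the test. It is not the PR's implementation.

```go
package main

import (
	"cmp"
	"fmt"
)

// Hypothetical typed columns standing in for Arrow arrays: plain value
// slices plus a validity mask where false marks a null.
type int64Column struct {
	values []int64
	valid  []bool
}

type stringColumn struct {
	values []string
	valid  []bool
}

// compareNullable orders nulls first or last, then compares the values.
func compareNullable[T cmp.Ordered](l, r T, lValid, rValid, nullsFirst bool) int {
	switch {
	case !lValid && !rValid:
		return 0
	case !lValid:
		if nullsFirst {
			return -1
		}
		return 1
	case !rValid:
		if nullsFirst {
			return 1
		}
		return -1
	}
	return cmp.Compare(l, r)
}

// compareArrays compares left[leftIdx] with right[rightIdx] by switching on
// the concrete column type and reading the values in place, so no boxed
// scalar is built per comparison.
func compareArrays(left, right any, leftIdx, rightIdx int, nullsFirst bool) (int, error) {
	switch l := left.(type) {
	case *int64Column:
		r, ok := right.(*int64Column)
		if !ok {
			return 0, fmt.Errorf("mismatched column types %T and %T", left, right)
		}
		return compareNullable(l.values[leftIdx], r.values[rightIdx],
			l.valid[leftIdx], r.valid[rightIdx], nullsFirst), nil
	case *stringColumn:
		r, ok := right.(*stringColumn)
		if !ok {
			return 0, fmt.Errorf("mismatched column types %T and %T", left, right)
		}
		return compareNullable(l.values[leftIdx], r.values[rightIdx],
			l.valid[leftIdx], r.valid[rightIdx], nullsFirst), nil
	default:
		return 0, fmt.Errorf("unsupported column type %T", left)
	}
}

func main() {
	ts := &int64Column{values: []int64{30, 0, 10}, valid: []bool{true, false, true}}

	c, _ := compareArrays(ts, ts, 0, 2, true)
	fmt.Println(c) // 1: 30 > 10
	c, _ = compareArrays(ts, ts, 1, 2, true)
	fmt.Println(c) // -1: the null sorts first
	_, err := compareArrays(ts, &stringColumn{}, 0, 0, true)
	fmt.Println(err != nil) // true: mismatched column types
}
```

Unsupported types fall through to an error rather than a slower generic path, which keeps the switch limited to the types actually compared.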
Can you update the doc comment on `physical.TopK` to remove the mention that it does a SORT? Once #19672 is merged, the comment in the protobuf will also need to be updated.
What this PR does / why we need it:
- Update `Compact` to slice contiguous rows. `Compact` in topK appends record slices containing a single row before concatenating them. That creates K records, each with one row, on every compaction cycle, which adds overhead from allocating record metadata. This can be improved by slicing contiguous ranges of a record that belong to the top K; relaxing the ordering guarantees of TopK allows us to pick larger ranges.
- Replace `compareScalars` with `compareArrays`, as the former results in allocations.
- Update `mapper` to not use `schema.Fields()`, which creates a copy of all fields.

Which issue(s) this PR fixes:
Fixes #
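The `mapper` item in the description above rests on `schema.Fields()` returning a copy of every field. A minimal sketch of the difference, using hypothetical `schema` and `field` types that mimic such a copying accessor alongside index-based `NumFields()` and `Field(i)` accessors (Arrow's Go `Schema` type exposes methods with those names). The name-to-position lookup built here is an assumed stand-in for what `mapper` does, not its actual code.

```go
package main

import "fmt"

// field and schema are hypothetical stand-ins for an Arrow schema.
type field struct{ Name string }

type schema struct{ fields []field }

// Fields returns a fresh copy of all fields on every call, which costs one
// slice allocation plus the copy.
func (s *schema) Fields() []field {
	out := make([]field, len(s.fields))
	copy(out, s.fields)
	return out
}

// NumFields and Field give index-based access without copying.
func (s *schema) NumFields() int    { return len(s.fields) }
func (s *schema) Field(i int) field { return s.fields[i] }

// indexByNameCopying builds a name->position map via Fields(), paying for a copy.
func indexByNameCopying(s *schema) map[string]int {
	m := make(map[string]int, s.NumFields())
	for i, f := range s.Fields() {
		m[f.Name] = i
	}
	return m
}

// indexByName walks the schema by index, so no intermediate slice is built.
func indexByName(s *schema) map[string]int {
	m := make(map[string]int, s.NumFields())
	for i := 0; i < s.NumFields(); i++ {
		m[s.Field(i).Name] = i
	}
	return m
}

func main() {
	s := &schema{fields: []field{{"ts"}, {"table"}, {"line"}}}
	fmt.Println(indexByName(s)["line"], indexByNameCopying(s)["line"]) // 2 2
}
```

Both functions produce the same map; the index-based one simply skips the per-call copy, which matters when the lookup runs once per record.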
Special notes for your reviewer:
Checklist
- … `CONTRIBUTING.md` guide (required)
- … `feat` PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
- … `docs/sources/setup/upgrade/_index.md`
- … `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. Example PR