Conversation

@ashwanthgoli
Contributor

@ashwanthgoli ashwanthgoli commented Oct 31, 2025

What this PR does / why we need it:

  1. Update Compact to slice contiguous rows

    • Compact in topK appends record slices that each contain a single row before concatenating them. This creates K one-row records in every compaction cycle, which adds overhead from allocating record metadata. Slicing contiguous ranges of a record that belong to the top K avoids this, and relaxing the ordering guarantees of TopK allows us to pick larger ranges.
    • With this change, topK is no longer guaranteed to return the top K entries in sorted order; it is up to the caller to sort the entries. The stream builder is updated accordingly.
  2. Replace compareScalars with compareArrays, as the former results in allocations.

  3. Update mapper to not use schema.Fields() which creates a copy of all fields.
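
A minimal sketch of the contiguous-slicing idea from item 1, not the code in this PR. It assumes the selected rows of one record are available as a list of row indices; `rowRange` and `contiguousRanges` are hypothetical names used only for illustration. Each resulting range would then correspond to one record slice (for example via Arrow's `NewSlice(start, end)`) instead of one slice per row.

```go
package main

import (
	"fmt"
	"slices"
)

// rowRange is a half-open interval [Start, End) of row indices in one record.
// Hypothetical type, not part of the PR.
type rowRange struct {
	Start, End int64
}

// contiguousRanges sorts the selected row indices in place and merges runs of
// consecutive indices into half-open ranges. Duplicate indices are collapsed.
func contiguousRanges(rows []int64) []rowRange {
	if len(rows) == 0 {
		return nil
	}
	slices.Sort(rows)

	ranges := []rowRange{{Start: rows[0], End: rows[0] + 1}}
	for _, r := range rows[1:] {
		last := &ranges[len(ranges)-1]
		switch {
		case r == last.End:
			// r extends the current run by one row.
			last.End++
		case r < last.End:
			// Duplicate index, already covered by the current run.
		default:
			// Gap found: start a new run.
			ranges = append(ranges, rowRange{Start: r, End: r + 1})
		}
	}
	return ranges
}

func main() {
	selected := []int64{7, 2, 3, 9, 4, 8}
	for _, rg := range contiguousRanges(selected) {
		fmt.Printf("slice rows [%d, %d)\n", rg.Start, rg.End)
	}
	// Prints:
	// slice rows [2, 5)
	// slice rows [7, 10)
}
```

Here six selected rows collapse into two slices rather than six one-row slices, which is where the metadata savings would come from.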

goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/v3/pkg/logql/bench
cpu: Apple M1 Pro
                                                                                 │     main     │   with_compact_contiguous_slices   │         with_compare_array         │      with_mapper_field_lookup      │
                                                                                 │    sec/op    │   sec/op     vs base               │   sec/op     vs base               │   sec/op     vs base               │
LogQL/query={region="ap-southeast-1"}_[BACKWARD]/kind=log/store=dataobj-engine-8   17.294 ± ∞ ¹   4.220 ± ∞ ¹  -75.60% (p=0.008 n=5)   3.779 ± ∞ ¹  -78.15% (p=0.008 n=5)   3.499 ± ∞ ¹  -79.77% (p=0.008 n=5)

                                                                                 │     main      │    with_compact_contiguous_slices    │          with_compare_array          │       with_mapper_field_lookup       │
                                                                                 │     B/op      │     B/op       vs base               │     B/op       vs base               │     B/op       vs base               │
LogQL/query={region="ap-southeast-1"}_[BACKWARD]/kind=log/store=dataobj-engine-8   55.98Gi ± ∞ ¹   14.06Gi ± ∞ ¹  -74.88% (p=0.008 n=5)   12.70Gi ± ∞ ¹  -77.31% (p=0.008 n=5)   10.16Gi ± ∞ ¹  -81.85% (p=0.008 n=5)

                                                                                 │     main     │   with_compact_contiguous_slices    │         with_compare_array          │      with_mapper_field_lookup       │
                                                                                 │  allocs/op   │  allocs/op    vs base               │  allocs/op    vs base               │  allocs/op    vs base               │
LogQL/query={region="ap-southeast-1"}_[BACKWARD]/kind=log/store=dataobj-engine-8   901.7M ± ∞ ¹   192.9M ± ∞ ¹  -78.61% (p=0.008 n=5)   125.4M ± ∞ ¹  -86.10% (p=0.008 n=5)   125.0M ± ∞ ¹  -86.14% (p=0.008 n=5)

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@ashwanthgoli ashwanthgoli changed the title from "chore(topk): compact: reduce allocs by appending contiguous slices" to "chore(topk): improvements to reduce alloc bytes and alloc space" Nov 3, 2025
@ashwanthgoli ashwanthgoli marked this pull request as ready for review November 3, 2025 10:00
@ashwanthgoli ashwanthgoli requested a review from a team as a code owner November 3, 2025 10:00
// This record contains a nil sort key to test the behaviour of
// NullsFirst.
{"ts": nil, "table": "D", "line": "line A"},
{"table": "D", "line": "line A"},
Contributor Author

I had to remove "ts": nil because it created the record with an incorrect schema (timestamp type set to null), which resulted in two timestamp fields in the compacted schema.

Contributor

@benclive benclive left a comment

Nice, LGTM though you may want a review from someone closer to the engine code.

I like the approach of sorting the output on the stream builder, plus the performance benefits are great, so it's a win-win I think.
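
To illustrate the caller-side sort mentioned above, here is a rough sketch, assuming the unordered top-K output has already been turned into plain entries. The `entry` type and `sortEntries` function are hypothetical stand-ins, not the stream builder's actual types.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// entry is a hypothetical stand-in for a log entry produced from topK output.
type entry struct {
	TimestampNs int64
	Line        string
}

// sortEntries orders entries by timestamp: newest first when backward is
// true, oldest first otherwise. The sort is stable, so entries with equal
// timestamps keep their relative order.
func sortEntries(entries []entry, backward bool) {
	slices.SortStableFunc(entries, func(a, b entry) int {
		if backward {
			return cmp.Compare(b.TimestampNs, a.TimestampNs)
		}
		return cmp.Compare(a.TimestampNs, b.TimestampNs)
	})
}

func main() {
	// topK returned the right rows, but in arbitrary order.
	entries := []entry{{30, "c"}, {10, "a"}, {20, "b"}}
	sortEntries(entries, true)
	for _, e := range entries {
		fmt.Println(e.TimestampNs, e.Line)
	}
	// Prints:
	// 30 c
	// 20 b
	// 10 a
}
```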


compactor := arrowagg.NewRecords(memory.DefaultAllocator)
for rec, rows := range recordRows {
slices.Sort(rows)
Contributor

Is it even possible that rows is not sorted?

Contributor Author

Yeah, I think so, at least in the case of global topK, as the records returned by local topK are no longer guaranteed to be sorted.

Comment on lines 214 to 220
case *array.Float16:
right := right.(*array.Float16)
return left.Value(leftIdx).Cmp(right.Value(rightIdx)), nil

case *array.Float32:
right := right.(*array.Float32)
return cmp.Compare(left.Value(leftIdx), right.Value(rightIdx)), nil
Contributor

I don't think that we need to support all array types, only the ones used by the engine.

Contributor Author

removed most of the non-relevant ones in 7f5c378
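
For readers following the thread, a type switch narrowed to a few column types could look roughly like the sketch below. It uses plain Go slices as hypothetical stand-ins for Arrow arrays (`int64Column`, `float64Column`, `stringColumn`, and `compareAt` are not names from the PR), and it only shows the shape of the comparison; it makes no claim about allocation behavior.

```go
package main

import (
	"cmp"
	"fmt"
)

// Hypothetical stand-ins for the typed arrays an engine might support.
type (
	int64Column   []int64
	float64Column []float64
	stringColumn  []string
)

// compareAt compares left[leftIdx] with right[rightIdx] when both columns
// have the same supported type. It returns -1, 0, or +1, or an error for a
// type mismatch or an unsupported column type.
func compareAt(left, right any, leftIdx, rightIdx int) (int, error) {
	switch l := left.(type) {
	case int64Column:
		r, ok := right.(int64Column)
		if !ok {
			return 0, fmt.Errorf("type mismatch: %T vs %T", left, right)
		}
		return cmp.Compare(l[leftIdx], r[rightIdx]), nil
	case float64Column:
		r, ok := right.(float64Column)
		if !ok {
			return 0, fmt.Errorf("type mismatch: %T vs %T", left, right)
		}
		return cmp.Compare(l[leftIdx], r[rightIdx]), nil
	case stringColumn:
		r, ok := right.(stringColumn)
		if !ok {
			return 0, fmt.Errorf("type mismatch: %T vs %T", left, right)
		}
		return cmp.Compare(l[leftIdx], r[rightIdx]), nil
	default:
		// Types the engine does not use are rejected rather than handled.
		return 0, fmt.Errorf("unsupported column type %T", left)
	}
}

func main() {
	res, err := compareAt(int64Column{1, 5}, int64Column{3}, 1, 0)
	fmt.Println(res, err) // 1 <nil>
}
```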

Member

@rfratto rfratto left a comment

Can you update the doc comment on physical.TopK to remove the mention that it does a SORT? Once #19672 is merged, the comment in the protobuf will also need to be updated.

4 participants