Support vectorized append and compare for multi group by #12996
Conversation
```rust
fn append_non_nullable_val(&mut self, array: &ArrayRef, row: usize) {
    if NULLABLE {
        self.nulls.append(false);
```
This could be optimized to append nulls for entire batch instead of per value
Yes, I plan to refactor the interface to support taking a `rows: &[usize]` input,
make all of the appends vectorized, and then measure the performance again.
(i.e. remove it here and call it in such a way that we use https://docs.rs/arrow/latest/arrow/array/struct.BooleanBufferBuilder.html#method.append_n)
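For illustration, a minimal sketch of that suggestion (`nulls` and `n_rows` are placeholder names, not the PR's actual fields):

```rust
use arrow::array::BooleanBufferBuilder;

// Sketch only: append the validity bits for a whole batch in one call instead
// of calling `append(false)` once per value.
fn append_nulls_for_batch(nulls: &mut BooleanBufferBuilder, n_rows: usize) {
    // one capacity check + bulk fill rather than `n_rows` single-bit appends
    nulls.append_n(n_rows, false);
}
```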
Cool :)
I added the `append_batch` function to better support vectorized append.
But the improvement still doesn't seem obvious. #12996 (comment)
🤔 I guess it is likely due to the newly introduced branch in `equal_to`:
```rust
if *group_idx < group_values_len {
    for (i, group_val) in self.group_values.iter().enumerate() {
        if !check_row_equal(group_val.as_ref(), *group_idx, &cols[i], row) {
            return false;
        }
    }
} else {
    let row_idx_offset = group_idx - group_values_len;
    let row_idx = self.append_rows_buffer[row_idx_offset];
    return is_rows_eq(cols, row, cols, row_idx).unwrap();
}
```
To eliminate this extra branch, I think we need to refactor the intern process mentioned in #12821 (comment).
I am trying it.
The latest benchmark numbers:
```rust
core(array, row);

struct AggregationHashTable<T: AggregationHashTableEntry> {
    /// Raw table storing values in a `Vec`
    raw_table: Vec<T>,
```
Based on some experiments in changing the hash join algorithm, I think it's likely `hashbrown` performs much better than implementing a hashtable ourselves, although I would like to be surprised 🙂
> Based on some experiments in changing the hash join algorithm, I think it's likely `hashbrown` performs much better than implementing a hashtable ourselves, although I would like to be surprised 🙂
🤔 Even if we can perform something like vectorized compare or vectorized append in our own hashtable?
I found that in the multi group by case we perform the compare for each row, leading to the array downcasting again and again... and the downcast operation actually compiles to quite a few asm instructions...
And I found we can't eliminate it and perform the vectorized compare with hashbrown...
```rust
fn equal_to_inner(&self, lhs_row: usize, array: &ArrayRef, rhs_row: usize) -> bool {
    let array = array.as_byte_view::<B>();
```
We can still do "vectorized compare" by doing the lookup in the hashtable (based on hash value only) and the vectorized equality check separately. That way you still can use the fast hashtable, but move the equality check to a separate/vectorized step.
That's at least what is done in the vectorized hash join implementation :). I changed it before to use a Vec-based index like you did here, but that performed significantly worse.
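A minimal sketch of that two-phase idea (illustrative names, not DataFusion's actual code), using `hashbrown`'s raw table API:

```rust
use hashbrown::raw::RawTable;

// Phase 1: probe the hash table by hash value only and collect candidate
// (row, group_index) pairs; no per-row downcast or value comparison happens here.
fn probe_candidates(
    map: &RawTable<(u64, usize)>,         // entries are (hash, group_index)
    hashes: &[u64],                        // one precomputed hash per input row
    candidates: &mut Vec<(usize, usize)>,  // output: (row, candidate group_index)
) {
    candidates.clear();
    for (row, &hash) in hashes.iter().enumerate() {
        if let Some(&(_, group_idx)) = map.get(hash, |(h, _)| *h == hash) {
            candidates.push((row, group_idx));
        }
    }
    // Phase 2 (not shown): a vectorized, type-specialized equality pass over
    // `candidates`, downcasting each group column once for the whole batch.
}
```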
The reason, I think, is that the lookup is incredibly well optimized using the swiss table design and you get fewer "false" candidates to check for, while we can still use the vectorized/type specialized equality check.
Makes sense, thank you!
The logic is a bit complex; I plan to finish it and benchmark it today.
This is at the top of my list to review tomorrow morning.
jayzhan211
left a comment
👍
I am giving this a final review now
Performance results: 🚀
alamb
left a comment
👏 @Rachelint @jayzhan211 @2010YOUY01 and @Dandandan. What great teamwork
This PR is really nice in my opinion. It makes a super tricky and performance sensitive part of the code about as clear as I could imagine it to be.
I also ran some code coverage on this:

```shell
nice cargo llvm-cov --html test --test fuzz -- aggregate
nice cargo llvm-cov --html test -p datafusion-physical-plan -- group_values
```

And verified that the new code was well covered
```rust
}

impl GroupValuesColumn {
    /// Buffers to store intermediate results in `vectorized_append`
```
👍
```rust
/// Create a new instance of GroupValuesColumn if supported for the specified schema
pub fn try_new(schema: SchemaRef) -> Result<Self> {
    let map = RawTable::with_capacity(0);
```
This `with_capacity` can probably be improved (as a follow-on PR) to avoid some smaller allocations
```rust
/// `Group indices` order are against with their input order, and this will lead to error
/// in `streaming aggregation`.
///
fn scalarized_intern(
```
this is basically the same as `GroupValuesColumn::intern` was previously, which makes sense to me
```rust
fn equal_to(&self, lhs_row: usize, array: &ArrayRef, rhs_row: usize) -> bool;

/// Appends the row at `row` in `array` to this builder
fn append_val(&mut self, array: &ArrayRef, row: usize);
```
Maybe as a follow-on we can consider removing `append_val` and `equal_to` and simply change all codepaths to use the vectorized version
I am a bit worried that if we merge them, some extra if/else will be introduced.
That would hurt performance quite a bit for the row-level operations.
A good thing to benchmark (as a follow-on PR), perhaps
```rust
/// it will record the `true` result at the corresponding
/// position in `equal_to_results`.
///
/// And if found nth result in `equal_to_results` is already
```
It is quite clever to pass in the existing "is equal to" results
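For illustration, a minimal sketch of that accumulation pattern (plain slices instead of Arrow arrays; the names are mine, not the PR's):

```rust
// Each group column only needs to check rows that every previous column already
// judged equal, so a `false` written by an earlier column short-circuits the
// remaining columns for that candidate.
fn vectorized_equal_to_sketch(
    stored: &[i64],                // group value for each candidate
    probe: &[i64],                 // incoming column values
    rows: &[usize],                // candidate row indices into `probe`
    equal_to_results: &mut [bool], // results accumulated across group columns
) {
    for (i, &row) in rows.iter().enumerate() {
        if !equal_to_results[i] {
            continue; // an earlier column already proved this candidate differs
        }
        equal_to_results[i] = stored[i] == probe[row];
    }
}
```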
```rust
(false, _) => {
    for &row in rows {
        self.group_values.push(arr.value(row));
```
😆 I think we can even do more, like checking if `rows.len() == array.len()`; if so, we can just perform `extend`.
I think we could already use `extend` instead of `push`? `extend` on `Vec` is somewhat faster than `push`, as the capacity check / allocation is done once instead of once per value.
I think there are several things that could be done to make the append even faster:

1. `extend_from_slice` if `rows.len() == array.len()`
2. use `extend` rather than `push` for values
3. Speed up appending nulls (don't append bits one by one)
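For illustration, a minimal sketch of ideas (1) and (2) for a primitive group column (the names, and the assumption that `rows` is exactly `0..array.len()` whenever the lengths match, are mine, not the PR's):

```rust
use arrow::array::{ArrayRef, AsArray};
use arrow::datatypes::ArrowPrimitiveType;

fn append_rows<T: ArrowPrimitiveType>(
    group_values: &mut Vec<T::Native>, // stands in for the builder's value buffer
    array: &ArrayRef,
    rows: &[usize],
) {
    let arr = array.as_primitive::<T>();
    if rows.len() == arr.len() {
        // (1) the whole batch is appended: copy the values buffer in one shot
        group_values.extend_from_slice(arr.values());
    } else {
        // (2) `extend` does the capacity check / allocation once,
        // instead of once per `push`
        group_values.extend(rows.iter().map(|&row| arr.value(row)));
    }
}
```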
> I think we could already use `extend` instead of `push`? `extend` on `Vec` is somewhat faster than `push`, as the capacity check / allocation is done once instead of once per value.

Ok, I got it. I thought about it again and found it is indeed simple to do!
> I think there are several things that could be done to make the append even faster:
>
> 1. `extend_from_slice` if `rows.len() == array.len()`
> 2. use `extend` rather than `push` for values
> 3. Speed up appending nulls (don't append bits one by one)

I filed an issue to track the potential improvements for vectorized operations:
#13275
```rust
    };
}

fn vectorized_equal_to(
```
What I have been dreaming about with @XiangpengHao is maybe something like adding take / filter to arrow array builders.
I took this opportunity to write up the idea (finally) for your amusement:
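A purely hypothetical sketch of that idea (no such trait exists in arrow-rs today):

```rust
use arrow::array::ArrayRef;

// Instead of materializing an intermediate array with the `take` kernel and then
// appending it row by row, a builder could copy the selected rows directly.
trait TakeAppend {
    /// Append `array[indices[0]], array[indices[1]], ...` to this builder.
    fn append_take(&mut self, array: &ArrayRef, indices: &[u32]);
}
```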
As my admittedly sparse help for this PR, I have filed some additional tickets for follow-on work after this PR is merged:
I don't think we need to wait on this PR anymore, let's merge it in and keep moving forward. Thank you everyone again!
Update here is that this is looking like it results in some sweet ClickBench improvements:
* simple support vectorized append.
* fix tests.
* some logs.
* add `append_n` in `MaybeNullBufferBuilder`.
* impl basic append_batch
* fix equal to.
* define `GroupIndexContext`.
* define the structs useful in vectorizing.
* re-define some structs for vectorized operations.
* impl some vectorized logics.
* impl chekcing hashmap stage.
* fix compile.
* tmp
* define and impl `vectorized_compare`.
* fix compile.
* impl `vectorized_equal_to`.
* impl `vectorized_append`.
* finish the basic vectorized ops logic.
* impl `take_n`.
* fix `renaming clear` and `groups fill`.
* fix death loop due to rehashing.
* fix vectorized append.
* add counter.
* use extend rather than resize.
* remove dbg!.
* remove reserve.
* refactor the codes to make simpler and more performant.
* clear `scalarized_indices` in `intern` to avoid some corner case.
* fix `scalarized_equal_to`.
* fallback to total scalarized `GroupValuesColumn` in streaming aggregation.
* add unit test for `VectorizedGroupValuesColumn`.
* add unit test for emitting first n in `VectorizedGroupValuesColumn`.
* sort out tests codes in for group columns and add vectorized tests for primitives.
* add vectorized test for byte builder.
* add vectorized test for byte view builder.
* add test for the all nulls or not nulls branches in vectorized.
* fix clippy.
* fix fmt.
* fix compile in rust 1.79.
* improve comments.
* fix doc.
* add more comments to explain the really complex vectorized intern process.
* add comments to explain why we still need origin `GroupValuesColumn`.
* remove some stale comments.
* fix clippy.
* add comments for `vectorized_equal_to` and `vectorized_append`.
* fix clippy.
* use zip to simplify codes.
* use izip to simplify codes.
* Update datafusion/physical-plan/src/aggregates/group_values/group_column.rs Co-authored-by: Jay Zhan <[email protected]>
* first_n attempt Signed-off-by: jayzhan211 <[email protected]>
* add test Signed-off-by: jayzhan211 <[email protected]>
* improve hashtable modifying in emit first n test.
* add `emit_group_index_list_buffer` to avoid allocating new `Vec` to store the remaining gourp indices.
* make comments in VectorizedGroupValuesColumn::intern simpler and clearer.
* define `VectorizedOperationBuffers` to hold buffers used in vectorized operations to make code clearer.
* unify `VectorizedGroupValuesColumn` and `GroupValuesColumn`.
* fix fmt.
* fix comments.
* fix clippy.

Signed-off-by: jayzhan211 <[email protected]>
Co-authored-by: Jay Zhan <[email protected]>
(cherry picked from commit 345117b)

Which issue does this PR close?
Closes #.
Related to
Rationale for this change
Although `GroupValuesColumn` stores the multi group by values in a column oriented way, it still uses a row oriented approach to perform `append` and `equal to`.
The most obvious overhead is that we need to downcast the `array` when processing each row; the instructions for the downcast are actually not few, and even worse it introduces branches.
And I guess the row oriented approach will also increase random memory accesses, but I am not sure.
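To illustrate the overhead, here is a minimal sketch (not the PR's actual code) contrasting the two approaches for a single `Int64` group column:

```rust
use arrow::array::{ArrayRef, AsArray};
use arrow::datatypes::Int64Type;

// Row oriented: called once per row, so `as_primitive` (a downcast that compiles
// to a non-trivial amount of branching code) runs N times for N rows.
fn equal_to_row(stored: i64, array: &ArrayRef, row: usize) -> bool {
    let arr = array.as_primitive::<Int64Type>();
    stored == arr.value(row)
}

// Vectorized: downcast once, then a tight comparison loop over all candidate rows.
fn equal_to_rows(stored: &[i64], array: &ArrayRef, rows: &[usize], results: &mut [bool]) {
    let arr = array.as_primitive::<Int64Type>();
    for (i, &row) in rows.iter().enumerate() {
        results[i] = results[i] && stored[i] == arr.value(row);
    }
}
```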
What changes are included in this PR?
This PR introduces vectorized append and vectorized equal to for `GroupValuesColumn`.
But such a vectorized approach is not compatible with streaming aggregation, which depends on the order between input rows and their corresponding group indices.
So I define a new `VectorizedGroupValuesColumn` for optimizing the non streaming aggregation cases, and keep the original `GroupValuesColumn` for the streaming aggregation cases.
Are these changes tested?
Yes, I think enough new unit tests are added.
Are there any user-facing changes?
No.