Skip to content

Conversation

@jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Jul 6, 2024

Which issue does this PR close?

As @findepi pointed out in #11274 (comment) that most of the aggregate function does not return non-null result if no row qualified. I double check the result in Postgres and Duckdb and find out they does not return empty list for array_agg. I think we can follow the behaviour as they did.

I also hope this can make dealing with nullability simpler

  1. Check the nullability of each (aggregate) function
  2. Make sure nullable is helpful for logical optimizer.

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

The result of array agg is changed

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Jul 6, 2024
@jayzhan211 jayzhan211 changed the title Change array agg semantic for empty result from empty list to empty row Change array agg result from empty list to empty row if no row qualifed Jul 6, 2024
@jayzhan211 jayzhan211 changed the title Change array agg result from empty list to empty row if no row qualifed Change array agg result from empty list to null if no row qualifed Jul 6, 2024
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jul 6, 2024
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added the core Core DataFusion crate label Jul 6, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review July 6, 2024 07:49
Comment on lines -1761 to -1780
query III
WITH indices AS (
SELECT 1 AS idx UNION ALL
SELECT 2 AS idx UNION ALL
SELECT 3 AS idx UNION ALL
SELECT 4 AS idx UNION ALL
SELECT 5 AS idx
)
SELECT data.arr[indices.idx] as element, array_length(data.arr) as array_len, dummy
FROM (
SELECT array_agg(distinct c2) as arr, count(1) as dummy FROM aggregate_test_100
) data
CROSS JOIN indices
ORDER BY 1
----
1 5 100
2 5 100
3 5 100
4 5 100
5 5 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrite it to the simpler one!

statement ok
drop table t;

# test with no values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add array_agg(distinct case on empty table

Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as draft July 8, 2024 23:51
@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Jul 9, 2024
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jul 9, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review July 9, 2024 08:56
@jayzhan211 jayzhan211 marked this pull request as draft July 9, 2024 09:54
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as ready for review July 9, 2024 15:29
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @jayzhan211 . Thank you for this PR

Also, thank you @findepi for the reviews 🙏

I double checked the answers in postgres and I believe after this PR DataFusion has consistent behavior

postgres=# create table t(a int, b float, c bigint);
ERROR:  relation "t" already exists
postgres=# drop table t;
DROP TABLE
postgres=# create table t(a int, b float, c bigint);
CREATE TABLE
postgres=# insert into t values (1, 1.2, 2);
INSERT 0 1
postgres=# select array_agg(a) from t where a > 2;
 array_agg
-----------

(1 row)

postgres=# select array_agg(b) from t where b > 3.1;
 array_agg
-----------

(1 row)

postgres=# select array_agg(c), count(1) from t where c > 3;
 array_agg | count
-----------+-------
           |     0
(1 row)

postgres=# select array_agg(distinct a) from t where a > 3;
 array_agg
-----------

(1 row)

@alamb alamb changed the title Change array agg result from empty list to null if no row qualifed Change array_agg to return null on no input rather than empty list Jul 10, 2024
@jayzhan211 jayzhan211 merged commit d3f6372 into apache:main Jul 10, 2024
@jayzhan211 jayzhan211 deleted the array-agg-no-row branch July 10, 2024 23:32
@jayzhan211
Copy link
Contributor Author

Thanks @alamb @findepi

Lordworms pushed a commit to Lordworms/arrow-datafusion that referenced this pull request Jul 12, 2024
apache#11299)

* change array agg semantic for empty result

Signed-off-by: jayzhan211 <[email protected]>

* return null

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* fix order sensitive

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* add more test

Signed-off-by: jayzhan211 <[email protected]>

* fix null

Signed-off-by: jayzhan211 <[email protected]>

* fix multi-phase case

Signed-off-by: jayzhan211 <[email protected]>

* add comment

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* fix clone

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
apache#11299)

* change array agg semantic for empty result

Signed-off-by: jayzhan211 <[email protected]>

* return null

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* fix order sensitive

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* add more test

Signed-off-by: jayzhan211 <[email protected]>

* fix null

Signed-off-by: jayzhan211 <[email protected]>

* fix multi-phase case

Signed-off-by: jayzhan211 <[email protected]>

* add comment

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* fix clone

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 17, 2024
apache#11299)

* change array agg semantic for empty result

Signed-off-by: jayzhan211 <[email protected]>

* return null

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* fix order sensitive

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* add more test

Signed-off-by: jayzhan211 <[email protected]>

* fix null

Signed-off-by: jayzhan211 <[email protected]>

* fix multi-phase case

Signed-off-by: jayzhan211 <[email protected]>

* add comment

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* fix clone

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 18, 2024
apache#11299)

* change array agg semantic for empty result

Signed-off-by: jayzhan211 <[email protected]>

* return null

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* fix order sensitive

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* add more test

Signed-off-by: jayzhan211 <[email protected]>

* fix null

Signed-off-by: jayzhan211 <[email protected]>

* fix multi-phase case

Signed-off-by: jayzhan211 <[email protected]>

* add comment

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* fix clone

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants