-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-34003][SQL][FOLLOWUP] Avoid pushing modified Char/Varchar sort attributes into aggregate for existing ones #31129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…gthCheckForCharVarchar and ResolveAggregateFunctions
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #133921 has finished for PR 31129 at commit
|
|
Test build #133926 has finished for PR 31129 at commit
|
|
Test build #133952 has finished for PR 31129 at commit
|
|
thanks, merging to master/3.1! |
… attributes into aggregate for existing ones ### What changes were proposed in this pull request? In 0f8e5dd, we partially fix the rule conflicts between `PaddingAndLengthCheckForCharVarchar` and `ResolveAggregateFunctions`, as error still exists in sql like ```SELECT substr(v, 1, 2), sum(i) FROM t GROUP BY v ORDER BY substr(v, 1, 2)``` ```sql [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: expression 'spark_catalog.default.t.`v`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.; [info] Project [substr(v, 1, 2)#100, sum(i)#101L] [info] +- Sort [aggOrder#102 ASC NULLS FIRST], true [info] +- !Aggregate [v#106], [substr(v#106, 1, 2) AS substr(v, 1, 2)#100, sum(cast(i#98 as bigint)) AS sum(i)#101L, substr(v#103, 1, 2) AS aggOrder#102 [info] +- SubqueryAlias spark_catalog.default.t [info] +- Project [if ((length(v#97) <= 3)) v#97 else if ((length(rtrim(v#97, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#97) as string), exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#97, None), 3, ) AS v#106, i#98] [info] +- Relation[v#97,i#98] parquet [info] [info] Project [substr(v, 1, 2)#100, sum(i)#101L] [info] +- Sort [aggOrder#102 ASC NULLS FIRST], true [info] +- !Aggregate [v#106], [substr(v#106, 1, 2) AS substr(v, 1, 2)#100, sum(cast(i#98 as bigint)) AS sum(i)#101L, substr(v#103, 1, 2) AS aggOrder#102 [info] +- SubqueryAlias spark_catalog.default.t [info] +- Project [if ((length(v#97) <= 3)) v#97 else if ((length(rtrim(v#97, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#97) as string), exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#97, None), 3, ) AS v#106, i#98] [info] +- Relation[v#97,i#98] parquet ``` We need to look recursively into children to find char/varchars. In this PR, we try to resolve the full attributes including the original `Aggregate` expressions and the candidates in `SortOrder` together, then use the new re-resolved `Aggregate` expressions to determine which candidate in the `SortOrder` shall be pushed. This can avoid mismatch for the same attributes w/o this change, as the expressions returned by `executeSameContext` will change when `PaddingAndLengthCheckForCharVarchar` takes effects. W/ this change, the expressions can be matched correctly. For those unmatched, w need to look recursively into children to find char/varchars instead of the expression itself only. ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? add new tests Closes #31129 from yaooqinn/SPARK-34003-F. Authored-by: Kent Yao <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 99f8489) Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
In 0f8e5dd, we partially fix the rule conflicts between
PaddingAndLengthCheckForCharVarcharandResolveAggregateFunctions, as error still exists insql like
SELECT substr(v, 1, 2), sum(i) FROM t GROUP BY v ORDER BY substr(v, 1, 2)We need to look recursively into children to find char/varchars.
In this PR, we try to resolve the full attributes including the original
Aggregateexpressions and the candidates inSortOrdertogether, then use the new re-resolvedAggregateexpressions to determine which candidate in theSortOrdershall be pushed. This can avoid mismatch for the same attributes w/o this change, as the expressions returned byexecuteSameContextwill change whenPaddingAndLengthCheckForCharVarchartakes effects. W/ this change, the expressions can be matched correctly.For those unmatched, w need to look recursively into children to find char/varchars instead of the expression itself only.
Why are the changes needed?
bugfix
Does this PR introduce any user-facing change?
no
How was this patch tested?
add new tests