viirya (Member) commented Aug 24, 2024

What changes were proposed in this pull request?

This patch avoids `ArrayTransform` in the `resolveArrayType` function if the resolved expression is the same as the input parameter.

Why are the changes needed?

Our customer encountered a significant performance regression when migrating from Spark 3.2 to Spark 3.4 on an `Insert Into` query that is analyzed as an `AppendData` on an Iceberg table.
We found that the root cause is that in Spark 3.4, `TableOutputResolver` resolves the query with an additional `ArrayTransform` on an `ArrayType` field. The `ArrayTransform`'s lambda function is actually an identity function, i.e., the transformation is redundant.
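The idea behind the fix can be sketched outside Spark as follows. This is a simplified, hypothetical model: `Expr`, `Param`, `ArrayTransform`, and `resolveArrayType` here are stand-ins for the real Catalyst classes, not the actual Spark source. The key check is that if resolving the lambda parameter returns the parameter unchanged, the lambda is the identity and the `ArrayTransform` wrapper can be elided.

```scala
object IdentityTransformElision {
  // Minimal stand-ins for Catalyst expressions (not the real Spark classes).
  sealed trait Expr
  final case class Param(name: String) extends Expr
  final case class ArrayTransform(input: Expr, lambda: Expr => Expr) extends Expr

  // If resolving the lambda parameter yields it unchanged, the lambda is the
  // identity, so wrapping the input in ArrayTransform would be redundant and
  // the input can be returned as-is.
  def resolveArrayType(input: Expr, resolve: Expr => Expr): Expr = {
    val param = Param("element")
    val resolved = resolve(param)
    if (resolved == param) input else ArrayTransform(input, resolve)
  }

  def main(args: Array[String]): Unit = {
    val arr = Param("arr_col")
    // Identity resolution: the input column is returned untouched.
    assert(resolveArrayType(arr, identity) == arr)
    // A non-identity resolution (e.g. an element cast) is still wrapped.
    assert(resolveArrayType(arr, _ => Param("casted")).isInstanceOf[ArrayTransform])
  }
}
```

Before the fix, the wrapper was produced unconditionally, so every row paid the cost of evaluating a no-op lambda over each array element during the `AppendData` write.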

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test and manual e2e test

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Aug 24, 2024

viirya commented Aug 24, 2024

cc @dongjoon-hyun

dongjoon-hyun (Member) left a comment

+1, LGTM (Pending CIs). Thank you, @viirya .

dongjoon-hyun pushed a commit that referenced this pull request Aug 24, 2024
… expression

### What changes were proposed in this pull request?

This patch avoids `ArrayTransform` in the `resolveArrayType` function if the resolved expression is the same as the input parameter.

### Why are the changes needed?

Our customer encountered a significant performance regression when migrating from Spark 3.2 to Spark 3.4 on an `Insert Into` query that is analyzed as an `AppendData` on an Iceberg table.
We found that the root cause is that in Spark 3.4, `TableOutputResolver` resolves the query with an additional `ArrayTransform` on an `ArrayType` field. The `ArrayTransform`'s lambda function is actually an identity function, i.e., the transformation is redundant.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test and manual e2e test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47863 from viirya/fix_redundant_array_transform_3.5.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun

Merged to branch-3.5.


viirya commented Aug 24, 2024

Thanks @dongjoon-hyun

@viirya viirya deleted the fix_redundant_array_transform_3.5 branch August 24, 2024 05:27
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
… expression (apache#553)


Closes apache#47863 from viirya/fix_redundant_array_transform_3.5.

Authored-by: Liang-Chi Hsieh <[email protected]>

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>