yaooqinn commented Aug 7, 2025

What changes were proposed in this pull request?

This PR improves the UTF8String repeat logic for some special cases.

Why are the changes needed?

Performance improvement for UTF8String.repeat.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Passes the existing unit tests, and was benchmarked as follows:

scala> spark.time((1 to 10000000).foreach(_ => UTF8String.fromString("A").repeat(1024)))
Time taken: 784 ms
scala> spark.time((1 to 10000000).foreach(_ => repeat(UTF8String.fromString("A"), 1024)))
Time taken: 432 ms
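
For readers who want a feel for what special-casing a repeat can look like, here is a minimal sketch in Java (the language UTF8String is written in). It is not the code from this patch: the class RepeatSketch and its repeat helper are hypothetical, work on plain byte arrays rather than UTF8String, and assume that the special cases of interest are an empty or single-byte input and a count of 1. The general path uses a doubling copy, which is one common technique and may differ from what Spark does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not Spark's UTF8String implementation.
final class RepeatSketch {

  // Returns `src` concatenated `times` times.
  static byte[] repeat(byte[] src, int times) {
    // Special case: nothing to produce.
    if (times <= 0 || src.length == 0) {
      return new byte[0];
    }
    // Special case: a single copy needs no loop.
    if (times == 1) {
      return src.clone();
    }
    // Throws ArithmeticException instead of silently overflowing.
    int total = Math.multiplyExact(src.length, times);
    byte[] out = new byte[total];
    // Special case: a one-byte input is just a fill.
    if (src.length == 1) {
      Arrays.fill(out, src[0]);
      return out;
    }
    // General case: copy the input once, then keep doubling the filled
    // prefix so the number of arraycopy calls grows logarithmically.
    System.arraycopy(src, 0, out, 0, src.length);
    int copied = src.length;
    while (copied < total) {
      int n = Math.min(copied, total - copied);
      System.arraycopy(out, 0, out, copied, n);
      copied += n;
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] a = repeat("A".getBytes(StandardCharsets.UTF_8), 4);
    byte[] ab = repeat("ab".getBytes(StandardCharsets.UTF_8), 3);
    System.out.println(new String(a, StandardCharsets.UTF_8));  // AAAA
    System.out.println(new String(ab, StandardCharsets.UTF_8)); // ababab
  }
}
```

The single-byte branch corresponds to the shape of the benchmark above (repeating "A" 1024 times), which is why it is singled out here; whether the patch takes the same route is not stated in this conversation.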

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Aug 7, 2025

dongjoon-hyun left a comment

+1, LGTM. Nice!

Thank you, @yaooqinn.

dongjoon-hyun commented

Merged to master for Apache Spark 4.1.0.

yaooqinn deleted the SPARK-53171 branch August 8, 2025 02:16

yaooqinn commented Aug 8, 2025

Thank you @dongjoon-hyun
