[SPARK-29074][SQL] Optimize date_format for foldable fmt
#25782
@dongjoon-hyun @maropu @srowen Please take a look at the PR.
I don't know the logic of foldable well enough to comment, but this seems plausible.
override protected def nullSafeEval(timestamp: Any, format: Any): Any = {
  val df = TimestampFormatter(format.toString, zoneId)
  UTF8String.fromString(df.format(timestamp.asInstanceOf[Long]))
  val tf = if (formatter.isEmpty) {
How about .getOrElse?
.getOrElse has some overhead from calling the lambda function, so I explicitly avoided it in the interpreted mode. For consistency, I could do the same in the codegen function, but I don't think it matters.
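To make the two alternatives in this thread concrete, here is a minimal Scala sketch. The names `Fmt`, `build`, `explicit`, and `viaGetOrElse` are hypothetical stand-ins, not Spark code; the sketch only shows that both forms return the same result and that the default passed to `getOrElse` is a by-name argument, which is where the lambda mentioned above comes from. Whether that cost is measurable in practice is not shown here.

```scala
object GetOrElseSketch {
  // Hypothetical stand-in for a formatter; this is NOT Spark's TimestampFormatter.
  final case class Fmt(pattern: String)

  // Counts how many times the fallback path constructs a new formatter.
  var built = 0
  def build(pattern: String): Fmt = { built += 1; Fmt(pattern) }

  // Explicit branch, following the shape of the `if (formatter.isEmpty)` line in the diff.
  def explicit(cached: Option[Fmt], pattern: String): Fmt =
    if (cached.isEmpty) build(pattern) else cached.get

  // getOrElse form: the default is passed by name and evaluated only when `cached` is None.
  def viaGetOrElse(cached: Option[Fmt], pattern: String): Fmt =
    cached.getOrElse(build(pattern))
}
```

Both functions skip `build` when a cached value is present, so the choice between them is about call overhead and readability rather than behavior.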
Test build #110566 has finished for PR 25782 at commit
Test build #110567 has finished for PR 25782 at commit
Thank you for pinging me, @MaxGekk. In general, this PR looks fine. I left a few minor comments. I'll take another look later.
Test build #110586 has finished for PR 25782 at commit
retest this please
Test build #110590 has finished for PR 25782 at commit
LGTM if tests pass.
jenkins, retest this, please
Seems like Jenkins is down.
Test build #4870 has finished for PR 25782 at commit
retest this please
Test build #110639 has finished for PR 25782 at commit
Test build #110678 has finished for PR 25782 at commit
Merged to master.
What changes were proposed in this pull request?
In this PR, I propose to create an instance of TimestampFormatter only once, at initialization, and to reuse it inside nullSafeEval() and doGenCode() when the fmt parameter is foldable.
Why are the changes needed?
The changes improve the performance of the date_format() function.
Before:
After:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
By existing test suites DateExpressionsSuite and DateFunctionsSuite.
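The idea described under "What changes were proposed" can be sketched in plain Scala. This is a simplified illustration under stated assumptions, not the code merged in the PR: `DateFormatSketch`, `constantPattern`, and `rowPattern` are hypothetical names, `java.time.format.DateTimeFormatter` stands in for Spark's TimestampFormatter, and timestamps are assumed to be microseconds since the epoch.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// Hypothetical model: `constantPattern` is Some(pattern) when the format is
// foldable (known before any row is processed), and None otherwise.
final class DateFormatSketch(constantPattern: Option[String], zoneId: ZoneId) {

  // Built at most once, and only when the pattern is constant.
  private lazy val formatter: Option[DateTimeFormatter] =
    constantPattern.map(p => DateTimeFormatter.ofPattern(p).withZone(zoneId))

  // `micros` is assumed to be microseconds since the epoch;
  // `rowPattern` is the per-row format used when nothing is cached.
  def eval(micros: Long, rowPattern: String): String = {
    val tf = if (formatter.isEmpty) {
      // Non-foldable case: a formatter is still created for every row.
      DateTimeFormatter.ofPattern(rowPattern).withZone(zoneId)
    } else {
      // Foldable case: reuse the cached instance.
      formatter.get
    }
    val instant = Instant.ofEpochSecond(
      Math.floorDiv(micros, 1000000L),
      Math.floorMod(micros, 1000000L) * 1000L)
    tf.format(instant)
  }
}
```

With a constant pattern, for example `new DateFormatSketch(Some("yyyy-MM-dd"), ZoneId.of("UTC"))`, every call to `eval` reuses one formatter; with `None`, each call parses `rowPattern` again, which corresponds to the per-row behavior the PR keeps for non-foldable formats.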