[SPARK-31630][SQL] Fix perf regression by skipping timestamps rebasing after some threshold #28441
@cloud-fan @HyukjinKwon Please review this PR.
Test build #122230 has finished for PR 28441 at commit
…common-threshold
# Conflicts:
#   sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
#   sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt
#   sql/core/benchmarks/DateTimeRebaseBenchmark-results.txt
Test build #122259 has finished for PR 28441 at commit
Test build #122258 has finished for PR 28441 at commit
Test build #122262 has finished for PR 28441 at commit
    private def getLastSwitchTs(rebaseMap: AnyRefMap[String, RebaseInfo]): Long = {
      val latestTs = rebaseMap.values.map(_.switches.last).max
      require(rebaseMap.values.forall(_.diffs.last == 0),
Ideally, `require` should be the first line in a method.
It uses `latestTs` in the error message. I am going to improve the message by converting the microseconds to an `Instant`, so that `toString` produces a nicer string:

    require(rebaseMap.values.forall(_.diffs.last == 0),
      s"Differences between Julian and Gregorian calendar after ${microsToInstant(latestTs)} " +
      "are expected to be zero for available time zones.")
LGTM, let's regenerate the benchmark result to fix conflicts.
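The `require` check discussed in this review thread can be illustrated with a small standalone sketch. It is not the code from `RebaseDateTime.scala`: `RebaseInfo` here is a hypothetical stand-in with only the two fields the snippet touches, a plain `Map` replaces `AnyRefMap`, `microsToInstant` is a simplified helper written for this example, and returning `latestTs` is an assumption, since the quoted diff is cut off before the end of the method.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical stand-in for RebaseInfo: switch points in microseconds since
// the epoch, and the diff that applies from each switch point onwards.
final case class RebaseInfo(switches: Array[Long], diffs: Array[Long])

object LastSwitchSketch {
  // Simplified helper (an assumption, not Spark's implementation):
  // turns microseconds since the epoch into an Instant for readable messages.
  def microsToInstant(micros: Long): Instant =
    Instant.EPOCH.plus(micros, ChronoUnit.MICROS)

  // Finds the latest switch point over all time zones and verifies that
  // every zone has a zero diff after its last switch.
  def getLastSwitchTs(rebaseMap: Map[String, RebaseInfo]): Long = {
    // Computed before `require` because the error message needs it.
    val latestTs = rebaseMap.values.map(_.switches.last).max
    require(rebaseMap.values.forall(_.diffs.last == 0),
      s"Differences between Julian and Gregorian calendar after ${microsToInstant(latestTs)} " +
      "are expected to be zero for available time zones.")
    latestTs // assumption: the threshold is the value returned
  }
}
```

Computing `latestTs` first is what keeps `require` from being the first statement: the failure message reports the threshold as an `Instant` rather than as a raw microsecond count.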
…common-threshold
# Conflicts:
#   sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt
#   sql/core/benchmarks/DateTimeRebaseBenchmark-results.txt
Test build #122306 has finished for PR 28441 at commit
retest this please
Test build #122313 has finished for PR 28441 at commit
Test build #122314 has finished for PR 28441 at commit
thanks, merging to master/3.0!
…g after some threshold

### What changes were proposed in this pull request?
Skip timestamp rebasing after a global threshold beyond which there is no difference between the Julian and Gregorian calendars. This avoids checking the hash maps of switch points and fixes perf regressions in `toJavaTimestamp()` and `fromJavaTimestamp()`.

### Why are the changes needed?
The changes fix perf regressions in conversions to/from the external type `java.sql.Timestamp`.

Before (see the PR's results #28440):
```
================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                             376            388          10         13.3          75.2       1.1X
Collect java.sql.Timestamp                         1878           1937          64          2.7         375.6       0.2X
```

After:
```
================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                             249            264          24         20.1          49.8       1.7X
Collect java.sql.Timestamp                         1503           1523          24          3.3         300.5       0.3X
```

Average perf improvements:
1. From `java.sql.Timestamp`: ~34%
2. To `java.sql.Timestamp`: ~16%

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing test suites `DateTimeUtilsSuite` and `RebaseDateTimeSuite`.

Closes #28441 from MaxGekk/opt-rebase-common-threshold.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit bef5828)
Signed-off-by: Wenchen Fan <[email protected]>
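The idea of the change, a single global threshold checked before any per-zone lookup, can be sketched roughly as follows. This is a toy model rather than the Spark implementation: `ZoneRebase`, the zone names, the table contents, and the `rebase` function are all hypothetical, and the handling of timestamps earlier than a zone's first switch point is a simplification made for the example.

```scala
object RebaseThresholdSketch {
  // Hypothetical per-zone rebase table: ascending switch points (microseconds)
  // and the diff to add for timestamps at or after each switch point.
  final case class ZoneRebase(switches: Array[Long], diffs: Array[Long])

  // Made-up data; in every zone the diff after the last switch is zero.
  val table: Map[String, ZoneRebase] = Map(
    "Zone/A" -> ZoneRebase(Array(-1000L, -500L, 0L), Array(30L, 10L, 0L)),
    "Zone/B" -> ZoneRebase(Array(-800L, -100L), Array(20L, 0L)))

  // Global threshold: the latest switch point over all zones. At or after it,
  // both calendars agree in every zone, so no rebasing is needed.
  val lastSwitchTs: Long = table.values.map(_.switches.last).max

  def rebase(zoneId: String, micros: Long): Long = {
    if (micros >= lastSwitchTs) {
      micros // fast path: one comparison, no map lookup
    } else {
      // Slow path: look up the zone, then find the last switch point <= micros.
      val info = table(zoneId)
      val idx = info.switches.lastIndexWhere(_ <= micros)
      // Simplification for this sketch: timestamps before the first switch
      // point reuse the first diff.
      micros + info.diffs(math.max(idx, 0))
    }
  }
}
```

For modern timestamps, which are the common case, the call reduces to a single comparison against `lastSwitchTs`, which is consistent with the improvement the benchmark results report.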