Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented May 3, 2020

What changes were proposed in this pull request?

Skip timestamps rebasing after a global threshold when there is no difference between Julian and Gregorian calendars. This allows to avoid checking hash maps of switch points, and fixes perf regressions in toJavaTimestamp() and fromJavaTimestamp().

Why are the changes needed?

The changes fix perf regressions of conversions to/from external type java.sql.Timestamp.

Before (see the PR's results #28440):

================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                             376            388          10         13.3          75.2       1.1X
Collect java.sql.Timestamp                         1878           1937          64          2.7         375.6       0.2X

After:

================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                             249            264          24         20.1          49.8       1.7X
Collect java.sql.Timestamp                         1503           1523          24          3.3         300.5       0.3X

Perf improvements in average of:

  1. From java.sql.Timestamp is ~ 34%
  2. To java.sql.Timestamps is ~16%

Does this PR introduce any user-facing change?

No

How was this patch tested?

By existing test suites DateTimeUtilsSuite and RebaseDateTimeSuite.

@MaxGekk
Copy link
Member Author

MaxGekk commented May 3, 2020

@cloud-fan @HyukjinKwon Please, review this PR.

@SparkQA
Copy link

SparkQA commented May 3, 2020

Test build #122230 has finished for PR 28441 at commit f4187d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk added 7 commits May 4, 2020 06:05
…common-threshold

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
#	sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt
#	sql/core/benchmarks/DateTimeRebaseBenchmark-results.txt
@SparkQA
Copy link

SparkQA commented May 4, 2020

Test build #122259 has finished for PR 28441 at commit ded04e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 4, 2020

Test build #122258 has finished for PR 28441 at commit b9dfb16.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 4, 2020

Test build #122262 has finished for PR 28441 at commit d31ec0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


private def getLastSwitchTs(rebaseMap: AnyRefMap[String, RebaseInfo]): Long = {
val latestTs = rebaseMap.values.map(_.switches.last).max
require(rebaseMap.values.forall(_.diffs.last == 0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally require should be the first line in a method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses latestTs in the error message. I am going to improve the message by converting micros to Instant, so, toString should form nicer string:

    require(rebaseMap.values.forall(_.diffs.last == 0),
      s"Differences between Julian and Gregorian calendar after ${microsToInstant(latestTs)} " +
      "are expected to be zero for available time zones.")

@cloud-fan
Copy link
Contributor

LGTM, let's regenerate the benchmark result to fix conflicts.

MaxGekk added 2 commits May 5, 2020 09:16
…common-threshold

# Conflicts:
#	sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt
#	sql/core/benchmarks/DateTimeRebaseBenchmark-results.txt
@SparkQA
Copy link

SparkQA commented May 5, 2020

Test build #122306 has finished for PR 28441 at commit eaca6a8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented May 5, 2020

Test build #122313 has finished for PR 28441 at commit eaca6a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 5, 2020

Test build #122314 has finished for PR 28441 at commit 2badf88.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in bef5828 May 5, 2020
cloud-fan pushed a commit that referenced this pull request May 5, 2020
…g after some threshold

### What changes were proposed in this pull request?
Skip timestamps rebasing after a global threshold when there is no difference between Julian and Gregorian calendars. This allows to avoid checking hash maps of switch points, and fixes perf regressions in `toJavaTimestamp()` and `fromJavaTimestamp()`.

### Why are the changes needed?
The changes fix perf regressions of conversions to/from external type `java.sql.Timestamp`.

Before (see the PR's results #28440):
```
================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2  2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                             376            388          10         13.3          75.2       1.1X
Collect java.sql.Timestamp                         1878           1937          64          2.7         375.6       0.2X
```

After:
```
================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2  2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                             249            264          24         20.1          49.8       1.7X
Collect java.sql.Timestamp                         1503           1523          24          3.3         300.5       0.3X
```

Perf improvements in average of:

1. From java.sql.Timestamp is ~ 34%
2. To java.sql.Timestamps is ~16%

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing test suites `DateTimeUtilsSuite` and `RebaseDateTimeSuite`.

Closes #28441 from MaxGekk/opt-rebase-common-threshold.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit bef5828)
Signed-off-by: Wenchen Fan <[email protected]>
@MaxGekk MaxGekk deleted the opt-rebase-common-threshold branch June 5, 2020 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants