Commit e415a42

MaxGekk authored and cloud-fan committed
[SPARK-31439][SQL] Fix perf regression of fromJavaDate
### What changes were proposed in this pull request?

In the PR, I propose to re-use the optimized implementation of the days rebase function `rebaseJulianToGregorianDays()`, introduced by PR #28067, in the conversion of `java.sql.Date` values to Catalyst's `DATE` values. The function `fromJavaDate` in `DateTimeUtils` was rewritten by taking the implementation from Spark 2.4 and rebasing the final result via `rebaseJulianToGregorianDays()`.

I also updated `DateTimeBenchmark` and added a benchmark for conversion from `java.sql.Date`.

### Why are the changes needed?

The PR fixes the regression in parallelizing a collection of `java.sql.Date` values, and improves the performance of converting external values to Catalyst's `DATE` values:
- x4 on the master branch
- 30% against Spark 2.4.6-SNAPSHOT

Spark 2.4.6-SNAPSHOT:
```
To/from java.sql.Timestamp:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Date                                  614            655          43          8.1         122.8       1.0X
```

Before the changes:
```
To/from java.sql.Timestamp:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Date                                 1154           1206          46          4.3         230.9       1.0X
```

After:
```
To/from java.sql.Timestamp:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Date                                  427            434           7         11.7          85.3       1.0X
```

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

- By existing test suites, in particular `DateTimeUtilsSuite`, `RebaseDateTimeSuite`, `DateFunctionsSuite`, `DateExpressionsSuite`.
- By re-running `DateTimeBenchmark` in the environment:

| Item | Description |
| ---- | ---- |
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 64-Bit Server VM 1.8.0_242 and OpenJDK 64-Bit Server VM 11.0.6+10 |

Closes #28205 from MaxGekk/optimize-fromJavaDate.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 2c5d489)
Signed-off-by: Wenchen Fan <[email protected]>
1 parent be5dfcd commit e415a42

4 files changed, +232 -231 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

Lines changed: 4 additions & 10 deletions
@@ -103,16 +103,10 @@ object DateTimeUtils {
    * @return The number of days since epoch from java.sql.Date.
    */
   def fromJavaDate(date: Date): SQLDate = {
-    val era = if (date.before(julianCommonEraStart)) 0 else 1
-    val localDate = LocalDate
-      .of(date.getYear + 1900, date.getMonth + 1, 1)
-      .`with`(ChronoField.ERA, era)
-      // Add days separately to convert dates existed in Julian calendar but not
-      // in Proleptic Gregorian calendar. For example, 1000-02-29 is valid date
-      // in Julian calendar because 1000 is a leap year but 1000 is not a leap
-      // year in Proleptic Gregorian calendar. And 1000-02-29 doesn't exist in it.
-      .plusDays(date.getDate - 1) // Returns the next valid date after `date.getDate - 1` days
-    localDateToDays(localDate)
+    val millisUtc = date.getTime
+    val millisLocal = millisUtc + TimeZone.getDefault.getOffset(millisUtc)
+    val julianDays = Math.toIntExact(Math.floorDiv(millisLocal, MILLIS_PER_DAY))
+    rebaseJulianToGregorianDays(julianDays)
   }
 
   /**
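
The day-counting arithmetic in the new `fromJavaDate` body can be tried outside Spark. The following is a minimal sketch, not the Spark implementation: the names `FromJavaDateSketch` and `localEpochDays` are hypothetical, `MILLIS_PER_DAY` is defined locally on the assumption that it equals Spark's constant of the same name, and the final `rebaseJulianToGregorianDays()` step is left out, so the result is only meaningful for modern dates where the hybrid Julian/Gregorian and Proleptic Gregorian calendars agree.

```scala
import java.sql.Date
import java.util.TimeZone

object FromJavaDateSketch {
  // Assumption: same value as Spark's MILLIS_PER_DAY (milliseconds in 24 hours).
  val MILLIS_PER_DAY: Long = 24L * 60 * 60 * 1000

  // Hypothetical helper mirroring the first three lines of the new fromJavaDate:
  // shift the UTC instant into the JVM default time zone, then count whole days.
  // The Julian-to-Gregorian rebase from the commit is intentionally omitted.
  def localEpochDays(date: Date): Int = {
    val millisUtc = date.getTime
    val millisLocal = millisUtc + TimeZone.getDefault.getOffset(millisUtc)
    // floorDiv rounds toward negative infinity, so instants before the epoch
    // land on the previous day instead of being truncated toward day 0.
    Math.toIntExact(Math.floorDiv(millisLocal, MILLIS_PER_DAY))
  }

  def main(args: Array[String]): Unit = {
    // Date.valueOf builds local midnight in the default time zone.
    println(localEpochDays(Date.valueOf("1970-01-01")))
    println(localEpochDays(Date.valueOf("2020-04-15")))
    // Why floorDiv rather than `/`: one millisecond before the epoch.
    println(Math.floorDiv(-1L, MILLIS_PER_DAY)) // floor division
    println(-1L / MILLIS_PER_DAY)               // truncating division
  }
}
```

Adding the zone offset before dividing matters because `java.sql.Date` stores a UTC instant for local midnight; without the shift, dates in zones east of UTC would fall on the previous day.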
