[SPARK-26903][SQL] Remove the TimeZone cache #23812
Test build #102422 has finished for PR 23812 at commit
Test build #102423 has finished for PR 23812 at commit
Test build #102427 has finished for PR 23812 at commit
srowen
left a comment
Side note: SimpleDateParam calls TimeZone.getTimeZone("GMT"). If you like, you could make that a constant here, or call into DateTimeUtils, to fully remove those calls.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
@MaxGekk I'll merge this once it is rebased and you have looked at my few comments above.
# Conflicts: # sql/core/benchmarks/DateTimeBenchmark-results.txt
Test build #102655 has finished for PR 23812 at commit
    ToUTCTimestamp(
      Literal(Timestamp.valueOf("2015-07-24 00:00:00")), Literal("\"quote")) :: Nil)
}.getMessage
assert(msg == "Invalid ID for region-based ZoneId, invalid format: \"quote")
Small last one -- make this consistent with the test below and remove comment about escaping. In fact, maybe the bad zone ID should be obviously wrong, like "NoSuchZone"
I added a couple more test cases
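For illustration only, here is a minimal sketch of how the two kinds of bad zone ID discussed above behave when passed to ZoneId.of. It is not the test code from the PR: the object name InvalidZoneIds and the helper rejection are hypothetical, and the sketch calls the JDK directly rather than going through ToUTCTimestamp.

```scala
import java.time.{DateTimeException, ZoneId}

// Hypothetical helper, not part of Spark: probes how ZoneId.of treats an ID.
object InvalidZoneIds {
  // Returns the exception message if ZoneId rejects the ID, None if it is accepted.
  def rejection(id: String): Option[String] =
    try {
      ZoneId.of(id, ZoneId.SHORT_IDS)
      None
    } catch {
      // ZoneRulesException (unknown region) is a subclass of DateTimeException.
      case e: DateTimeException => Some(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    // "\"quote" is malformed, "NoSuchZone" is well-formed but unknown, "UTC" is valid.
    Seq("\"quote", "NoSuchZone", "UTC").foreach { id =>
      println(s"$id -> ${rejection(id)}")
    }
  }
}
```

Both bad IDs are rejected with a DateTimeException, but for different reasons: the first fails the format check (the message asserted in the test above), while an ID like NoSuchZone passes the format check and fails the lookup of zone rules.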
Test build #102698 has finished for PR 23812 at commit
Merged to master
def getTimeZone(timeZoneId: String): TimeZone = {
  computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
  val zoneId = ZoneId.of(timeZoneId, ZoneId.SHORT_IDS)
Hi @MaxGekk, after upgrading from Spark 2.3 to Spark 3.0, we found that this behaviour change rejects some time zone IDs that were previously accepted, for example:
// GMT+8:00 is a valid time zone when parsed with TimeZone.getTimeZone("GMT+8:00")
// However, ZoneId.of("GMT+8:00", ZoneId.SHORT_IDS) is rejected with an exception
from_unix_time("2020-01-01 10:00:00", "GMT+8:00")
What do you think about supporting this kind of time zone ID, such as GMT+8:00?
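The difference reported above can be reproduced outside Spark with plain JDK calls. The following is a minimal sketch, not Spark's actual code: the object ZoneIdVsTimeZone and its two helpers are hypothetical names, and viaZoneId only imitates the ZoneId-based conversion shown in the diff.

```scala
import java.time.ZoneId
import java.util.TimeZone
import scala.util.Try

// Hypothetical comparison harness, not part of Spark.
object ZoneIdVsTimeZone {
  // Legacy lookup: java.util.TimeZone accepts custom IDs with a one-digit hour.
  def legacy(id: String): TimeZone = TimeZone.getTimeZone(id)

  // Lookup in the style of the new getTimeZone: parse with ZoneId first,
  // then convert to TimeZone. Wrapped in Try so a rejection is observable.
  def viaZoneId(id: String): Try[TimeZone] =
    Try(TimeZone.getTimeZone(ZoneId.of(id, ZoneId.SHORT_IDS)))

  def main(args: Array[String]): Unit = {
    val hourMs = 3600 * 1000
    println(legacy("GMT+8:00").getRawOffset / hourMs)   // offset in hours: 8
    println(viaZoneId("GMT+8:00").isFailure)            // true: one-digit hour with minutes is rejected
    println(viaZoneId("GMT+08:00").map(_.getRawOffset / hourMs))
  }
}
```

ZoneOffset accepts a one-digit hour only without a minutes part (for example +8), so GMT+8 and GMT+08:00 both parse while GMT+8:00 does not. Note also that the legacy TimeZone.getTimeZone silently falls back to GMT for IDs it cannot understand, whereas ZoneId.of throws, so the stricter path makes typos visible instead of masking them.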
What changes were proposed in this pull request?
In the PR, I propose to convert a time zone string to TimeZone by first converting it to ZoneId, which uses ZoneOffset internally. The ZoneOffset class of JDK 8 has a cache already: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/time/ZoneOffset.java#l205 . In this way, there is no need to maintain a cache of time zones in Spark.
The PR removes computedTimeZones from DateTimeUtils, and uses ZoneId.of to convert the time zone ID string to ZoneId and then to TimeZone at the end.
How was this patch tested?
The changes were tested by