-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-20588][SQL] Cache TimeZone instances. #17933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small suggestion, but looking good. I assume that's all the calls to TimeZone.getTimeZone.
| val tzClass = classOf[TimeZone].getName | ||
| ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""") | ||
| ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("UTC");""") | ||
| val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it more efficient to save this value in a static (object) member somewhere or will it not matter much? I see it's used several times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it will matter much for now because this will be processed only once per generating code.
What do you think?
| } | ||
|
|
||
| def getTimeZone(timeZoneId: String): TimeZone = { | ||
| val timeZones = threadLocalTimeZones.get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just threadLocalTimeZones.get().getOrElseUpdate(timeZoneId, TimeZone.getTimeZone(timeZoneId))? It avoids the double lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update it. Thanks!
|
Test build #76732 has finished for PR 17933 at commit
|
|
Test build #76747 has finished for PR 17933 at commit
|
gatorsmile
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| sdf | ||
| } | ||
|
|
||
| private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we won't go to update this map once the values are put. We only need synchronization when putting the values. The content of the map can be shared between threads when reading. I am wondering if we need a local map for each thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point.
How about using ConcurrentHashMap instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
| } | ||
|
|
||
| def getTimeZone(timeZoneId: String): TimeZone = { | ||
| computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Java 7 support completely removed now? Seems computeIfAbsent is only supported in Java 8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Java 7 support was removed as of Spark 2.2.0.
| import javax.xml.bind.DatatypeConverter | ||
|
|
||
| import scala.annotation.tailrec | ||
| import scala.collection.mutable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll remove it.
|
LGTM |
|
Test build #76921 has finished for PR 17933 at commit
|
|
Test build #76923 has finished for PR 17933 at commit
|
|
Thanks! Merging to master/2.2 |
## What changes were proposed in this pull request? Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the TimeZone class, concurrent call of this method will become a bottleneck. This especially happens when casting from string value containing timezone info to timestamp value, which uses `DateTimeUtils.stringToTimestamp()` and gets TimeZone instance on the site. This pr makes a cache of the generated TimeZone instances to avoid the synchronization. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <[email protected]> Closes #17933 from ueshin/issues/SPARK-20588. (cherry picked from commit c8c878a) Signed-off-by: Xiao Li <[email protected]>
## What changes were proposed in this pull request? Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the TimeZone class, concurrent call of this method will become a bottleneck. This especially happens when casting from string value containing timezone info to timestamp value, which uses `DateTimeUtils.stringToTimestamp()` and gets TimeZone instance on the site. This pr makes a cache of the generated TimeZone instances to avoid the synchronization. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <[email protected]> Closes apache#17933 from ueshin/issues/SPARK-20588.
## What changes were proposed in this pull request? Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the TimeZone class, concurrent call of this method will become a bottleneck. This especially happens when casting from string value containing timezone info to timestamp value, which uses `DateTimeUtils.stringToTimestamp()` and gets TimeZone instance on the site. This pr makes a cache of the generated TimeZone instances to avoid the synchronization. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <[email protected]> Closes apache#17933 from ueshin/issues/SPARK-20588.
Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the TimeZone class, concurrent call of this method will become a bottleneck. This especially happens when casting from string value containing timezone info to timestamp value, which uses `DateTimeUtils.stringToTimestamp()` and gets TimeZone instance on the site. This pr makes a cache of the generated TimeZone instances to avoid the synchronization. Existing tests. Author: Takuya UESHIN <[email protected]> Closes apache#17933 from ueshin/issues/SPARK-20588. (cherry picked from commit c8c878a)
What changes were proposed in this pull request?
Because the method
TimeZone.getTimeZone(String ID)is synchronized on the TimeZone class, concurrent call of this method will become a bottleneck.This especially happens when casting from string value containing timezone info to timestamp value, which uses
DateTimeUtils.stringToTimestamp()and gets TimeZone instance on the site.This pr makes a cache of the generated TimeZone instances to avoid the synchronization.
How was this patch tested?
Existing tests.