[SPARK-20588][SQL] Cache TimeZone instances. #17933

ueshin · 2017-05-10T07:04:50Z

What changes were proposed in this pull request?

Because the method TimeZone.getTimeZone(String ID) is synchronized on the TimeZone class, concurrent calls to this method become a bottleneck.
This happens especially when casting a string value that contains timezone info to a timestamp value, which uses DateTimeUtils.stringToTimestamp() and obtains a TimeZone instance at the call site.

This PR caches the generated TimeZone instances to avoid the synchronization.

How was this patch tested?

Existing tests.

srowen

One small suggestion, but looking good. I assume that's all the calls to TimeZone.getTimeZone.

srowen · 2017-05-10T09:12:33Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala

        val tzClass = classOf[TimeZone].getName
-        ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""")
-        ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("UTC");""")
+        val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")


Is it more efficient to save this value in a static (object) member somewhere or will it not matter much? I see it's used several times.

I don't think it will matter much for now, because this is evaluated only once each time the code is generated.
What do you think?
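
For readers wondering about the `stripSuffix("$")` call in the diff above: the runtime class of a Scala `object` has a name ending in `$`, while generated Java code would normally refer to the name without that suffix (for a top-level object, the compiler emits static forwarder methods under that name). The following is only a minimal sketch of the idea; `Helpers`, `ClassNameDemo` and `initializer` are hypothetical names used for illustration and are not part of Spark.

```scala
import java.util.TimeZone

// Hypothetical stand-in for DateTimeUtils; not part of Spark.
object Helpers {
  def getTimeZone(id: String): TimeZone = TimeZone.getTimeZone(id)
}

object ClassNameDemo {
  // The runtime class of a Scala object is its module class, whose name ends in "$".
  val moduleClassName: String = Helpers.getClass.getName

  // Dropping the suffix yields the name that Java source code would use.
  val javaVisibleName: String = moduleClassName.stripSuffix("$")

  // Assembles an initializer statement for generated Java code.
  // The term name and zone ID passed in are illustrative only.
  def initializer(tzTerm: String, tz: String): String =
    s"""$tzTerm = $javaVisibleName.getTimeZone("$tz");"""
}
```

Because the name is computed once per code generation rather than per row, caching it in a static member would likely save very little, which matches the reply above.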

srowen · 2017-05-10T09:14:45Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

+  }
+
+  def getTimeZone(timeZoneId: String): TimeZone = {
+    val timeZones = threadLocalTimeZones.get()


How about just threadLocalTimeZones.get().getOrElseUpdate(timeZoneId, TimeZone.getTimeZone(timeZoneId))? It avoids the double lookup.

I'll update it. Thanks!
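
The suggestion above can be sketched roughly as follows. This is an illustration of the thread-local variant at this stage of the review, not the merged code: the `initialValue` override and the enclosing object name `ThreadLocalTimeZoneCache` are assumptions, since the diff only shows the declaration line.

```scala
import java.util.TimeZone
import scala.collection.mutable

// Hypothetical enclosing object; in the PR this code lives in DateTimeUtils.
object ThreadLocalTimeZoneCache {
  // One mutable map per thread, so the map itself needs no locking.
  // The initialValue override is an assumption about the elided body.
  private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] {
    override def initialValue(): mutable.Map[String, TimeZone] =
      mutable.Map.empty[String, TimeZone]
  }

  // getOrElseUpdate takes its second argument by name, so the synchronized
  // TimeZone.getTimeZone call only runs when the ID is not cached yet.
  def getTimeZone(timeZoneId: String): TimeZone =
    threadLocalTimeZones.get().getOrElseUpdate(timeZoneId, TimeZone.getTimeZone(timeZoneId))
}
```

Compared with a separate `contains` check followed by `apply` or `put`, `getOrElseUpdate` expresses the lookup and the insert in a single call.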

SparkQA · 2017-05-10T09:31:26Z

Test build #76732 has finished for PR 17933 at commit de79e50.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-10T12:34:46Z

Test build #76747 has finished for PR 17933 at commit 97d5bba.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile

LGTM

viirya · 2017-05-14T10:38:27Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

    sdf
  }

+  private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] {


Since we never update an entry once it has been put into the map, we only need synchronization when inserting values. The contents of the map can be shared between threads for reads, so I am wondering whether we really need a separate map for each thread.

That's a good point.
How about using ConcurrentHashMap instead?

Sounds good to me.

viirya · 2017-05-15T01:52:09Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

+  }
+
+  def getTimeZone(timeZoneId: String): TimeZone = {
+    computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)


Is Java 7 support completely removed now? It seems computeIfAbsent is only available from Java 8 onward.

I believe Java 7 support was removed as of Spark 2.2.0.
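
A shared cache along the lines discussed in this thread might look like the sketch below. The names `computedTimeZones` and `computeTimeZone` come from the diff line above, but their definitions are not shown there, so the definitions here (and the enclosing object `SharedTimeZoneCache`) are assumptions. The function is written as an explicit `java.util.function.Function` instance so that the sketch does not depend on the compiler converting Scala lambdas to Java functional interfaces.

```scala
import java.util.TimeZone
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Hypothetical enclosing object; in the PR this code lives in DateTimeUtils.
object SharedTimeZoneCache {
  // A single map shared by all threads; ConcurrentHashMap handles concurrent access.
  private val computedTimeZones = new ConcurrentHashMap[String, TimeZone]

  // Invoked by computeIfAbsent only when the ID has no entry yet (requires Java 8+).
  private val computeTimeZone = new JFunction[String, TimeZone] {
    override def apply(timeZoneId: String): TimeZone = TimeZone.getTimeZone(timeZoneId)
  }

  def getTimeZone(timeZoneId: String): TimeZone =
    computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
}
```

One design consequence worth keeping in mind: `java.util.TimeZone` is mutable, and with a shared cache every caller receives the same instance, so callers should treat the returned object as read-only.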

viirya · 2017-05-15T01:54:39Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

 import javax.xml.bind.DatatypeConverter

 import scala.annotation.tailrec
+import scala.collection.mutable


We can remove this now.

Thanks, I'll remove it.

viirya · 2017-05-15T02:03:20Z

LGTM

SparkQA · 2017-05-15T03:18:16Z

Test build #76921 has finished for PR 17933 at commit 7935a1a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-15T04:23:51Z

Test build #76923 has finished for PR 17933 at commit 3cdbb3a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-05-15T23:52:51Z

Thanks! Merging to master/2.2

## What changes were proposed in this pull request?

Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the TimeZone class, concurrent call of this method will become a bottleneck. This especially happens when casting from string value containing timezone info to timestamp value, which uses `DateTimeUtils.stringToTimestamp()` and gets TimeZone instance on the site.

This pr makes a cache of the generated TimeZone instances to avoid the synchronization.

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <[email protected]>

Closes #17933 from ueshin/issues/SPARK-20588.

(cherry picked from commit c8c878a)
Signed-off-by: Xiao Li <[email protected]>


Cache TimeZone instances per thread.

de79e50

srowen requested changes May 10, 2017

Use getOrElseUpdate().

97d5bba

srowen approved these changes May 13, 2017

gatorsmile approved these changes May 14, 2017

viirya reviewed May 14, 2017

Use ConcurrentHashMap instead of thread-local Map.

7935a1a

ueshin changed the title ~~[SPARK-20588][SQL] Cache TimeZone instances per thread.~~ [SPARK-20588][SQL] Cache TimeZone instances. May 15, 2017

viirya reviewed May 15, 2017

Remove unnecessary import.

3cdbb3a

asfgit closed this in c8c878a May 15, 2017

srowen mentioned this pull request Feb 17, 2019

[SPARK-26903][SQL] Remove the TimeZone cache #23812

Closed
