@ueshin ueshin commented May 10, 2017

What changes were proposed in this pull request?

Because the method TimeZone.getTimeZone(String ID) is synchronized on the TimeZone class, concurrent calls to it become a bottleneck.
This happens in particular when casting a string value that contains time zone information to a timestamp value: the cast uses DateTimeUtils.stringToTimestamp(), which obtains a TimeZone instance on the spot.

This PR caches the TimeZone instances it creates, so that repeated lookups avoid the synchronization.

How was this patch tested?

Existing tests.

Member

@srowen srowen left a comment

One small suggestion, but looking good. I assume that's all the calls to TimeZone.getTimeZone.

val tzClass = classOf[TimeZone].getName
ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""")
ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("UTC");""")
val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
Member

Is it more efficient to save this value in a static (object) member somewhere or will it not matter much? I see it's used several times.

Member Author

I don't think it matters much for now, because this is evaluated only once each time code is generated.
What do you think?

}

def getTimeZone(timeZoneId: String): TimeZone = {
  val timeZones = threadLocalTimeZones.get()
Member

How about just threadLocalTimeZones.get().getOrElseUpdate(timeZoneId, TimeZone.getTimeZone(timeZoneId))? It avoids the double lookup.

Member Author

I'll update it. Thanks!
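
For readers following the thread, here is a minimal sketch of what the suggested one-liner could look like in a per-thread cache. The object name `ThreadLocalTimeZoneCache` is hypothetical, and the surrounding structure is assumed from the fragments quoted in this review, not copied from the patch.

```scala
import java.util.TimeZone
import scala.collection.mutable

// Hypothetical wrapper object; only threadLocalTimeZones and getTimeZone
// are names that appear in the review thread.
object ThreadLocalTimeZoneCache {
  // Each thread gets its own mutable map, so lookups need no locking.
  private val threadLocalTimeZones: ThreadLocal[mutable.Map[String, TimeZone]] =
    new ThreadLocal[mutable.Map[String, TimeZone]] {
      override def initialValue(): mutable.Map[String, TimeZone] =
        mutable.Map.empty[String, TimeZone]
    }

  // getOrElseUpdate takes its second argument by name, so the synchronized
  // TimeZone.getTimeZone call only runs when the id is not cached yet.
  def getTimeZone(timeZoneId: String): TimeZone =
    threadLocalTimeZones.get().getOrElseUpdate(timeZoneId, TimeZone.getTimeZone(timeZoneId))
}
```

With this shape, each thread pays for the synchronized lookup once per time zone id and is served from its own map afterwards.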

SparkQA commented May 10, 2017

Test build #76732 has finished for PR 17933 at commit de79e50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 10, 2017

Test build #76747 has finished for PR 17933 at commit 97d5bba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@gatorsmile gatorsmile left a comment

LGTM

sdf
}

private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] {
Member

@viirya viirya May 14, 2017

Since we never update an entry once it has been put into the map, we only need synchronization when inserting values. The contents of the map can be shared between threads for reading. I wonder whether we really need a separate map for each thread.

Member Author

That's a good point.
How about using ConcurrentHashMap instead?

Member

Sounds good to me.
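
A rough sketch of the shared cache discussed above, assuming a single `ConcurrentHashMap` replaces the per-thread maps. The object name `SharedTimeZoneCache` is made up for illustration. `computedTimeZones` and `computeTimeZone` follow the names in the snippet quoted in this thread, but their definitions here are an assumption and not necessarily the merged code.

```scala
import java.util.TimeZone
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Hypothetical wrapper object for illustration.
object SharedTimeZoneCache {
  // One map shared by all threads; reads of existing entries do not block.
  private val computedTimeZones = new ConcurrentHashMap[String, TimeZone]

  // Explicit java.util.function.Function instance, so the sketch does not
  // rely on Scala 2.12+ lambda-to-SAM conversion.
  private val computeTimeZone = new JFunction[String, TimeZone] {
    override def apply(timeZoneId: String): TimeZone = TimeZone.getTimeZone(timeZoneId)
  }

  // computeIfAbsent (Java 8+) runs the function at most once per missing key.
  def getTimeZone(timeZoneId: String): TimeZone =
    computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
}
```

Compared with the thread-local variant, every thread sees the same cached instance, so callers must treat the returned `TimeZone` as read-only.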

@ueshin ueshin changed the title from "[SPARK-20588][SQL] Cache TimeZone instances per thread." to "[SPARK-20588][SQL] Cache TimeZone instances." on May 15, 2017
}

def getTimeZone(timeZoneId: String): TimeZone = {
  computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
Member

@viirya viirya May 15, 2017

Has Java 7 support been completely removed now? It seems computeIfAbsent is only available from Java 8 onward.

Member Author

I believe Java 7 support was removed as of Spark 2.2.0.
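
For context only: had Java 7 compatibility still been required, a similar cache could have been written without `computeIfAbsent`. The sketch below is a hypothetical alternative, not part of this PR. The object name `Java7StyleTimeZoneCache` is invented for illustration, and it uses only `get` and `putIfAbsent`, which `ConcurrentHashMap` has offered since Java 5.

```scala
import java.util.TimeZone
import java.util.concurrent.ConcurrentHashMap

// Hypothetical alternative for illustration; not the code in this PR.
object Java7StyleTimeZoneCache {
  private val computedTimeZones = new ConcurrentHashMap[String, TimeZone]

  def getTimeZone(timeZoneId: String): TimeZone = {
    val cached = computedTimeZones.get(timeZoneId)
    if (cached != null) {
      cached
    } else {
      // Two threads may both reach this point and both create an instance;
      // putIfAbsent makes sure only one of them ends up in the map.
      val created = TimeZone.getTimeZone(timeZoneId)
      val previous = computedTimeZones.putIfAbsent(timeZoneId, created)
      // putIfAbsent returns the existing value, or null if ours was stored.
      if (previous != null) previous else created
    }
  }
}
```

The trade-off is that a race can trigger a redundant `TimeZone.getTimeZone` call, whereas `computeIfAbsent` applies the function at most once per key.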

import javax.xml.bind.DatatypeConverter

import scala.annotation.tailrec
import scala.collection.mutable
Member

We can remove this now.

Member Author

Thanks, I'll remove it.

Member

viirya commented May 15, 2017

LGTM

SparkQA commented May 15, 2017

Test build #76921 has finished for PR 17933 at commit 7935a1a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 15, 2017

Test build #76923 has finished for PR 17933 at commit 3cdbb3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master/2.2

asfgit pushed a commit that referenced this pull request May 15, 2017
## What changes were proposed in this pull request?

Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the TimeZone class, concurrent calls to it become a bottleneck.
This happens in particular when casting a string value that contains time zone information to a timestamp value: the cast uses `DateTimeUtils.stringToTimestamp()`, which obtains a TimeZone instance on the spot.

This PR caches the TimeZone instances it creates, so that repeated lookups avoid the synchronization.

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN

Closes #17933 from ueshin/issues/SPARK-20588.

(cherry picked from commit c8c878a)
Signed-off-by: Xiao Li
@asfgit asfgit closed this in c8c878a May 15, 2017
robert3005 pushed a commit to palantir/spark that referenced this pull request May 19, 2017
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018