Update SQLConf.scala #1272

baishuo · 2014-07-01T03:29:57Z

use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap

AmplabJenkins · 2014-07-01T03:30:39Z

Can one of the admins verify this patch?

aarondav · 2014-07-01T03:49:59Z

Jenkins, test this please.

AmplabJenkins · 2014-07-01T03:50:39Z

Merged build triggered.

AmplabJenkins · 2014-07-01T03:50:49Z

Merged build started.

aarondav · 2014-07-01T03:51:49Z

Out of curiosity, what motivates this change? I thought typically ConcurrentHashMap is used for more heavyweight concurrent data structures, while synchronizedMap() is used when concurrency is rare.

Either way, our usage is not threadsafe, since we do things like

if (map.contains(x)) map.get(x) else defaultValue
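
To make the race concrete, here is a minimal sketch. It is not the SQLConf code: `LookupSketch`, `settings`, `racyGet` and `singleGet` are hypothetical names used only for illustration. The first method does two separate calls on the map, so another thread may remove the key in between; the second does a single lookup and maps a null result to the default.

```scala
import java.util.{Collections, HashMap => JHashMap, Map => JMap}

object LookupSketch {
  // Hypothetical stand-in for the settings map discussed in this thread.
  val settings: JMap[String, String] =
    Collections.synchronizedMap(new JHashMap[String, String]())

  // Check-then-act: containsKey and get are two separate synchronized calls,
  // so the key can disappear between them and get may return null.
  def racyGet(key: String, defaultValue: String): String =
    if (settings.containsKey(key)) settings.get(key) else defaultValue

  // Single lookup: one call on the map; Option(...) turns null into None,
  // which then falls back to the default.
  def singleGet(key: String, defaultValue: String): String =
    Option(settings.get(key)).getOrElse(defaultValue)
}
```

Both methods behave the same when only one thread touches the map; the difference only shows up under concurrent removal.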

AmplabJenkins · 2014-07-01T05:51:51Z

Merged build finished.

AmplabJenkins · 2014-07-01T05:51:51Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16277/

add some synchronized

baishuo · 2014-07-01T06:39:39Z

I added some synchronized blocks; please check whether it is thread-safe now. Jenkins should also test this once more.

aarondav · 2014-07-01T07:13:25Z

This is indeed threadsafe, but perhaps overeager. I think we should aim for get()s to be relatively fast, and I think we can avoid extra synchronization there.

aarondav · 2014-07-01T07:14:15Z

sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala

Perhaps we can remove this synchronized; I don't think we care about the consistency guarantees of inserting multiple properties at once :)

cloud-fan · 2014-07-01T07:15:26Z

With the new synchronized blocks you added at the call sites, I don't think we need ConcurrentHashMap any more. Maybe just a simple HashMap is enough.

aarondav · 2014-07-01T07:15:36Z

sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala

Similarly, here, just

Option(settings.get(key))

baishuo · 2014-07-01T07:30:09Z

Thanks @aarondav, I have modified it according to your comments; please check whether it is correct.

rxin · 2014-07-01T07:39:46Z

Can you undo the indent spacing change?

chenghao-intel · 2014-07-01T07:47:03Z

sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala

Probably adding the checking logic is better.

Agree. How about Option(settings.get(key)).orElse(throw new NoSuchElementException(key))?

chenghao-intel · 2014-07-01T07:51:29Z

And I also saw the code:

def toDebugString: String = {
  settings.synchronized {
    settings.asScala.toArray.sorted.map { case (k, v) => s"$k=$v" }.mkString("\n")
  }
}

Should we also remove this synchronized block, since we now use ConcurrentHashMap instead?

rxin · 2014-07-01T07:52:33Z

sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala

We should not be using ConcurrentHashMap because this will be a very low-contention code path. For a low-contention code path, ConcurrentHashMap is a very poor choice (as a matter of fact, it'll likely be much slower than synchronized and use a lot more memory).

back to Collections.synchronizedMap

@rxin I think the performance distinction is extremely minor in this case, as there is only one ConcurrentHashMap. ConcurrentHashMap's API tends to be nicer to use, though, as people may not realize that iteration over a SynchronizedMap is not threadsafe, like in the current implementation of SQLConf.

As @baishuo mentioned, if we use synchronizedMap we'll have to add settings.synchronized {} in a few places now.
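
A minimal sketch of what that looks like, assuming Scala 2.13 or later for `scala.jdk.CollectionConverters` (the code in this PR predates it). `SnapshotSketch` and its members are hypothetical names, not the SQLConf implementation. Individual `put` and `get` calls on a synchronized map are already guarded, but iteration is not, so the copy is taken while holding the map's own monitor.

```scala
import java.util.{Collections, HashMap => JHashMap, Map => JMap}
import scala.jdk.CollectionConverters._

object SnapshotSketch {
  // Hypothetical stand-in for the settings map discussed in this thread.
  val settings: JMap[String, String] =
    Collections.synchronizedMap(new JHashMap[String, String]())

  // Iterating a synchronizedMap requires manual locking on the map itself,
  // so copy the entries into an array while holding that lock.
  def getAll: Array[(String, String)] = settings.synchronized {
    settings.asScala.toArray
  }
}
```

Callers can then sort or format the returned array without holding the lock, since it is a private copy.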

baishuo · 2014-07-01T07:56:24Z

Hi @rxin, I have removed the indent spacing change in
def set(props: Properties): Unit = {
  props.asScala.foreach { case (k, v) => this.settings.put(k, v) }
}
Please check whether it is correct.

baishuo · 2014-07-01T09:21:35Z

Hi @rxin, what is the proper way to modify this? Should I use java.util.Collections.synchronizedMap together with
settings.synchronized {
...............
}
to ensure thread safety?

concretevitamin · 2014-07-01T22:35:39Z

Yeah, what is motivating this change? When this class got introduced, @rxin commented that java.util.concurrent.ConcurrentHashMap had a bad memory footprint and suggested the current approach instead.

concretevitamin · 2014-07-01T22:43:51Z

Sorry, I didn't realize Reynold had already commented on this thread. The current changes with Option look good.

concretevitamin · 2014-07-03T04:42:51Z

Jenkins, ok to test.

AmplabJenkins · 2014-07-03T04:45:48Z

Merged build triggered.

AmplabJenkins · 2014-07-03T04:45:56Z

Merged build started.

AmplabJenkins · 2014-07-03T04:50:55Z

Merged build finished.

AmplabJenkins · 2014-07-03T04:50:55Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16302/

aarondav · 2014-07-03T18:56:58Z

@rxin wins because he says that SQLConf will become a Thread-local variable. This looks good, the only thing to change for thread-safety is to add a synchronized for getAll().

rxin · 2014-07-03T22:18:34Z

Note that the latest code no longer compiles ....

rxin · 2014-07-03T23:03:21Z

sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala

this line should be getOrElse

This follows code style from BlockManager:

def getLocalFromDisk(blockId: BlockId, serializer: Serializer): Option[Iterator[Any]] = {
  diskStore.getValues(blockId, serializer).orElse(
    sys.error("Block " + blockId + " not found on disk, though it should be"))
}

Anyway, a throw expression has type Nothing, so it can work with both getOrElse and orElse.

orElse literally does not compile here, as it returns Option(Nothing).

Ah, I see. get(key: String) needs to return a String, which I missed. My bad :P
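
The difference between the two can be sketched as follows. `GetSketch`, `lookup` and `get` are hypothetical names for illustration, not the code in this PR. `orElse` keeps the `Option` wrapper, so it only type-checks where an `Option[String]` is expected; `getOrElse` unwraps to a plain `String`.

```scala
import java.util.{Collections, HashMap => JHashMap, Map => JMap}

object GetSketch {
  // Hypothetical stand-in for the settings map discussed in this thread.
  val settings: JMap[String, String] =
    Collections.synchronizedMap(new JHashMap[String, String]())

  // orElse returns an Option, so the declared result type must be Option[String].
  def lookup(key: String): Option[String] =
    Option(settings.get(key)).orElse(throw new NoSuchElementException(key))

  // getOrElse unwraps the value, so the result is a String.
  // Declaring this method with orElse instead would be a type mismatch.
  def get(key: String): String =
    Option(settings.get(key)).getOrElse(throw new NoSuchElementException(key))
}
```

In both methods the throw is only evaluated when the key is absent, because the argument is passed by name.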

baishuo · 2014-07-04T03:15:47Z

Oh, sorry about the compile error; I have changed orElse to getOrElse. Thank you, @rxin.

AmplabJenkins · 2014-07-04T03:15:50Z

Merged build triggered.

AmplabJenkins · 2014-07-04T03:16:00Z

Merged build started.

AmplabJenkins · 2014-07-04T05:17:08Z

Merged build finished.

AmplabJenkins · 2014-07-04T05:17:08Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16335/

rxin · 2014-07-04T05:33:22Z

Jenkins, retest this please.

AmplabJenkins · 2014-07-04T05:35:51Z

Merged build triggered.

AmplabJenkins · 2014-07-04T05:36:01Z

Merged build started.

AmplabJenkins · 2014-07-04T07:23:34Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-07-04T07:23:35Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16338/

rxin · 2014-07-04T07:25:22Z

Thanks. I'm merging this in master & branch-1.0.

use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap

Author: baishuo(白硕) <[email protected]>

Closes #1272 from baishuo/master and squashes the following commits:

51ec55d [baishuo(白硕)] Update SQLConf.scala
63da043 [baishuo(白硕)] Update SQLConf.scala
36b6dbd [baishuo(白硕)] Update SQLConf.scala
864faa0 [baishuo(白硕)] Update SQLConf.scala
593096b [baishuo(白硕)] Update SQLConf.scala
7304d9b [baishuo(白硕)] Update SQLConf.scala
843581c [baishuo(白硕)] Update SQLConf.scala
1d3e4a2 [baishuo(白硕)] Update SQLConf.scala
0740f28 [baishuo(白硕)] Update SQLConf.scala

(cherry picked from commit 0bbe612)
Signed-off-by: Reynold Xin <[email protected]>


Update SQLConf.scala

0740f28

use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap

Update SQLConf.scala

1d3e4a2

add some synchronized

aarondav reviewed Jul 1, 2014

Update SQLConf.scala

843581c

Update SQLConf.scala

7304d9b

chenghao-intel reviewed Jul 1, 2014

Update SQLConf.scala

593096b

rxin reviewed Jul 1, 2014

baishuo added 2 commits July 1, 2014 16:05

Update SQLConf.scala

864faa0

Update SQLConf.scala

36b6dbd

Update SQLConf.scala

63da043

rxin reviewed Jul 3, 2014

Update SQLConf.scala

51ec55d

asfgit closed this in 0bbe612 Jul 4, 2014

wangyum pushed a commit that referenced this pull request May 26, 2023

[CARMEL-6327] Support Broadcast Join with Stream Side Skew (#1272)

25a3b90
