Skip to content

Conversation

@baishuo
Copy link
Contributor

@baishuo baishuo commented Jul 1, 2014

use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap

use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@aarondav
Copy link
Contributor

aarondav commented Jul 1, 2014

Jenkins, test this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@aarondav
Copy link
Contributor

aarondav commented Jul 1, 2014

Out of curiosity, what motivates this change? I thought typically ConcurrentHashMap is used for more heavyweight concurrent data structures, while synchronizedMap() is used when concurrency is rare.

Either way, our usage is not threadsafe, since we do things like

if (map.contains(x)) map.get(x) else defaultValue

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16277/

add some synchronized
@baishuo
Copy link
Contributor Author

baishuo commented Jul 1, 2014

I add some synchronized please see if it is thread safe,and Jenkins should test this once more

@aarondav
Copy link
Contributor

aarondav commented Jul 1, 2014

This is indeed threadsafe, but perhaps overeager. I think we should aim for get()s to be relatively fast, and I think we can avoid extra synchronization there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can remove this synchronized, I don't think we care about the consistency guarantees of inserting multiple properties at once :)

@cloud-fan
Copy link
Contributor

With the new synchronizeds you added in usage, I don't think we need ConcurrentHashMap any more. Maybe just a simple HashMap is enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, here, just

Option(settings.get(key))

@baishuo
Copy link
Contributor Author

baishuo commented Jul 1, 2014

thanks @aarondav ,had modified according to your comment,please help me to check if it is proper

@rxin
Copy link
Contributor

rxin commented Jul 1, 2014

Can you undo the indent spacing change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably adding the checking logic is better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. How about Option(settings.get(key)).orElse(throw new NoSuchElementException(key))?

@chenghao-intel
Copy link
Contributor

And I also saw the code:

def toDebugString: String = {
    settings.synchronized {
      settings.asScala.toArray.sorted.map{ case (k, v) => s"$k=$v" }.mkString("\n")
    }
  }

Should we remove the synchronized block also? since we use the ConcurrentHashMap instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be using ConcurrentHashMap because this will be a very low contention code path. For low contention code path, ConcurrentHashMap is a very poor choice (as a matter of fact it'll likely be much slower than synchronized, and use a lot more memory)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

back to Collections.synchronizedMap

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rxin I think the performance distinction is extremely minor in this case, as there is only one ConcurrentHashMap. ConcurrentHashMap's API tends to be nicer to use, though, as people may not realize that iteration over a SynchronizedMap is not threadsafe, like in the current implementation of SQLConf.

As @baishuo mentioned, if we use synchronizedMap we'll have to add settings.synchronized {} in a few places now.

@baishuo
Copy link
Contributor Author

baishuo commented Jul 1, 2014

Hi,@rxin ,had remove indent spacing on
def set(props: Properties): Unit = {
props.asScala.foreach { case (k, v) => this.settings.put(k, v) }
}
please help me to check if it is proper

@baishuo
Copy link
Contributor Author

baishuo commented Jul 1, 2014

hi @rxin, how to modify is proper?
use java.util.Collections.synchronizedMap and
use settings.sycnronize {
...............
}
to ensure the thread safe?

@concretevitamin
Copy link
Contributor

Yeah, what is motivating this change? When this class got introduced, @rxin commented that java.util.ConcurrentHashMap had bad memory footprint and suggested the current approach instead.

@concretevitamin
Copy link
Contributor

Sorry, I didn't realize Reynold had already commented on this thread. The current changes with Option look good.

@concretevitamin
Copy link
Contributor

Jenkins, ok to test.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16302/

@aarondav
Copy link
Contributor

aarondav commented Jul 3, 2014

@rxin wins because he says that SQLConf will become a Thread-local variable. This looks good, the only thing to change for thread-safety is to add a synchronized for getAll().

@rxin
Copy link
Contributor

rxin commented Jul 3, 2014

Note that the latest code no longer compiles ....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line should be getOrElse

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This follows code style from BlockManager:

  def getLocalFromDisk(blockId: BlockId, serializer: Serializer): Option[Iterator[Any]] = {
    diskStore.getValues(blockId, serializer).orElse(
      sys.error("Block " + blockId + " not found on disk, though it should be"))
  }

Anyway throw expression return Nothing, which can work with both getOrElse and orElse

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orElse literally does not compile here, as it returns Option(Nothing).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. get(key: String) need to return a String which I missed. My bad :P

@baishuo
Copy link
Contributor Author

baishuo commented Jul 4, 2014

ooh,sorry about the compile error,had change orElse to getOrElse. thank you @rxin

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16335/

@rxin
Copy link
Contributor

rxin commented Jul 4, 2014

Jenkins, retest this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16338/

@rxin
Copy link
Contributor

rxin commented Jul 4, 2014

Thanks. I'm merging this in master & branch-1.0.

asfgit pushed a commit that referenced this pull request Jul 4, 2014
use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap

Author: baishuo(白硕) <[email protected]>

Closes #1272 from baishuo/master and squashes the following commits:

51ec55d [baishuo(白硕)] Update SQLConf.scala
63da043 [baishuo(白硕)] Update SQLConf.scala
36b6dbd [baishuo(白硕)] Update SQLConf.scala
864faa0 [baishuo(白硕)] Update SQLConf.scala
593096b [baishuo(白硕)] Update SQLConf.scala
7304d9b [baishuo(白硕)] Update SQLConf.scala
843581c [baishuo(白硕)] Update SQLConf.scala
1d3e4a2 [baishuo(白硕)] Update SQLConf.scala
0740f28 [baishuo(白硕)] Update SQLConf.scala

(cherry picked from commit 0bbe612)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 0bbe612 Jul 4, 2014
asfgit pushed a commit that referenced this pull request Jul 4, 2014
use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap

Author: baishuo(白硕) <[email protected]>

Closes #1272 from baishuo/master and squashes the following commits:

51ec55d [baishuo(白硕)] Update SQLConf.scala
63da043 [baishuo(白硕)] Update SQLConf.scala
36b6dbd [baishuo(白硕)] Update SQLConf.scala
864faa0 [baishuo(白硕)] Update SQLConf.scala
593096b [baishuo(白硕)] Update SQLConf.scala
7304d9b [baishuo(白硕)] Update SQLConf.scala
843581c [baishuo(白硕)] Update SQLConf.scala
1d3e4a2 [baishuo(白硕)] Update SQLConf.scala
0740f28 [baishuo(白硕)] Update SQLConf.scala

(cherry picked from commit 0bbe612)
Signed-off-by: Reynold Xin <[email protected]>
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap

Author: baishuo(白硕) <[email protected]>

Closes apache#1272 from baishuo/master and squashes the following commits:

51ec55d [baishuo(白硕)] Update SQLConf.scala
63da043 [baishuo(白硕)] Update SQLConf.scala
36b6dbd [baishuo(白硕)] Update SQLConf.scala
864faa0 [baishuo(白硕)] Update SQLConf.scala
593096b [baishuo(白硕)] Update SQLConf.scala
7304d9b [baishuo(白硕)] Update SQLConf.scala
843581c [baishuo(白硕)] Update SQLConf.scala
1d3e4a2 [baishuo(白硕)] Update SQLConf.scala
0740f28 [baishuo(白硕)] Update SQLConf.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants