Update SQLConf.scala #1272
use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap
|
Can one of the admins verify this patch? |
|
Jenkins, test this please. |
|
Merged build triggered. |
|
Merged build started. |
|
Out of curiosity, what motivates this change? I thought typically ConcurrentHashMap is used for more heavyweight concurrent data structures, while synchronizedMap() is used when concurrency is rare. Either way, our usage is not threadsafe, since we do things like if (map.contains(x)) map.get(x) else defaultValue |
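The check-then-act pattern mentioned above can be illustrated with a minimal sketch. The object name, the `settings` field, and both method names are hypothetical stand-ins, not the actual SQLConf code; the point is only that two separate calls on a synchronized map are not atomic together, while a single lookup wrapped in `Option` is.

```scala
import java.util.{Collections, HashMap => JHashMap, Map => JMap}

object SingleLookupSketch {
  // Hypothetical stand-in for the settings map discussed in this thread.
  val settings: JMap[String, String] =
    Collections.synchronizedMap(new JHashMap[String, String]())

  // Check-then-act: containsKey and get each take the lock separately,
  // so another thread may remove the key between the two calls and
  // get may then return null.
  def getRacy(key: String, default: String): String =
    if (settings.containsKey(key)) settings.get(key) else default

  // Single lookup: one locked call; Option(...) turns a null result into None.
  def getSafe(key: String, default: String): String =
    Option(settings.get(key)).getOrElse(default)
}
```

With a single thread both methods return the same result; the difference only shows up when another thread mutates the map between the two calls in `getRacy`.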
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16277/ |
add some synchronized
|
I added some synchronized blocks. Please check whether it is thread-safe now. Jenkins should test this once more. |
|
This is indeed threadsafe, but perhaps overeager. I think we should aim for get()s to be relatively fast, and I think we can avoid extra synchronization there. |
Perhaps we can remove this synchronized, I don't think we care about the consistency guarantees of inserting multiple properties at once :)
|
With the new |
Similarly, here, just
Option(settings.get(key))|
Thanks @aarondav, I have modified the code according to your comments. Please help me check whether it is correct. |
|
Can you undo the indent spacing change? |
Probably adding the checking logic is better.
Agree. How about Option(settings.get(key)).orElse(throw new NoSuchElementException(key))?
|
I also noticed the code: should we remove the synchronized block as well, now that we use ConcurrentHashMap instead? |
We should not be using ConcurrentHashMap because this will be a very low contention code path. For low contention code path, ConcurrentHashMap is a very poor choice (as a matter of fact it'll likely be much slower than synchronized, and use a lot more memory)
Switched back to Collections.synchronizedMap.
@rxin I think the performance distinction is extremely minor in this case, as there is only one ConcurrentHashMap. ConcurrentHashMap's API tends to be nicer to use, though, as people may not realize that iteration over a SynchronizedMap is not threadsafe, like in the current implementation of SQLConf.
As @baishuo mentioned, if we use synchronizedMap we'll have to add settings.synchronized {} in a few places now.
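The iteration caveat can be sketched as follows. This is an illustration under assumptions, not the SQLConf implementation: the object name and the `settings` field are hypothetical, and `getAll` is only modeled on the method name used later in this thread. `Collections.synchronizedMap` guards individual calls, but its documentation requires the caller to hold the map's lock while iterating over any of its views.

```scala
import java.util.{Collections, HashMap => JHashMap, Map => JMap}

object SnapshotSketch {
  // Hypothetical stand-in for the settings map discussed in this thread.
  val settings: JMap[String, String] =
    Collections.synchronizedMap(new JHashMap[String, String]())

  // Copies all entries while holding the map's own monitor, which is the
  // lock the synchronized wrapper uses for its individual operations.
  def getAll: Array[(String, String)] = settings.synchronized {
    val out = Array.newBuilder[(String, String)]
    val it = settings.entrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      out += ((e.getKey, e.getValue))
    }
    out.result()
  }
}
```

Returning a copied array means callers never iterate over the live map, so the lock is only held for the duration of the copy.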
|
Hi @rxin, I have removed the indent spacing change on |
|
Hi @rxin, what would be the proper way to modify this? |
|
Yeah, what is motivating this change? When this class was introduced, @rxin commented that java.util.concurrent.ConcurrentHashMap had a bad memory footprint and suggested the current approach instead. |
|
Sorry, I didn't realize Reynold had already commented on this thread. The current changes with Option look good. |
|
Jenkins, ok to test. |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16302/ |
|
@rxin wins because he says that SQLConf will become a Thread-local variable. This looks good, the only thing to change for thread-safety is to add a synchronized for getAll(). |
|
Note that the latest code no longer compiles .... |
this line should be getOrElse
This follows code style from BlockManager:
def getLocalFromDisk(blockId: BlockId, serializer: Serializer): Option[Iterator[Any]] = {
  diskStore.getValues(blockId, serializer).orElse(
    sys.error("Block " + blockId + " not found on disk, though it should be"))
}
Anyway, a throw expression has type Nothing, which can work with both getOrElse and orElse.
orElse literally does not compile here, as it returns Option(Nothing).
Ah, I see. get(key: String) needs to return a String, which I missed. My bad :P
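The typing difference behind this exchange can be shown in a small sketch. The object, the map contents, and the `lookup` method are hypothetical; only the `Option(...).orElse(throw ...)` and `Option(...).getOrElse(throw ...)` shapes come from the thread. Both forms accept a `throw` argument, but `orElse` leaves the value wrapped in an `Option`, so it cannot be the body of a method declared to return `String`.

```scala
object OrElseSketch {
  // Hypothetical settings map; a Java map returns null for a missing key.
  private val settings = new java.util.HashMap[String, String]()
  settings.put("example.key", "v")

  // orElse keeps the result wrapped, so the method type is Option[String].
  def lookup(key: String): Option[String] =
    Option(settings.get(key)).orElse(throw new NoSuchElementException(key))

  // getOrElse unwraps the value, so the method can return String directly.
  def get(key: String): String =
    Option(settings.get(key)).getOrElse(throw new NoSuchElementException(key))
}
```

Both methods throw `NoSuchElementException` for a missing key, because the by-name argument is only evaluated when the `Option` is empty.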
|
Oh, sorry about the compile error. I have changed orElse to getOrElse. Thank you @rxin. |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16335/ |
|
Jenkins, retest this please. |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. All automated tests passed. |
|
Thanks. I'm merging this in master & branch-1.0. |
use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap
Author: baishuo(白硕) <[email protected]>
Closes #1272 from baishuo/master and squashes the following commits:
51ec55d [baishuo(白硕)] Update SQLConf.scala
63da043 [baishuo(白硕)] Update SQLConf.scala
36b6dbd [baishuo(白硕)] Update SQLConf.scala
864faa0 [baishuo(白硕)] Update SQLConf.scala
593096b [baishuo(白硕)] Update SQLConf.scala
7304d9b [baishuo(白硕)] Update SQLConf.scala
843581c [baishuo(白硕)] Update SQLConf.scala
1d3e4a2 [baishuo(白硕)] Update SQLConf.scala
0740f28 [baishuo(白硕)] Update SQLConf.scala
(cherry picked from commit 0bbe612)
Signed-off-by: Reynold Xin <[email protected]>