[SPARK-1112, 2156] use min akka frame size to decide how to send task results #1124
Changes from all commits
5ca1edd
c277831
41c85e7
ec3ec28
4f376a4
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -97,10 +97,6 @@ private[spark] class Executor( | |
| private val urlClassLoader = createClassLoader() | ||
| private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader) | ||
|
|
||
| // Akka's message frame size. If task result is bigger than this, we use the block manager | ||
| // to send the result back. | ||
| private val akkaFrameSize = AkkaUtils.maxFrameSizeBytes(conf) | ||
|
|
||
| // Start worker thread pool | ||
| val threadPool = Utils.newDaemonCachedThreadPool("Executor task launch worker") | ||
|
|
||
|
|
@@ -212,7 +208,12 @@ private[spark] class Executor( | |
| val serializedDirectResult = ser.serialize(directResult) | ||
| logInfo("Serialized size of result for " + taskId + " is " + serializedDirectResult.limit) | ||
| val serializedResult = { | ||
| if (serializedDirectResult.limit >= akkaFrameSize - 1024) { | ||
| // TODO: [SPARK-1112] We use the min frame size to determine whether to use Akka to send | ||
| // the task result or block manager. Since this is via the backend, whose actor system is | ||
| // initialized before receiving the Spark conf, and hence it does not know | ||
| // `spark.akka.frameSize`. A temporary solution is using the min frame size. | ||
| // [SPARK-2156] We subtract 200K to leave some space for other data in the Akka message. | ||
| if (serializedDirectResult.limit >= AkkaUtils.minFrameSizeBytes - 200 * 1024) { | ||
|
Contributor
Why 200K, and should we change the similar code that does this for sending task closures (which also subtracts 1024)?
Contributor
Author
We computed the difference between the size of the Akka message and the size of the serialized task result (~10M). The difference is smaller than 60K. I set 200K to be safe. Could you point me to the places where we use 1024?
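The arithmetic behind the 200K margin can be sketched as follows. This is an illustration only, not code from the PR: it assumes a 10 MB frame (suggested by the ~10M result size above) and assumes Akka rejects any message larger than the frame. The object and member names are hypothetical.

```scala
// Hypothetical sketch; the names and the 10 MB frame size are assumptions.
object ThresholdArithmetic {
  // Assumed frame size of 10 MB.
  val frameBytes: Long = 10L * 1024 * 1024
  // Cutoff before this PR: frame size minus 1024 bytes.
  val oldCutoff: Long = frameBytes - 1024
  // Cutoff after this PR: frame size minus 200K.
  val newCutoff: Long = frameBytes - 200L * 1024
  // Upper bound on the message overhead reported in this thread.
  val maxObservedOverhead: Long = 60L * 1024

  // A message fits when payload plus overhead stays within the frame.
  def fitsInFrame(payloadBytes: Long, overheadBytes: Long): Boolean =
    payloadBytes + overheadBytes <= frameBytes

  // Largest results that would still be sent directly under each cutoff.
  val largestDirectOld: Long = oldCutoff - 1
  val largestDirectNew: Long = newCutoff - 1
}
```

Under these assumptions, a result just below the old cutoff plus 60K of overhead exceeds the frame, while a result just below the new cutoff leaves roughly 140K of headroom.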
Contributor
Author
@kayousterhout I saw the line in
Contributor
Re: your last question, I have no idea...you definitely know more about this than I do at this point. Using the higher, 200K value seems like a safe alternative to what's currently there. It would be great to add this constant in AkkaUtils so we don't need to manually track this down if it changes again in the future. |
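One way the suggestion to centralize the constant might look is sketched below. This is not the actual `AkkaUtils` code: `FrameSizeSketch`, `reservedSizeBytes`, and `exceedsDirectLimit` are hypothetical names, and the 10 MB value for `minFrameSizeBytes` is an assumption taken from the sizes discussed in this thread.

```scala
// Hypothetical stand-in for AkkaUtils; names and values are illustrative assumptions.
object FrameSizeSketch {
  // Assumed minimum frame size of 10 MB.
  val minFrameSizeBytes: Int = 10 * 1024 * 1024
  // Space reserved for other data in the Akka message (200K, as in this PR).
  val reservedSizeBytes: Int = 200 * 1024

  // True when a serialized result is too large to send directly
  // and should go through the block manager instead.
  def exceedsDirectLimit(serializedSize: Int): Boolean =
    serializedSize >= minFrameSizeBytes - reservedSizeBytes
}
```

With a single named constant, the executor-side check and any similar check for task closures could share the same margin instead of repeating a literal.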
||
| logInfo("Storing result for " + taskId + " in local BlockManager") | ||
| val blockId = TaskResultBlockId(taskId) | ||
| env.blockManager.putBytes( | ||
|
|
||
200 KB may not be enough. Its value should increase as the serialized DirectResult becomes larger.
I tested with a double array of size very close to 10 * 1024 * 1024. The akka message overhead is about 30-60K. This PR doesn't fix the issues with receiving new tasks from the driver that are bigger than 10MB. @pwendell is working on it.
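For comparison, the alternative raised above (a reserve that grows with the result size) could be sketched like this. It is not what this PR implements: the object name, the function, and the 2% fraction are all hypothetical, chosen only to show the shape of a scaled margin with the PR's 200K as a floor.

```scala
// Hypothetical sketch of a size-dependent reserve; not part of this PR.
object ScaledReserveSketch {
  // Fixed reserve used by this PR, kept here as a lower bound.
  val floorBytes: Long = 200L * 1024

  // Reserve the larger of the fixed floor and a fraction of the result size.
  // The default fraction (2%) is an arbitrary illustrative value.
  def reserveFor(serializedSize: Long, fraction: Double = 0.02): Long =
    math.max(floorBytes, math.ceil(serializedSize * fraction).toLong)
}
```

For results near 10 MB, the measurements reported above (30-60K of overhead) already sit well inside the fixed 200K reserve, so a scaled margin would only matter if the overhead turned out to grow with payload size.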