
Conversation

@woshilaiceshide
Contributor

Make Spark's "local[N]" better.
In our company, we use "local[N]" in production. It works excellently. It's our best choice.

In "local[N]", free cores of the only executor should be touched by "spark.task.cpus" for every finish/start-up of tasks.
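
For context, here is a minimal sketch of the free-core accounting this title describes, assuming a LocalBackend-style local scheduler; the class and method names (LocalFreeCoreAccounting, onTaskLaunched, onTaskFinished) are placeholders for illustration, not the actual patch:

```scala
// Sketch only: free cores of the single local executor are adjusted by
// "spark.task.cpus" whenever a task starts or finishes.
import org.apache.spark.SparkConf

class LocalFreeCoreAccounting(conf: SparkConf, totalCores: Int) {
  // Cores each task is assumed to occupy (defaults to 1).
  private val cpusPerTask = conf.getInt("spark.task.cpus", 1)

  // Cores currently free on the only local executor.
  private var freeCores = totalCores

  // Reserve the task's cores when it is launched.
  def onTaskLaunched(): Unit = synchronized { freeCores -= cpusPerTask }

  // Return the task's cores when it finishes or fails.
  def onTaskFinished(): Unit = synchronized { freeCores += cpusPerTask }

  // How many additional tasks could start right now.
  def availableSlots: Int = synchronized { freeCores / cpusPerTask }
}
```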
@AmplabJenkins

Can one of the admins verify this patch?

@rxin
Contributor

rxin commented Jul 23, 2014

Jenkins, test this please.

@rxin
Contributor

rxin commented Jul 23, 2014

Do you mind creating a JIRA ticket and adding the ticket title to the pull request, like other PRs do? Thanks!

issues.apache.org/jira/browse/SPARK

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1544. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17024/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1544:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17024/consoleFull

@mateiz
Contributor

mateiz commented Jul 23, 2014

This makes sense, but I'm slightly confused by it: why not just launch local[N] with a smaller N if you want fewer threads? This setting is the same for each task, after all.

@asfgit closed this in f776bc9 on Jul 23, 2014
@mateiz
Contributor

mateiz commented Jul 23, 2014

BTW I've merged this, thanks for the patch.

@woshilaiceshide
Contributor Author

@mateiz, because in spark-v1.0.1 "spark.default.parallelism" is not considered in class LocalBackend: the default parallelism there is simply totalCores, which is derived from "local[N]". So in spark-v1.0.1, if I want to increase the default parallelism (p), I have to increase N in "local[N]", which also increases the number (t) of tasks that can be launched in the only local executor; that is where "spark.task.cpus" (c) comes in. In the end I work with the equation p - (t - 1) + 1 = c * 2, which holds when the split factor is big enough. See https://github.com/apache/spark/blob/v1.0.1/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala
This has been fixed in the current master: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala

We use "local[N]" in production, so we pay closer attention to "local[N]".
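
To make the workaround concrete, here is a hedged sketch of the kind of configuration being described; the numbers are illustrative only, and it assumes the behavior this patch adds (i.e. "spark.task.cpus" being honored in local mode):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalNWorkaround {
  def main(args: Array[String]): Unit = {
    // Illustrative values only. In Spark 1.0.1, LocalBackend's default
    // parallelism is totalCores, i.e. the N in "local[16]" below.
    val conf = new SparkConf()
      .setAppName("local-n-workaround")
      .setMaster("local[16]")       // raises totalCores and thus the default parallelism
      .set("spark.task.cpus", "4")  // with this patch, at most 16 / 4 = 4 tasks run at once
    val sc = new SparkContext(conf)
    // On current master, "spark.default.parallelism" is read by LocalBackend,
    // so this coupling between N and "spark.task.cpus" is no longer needed.
    sc.stop()
  }
}
```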

@junnyxi

junnyxi commented Jul 24, 2014

Waiting for the result.

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
In "local[N]", free cores of the only executor should be touched by "spark.task.cpus" for every finish/start-up of tasks.

Make Spark's "local[N]" better.
In our company, we use "local[N]" in production. It works excellently. It's our best choice.

Author: woshilaiceshide <[email protected]>

Closes apache#1544 from woshilaiceshide/localX and squashes the following commits:

6c85154 [woshilaiceshide] [CORE] SPARK-2640: In "local[N]", free cores of the only executor should be touched by "spark.task.cpus" for every finish/start-up of tasks.