
@CrazyJvm
Contributor

Actually, the value 8 is only valid in Mesos fine-grained mode:
<code>
  override def defaultParallelism() = sc.conf.getInt("spark.default.parallelism", 8)
</code>

In coarse-grained mode, including Mesos coarse-grained, the value of the property depends on the number of cores:
<code>
  override def defaultParallelism(): Int = {
    conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
  }
</code>
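
In other words, the effective default depends on the scheduler backend. A job that needs a predictable task count can pin the property explicitly; a minimal sketch (the master URL, app name, and the value 16 are illustrative choices, not from this PR):
<code>
import org.apache.spark.{SparkConf, SparkContext}

object ParallelismExample {
  def main(args: Array[String]): Unit = {
    // Pin spark.default.parallelism so the default task count does not
    // depend on whether the backend is fine- or coarse-grained.
    val conf = new SparkConf()
      .setMaster("local[4]") // illustrative master
      .setAppName("parallelism-example")
      .set("spark.default.parallelism", "16")

    val sc = new SparkContext(conf)
    // Shuffle operations such as groupByKey and reduceByKey fall back to
    // this value when neither a parent partitioner nor an explicit
    // partition count is given.
    println(s"defaultParallelism = ${sc.defaultParallelism}") // 16
    sc.stop()
  }
}
</code>
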
@AmplabJenkins

Can one of the admins verify this patch?

@mateiz
Contributor

mateiz commented Apr 13, 2014

Jenkins, test this please

@mateiz
Contributor

mateiz commented Apr 13, 2014

Good catch

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14084/

@pwendell
Contributor

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14086/


Actually, to have valid HTML, add </li> at the end of these.

Oh yeah, I missed </li>. Fixed it.
CrazyJvm added a commit to CrazyJvm/spark that referenced this pull request Apr 14, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to apache#389 

detail is as following code :
<code>
  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }
</code>
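
As a concrete illustration of the fallback order in that snippet, a minimal sketch (master URL, sample data, and the partition count 12 are arbitrary): with spark.default.parallelism unset and no parent partitioner, groupByKey keeps the largest parent's partition count instead of any fixed constant.
<code>
import org.apache.spark.{SparkConf, SparkContext}

object DefaultPartitionerDemo {
  def main(args: Array[String]): Unit = {
    // spark.default.parallelism is deliberately left unset here.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("default-partitioner-demo"))

    // A pair RDD with 12 partitions and no partitioner.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)), numSlices = 12)

    // defaultPartitioner finds no parent partitioner and no configured
    // parallelism, so it uses the largest parent's partition count.
    val grouped = pairs.groupByKey()
    println(grouped.partitions.length) // 12, not 8

    sc.stop()
  }
}
</code>
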
@rxin
Contributor

rxin commented Apr 15, 2014

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14136/

@CrazyJvm
Contributor Author

The Jenkins test result is OK, but Travis fails. So... what's going on?

@pwendell
Contributor

Built this locally and it looked good, so I'm merging it. Don't worry about Travis - it's currently experimental.

asfgit closed this in 9edd887 Apr 16, 2014
asfgit pushed a commit that referenced this pull request Apr 16, 2014

Author: Chen Chao <[email protected]>

Closes #389 from CrazyJvm/patch-2 and squashes the following commits:

84a7fe4 [Chen Chao] miss </li> at the end of every single line
04a9796 [Chen Chao] change format
ee0fae0 [Chen Chao] update spark.default.parallelism
(cherry picked from commit 9edd887)

Signed-off-by: Patrick Wendell <[email protected]>
asfgit pushed a commit that referenced this pull request Apr 17, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to #389

detail is as following code :

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }

Author: Chen Chao <[email protected]>

Closes #403 from CrazyJvm/patch-4 and squashes the following commits:

42f6c9e [Chen Chao] fix format
829a995 [Chen Chao] fix format
1568336 [Chen Chao] misleading task number of groupByKey
asfgit pushed a commit that referenced this pull request Apr 17, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to #389

detail is as following code :

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }

Author: Chen Chao <[email protected]>

Closes #403 from CrazyJvm/patch-4 and squashes the following commits:

42f6c9e [Chen Chao] fix format
829a995 [Chen Chao] fix format
1568336 [Chen Chao] misleading task number of groupByKey

(cherry picked from commit 9c40b9e)
Signed-off-by: Reynold Xin <[email protected]>
CrazyJvm added a commit to CrazyJvm/spark that referenced this pull request May 14, 2014
<code>
  private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = {
    new HashPartitioner(numPartitions)
  }
</code>

This shows that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is determined by the config property spark.default.parallelism.

The property "spark.default.parallelism" refers to apache#389.
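
For reference, the usual way around this default in a streaming job is the per-operation partition count; a minimal sketch (the host, port, and the count 8 are illustrative, not from this PR):
<code>
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingParallelismDemo {
  def main(args: Array[String]): Unit = {
    // local[2]: one core for the receiver, one for processing.
    val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-demo")
    val ssc = new StreamingContext(conf, Seconds(1))

    val words = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map((_, 1))

    // Without the second argument, reduceByKey uses defaultPartitioner(),
    // i.e. sc.defaultParallelism; passing 8 overrides it for this operation.
    val counts = words.reduceByKey(_ + _, 8)

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
</code>
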
asfgit pushed a commit that referenced this pull request May 15, 2014

Author: Chen Chao <[email protected]>

Closes #766 from CrazyJvm/patch-7 and squashes the following commits:

0b7efba [Chen Chao] Update streaming-programming-guide.md
cc5b66c [Chen Chao] default task number misleading in several places

(cherry picked from commit 2f63995)
Signed-off-by: Reynold Xin <[email protected]>
asfgit pushed a commit that referenced this pull request May 15, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to apache#389

detail is as following code :

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }

Author: Chen Chao <[email protected]>

Closes apache#403 from CrazyJvm/patch-4 and squashes the following commits:

42f6c9e [Chen Chao] fix format
829a995 [Chen Chao] fix format
1568336 [Chen Chao] misleading task number of groupByKey
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
tangzhankun pushed a commit to tangzhankun/spark that referenced this pull request Aug 9, 2017
mccheah pushed a commit to mccheah/spark that referenced this pull request Nov 28, 2018
[SPARK-24767] Propagate MDC to spark-submit thread in InProcessAppHandle (apache#389)
[SPARK-24813][BUILD][FOLLOW-UP][HOTFIX] HiveExternalCatalogVersionsSuite still flaky; fall back to Apache archive
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Apply AS enabled flavor in FusionCloud job
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
* [SPARK-48173][SQL][3.5] CheckAnalysis should see the entire query plan

backport apache#46439 to 3.5

### What changes were proposed in this pull request?

This is a follow-up of apache#38029. Some custom check rules need to see the entire query plan tree to get some context, but apache#38029 breaks this, as it checks the query plans of dangling CTE relations recursively.

This PR fixes it by putting back the dangling CTE relation in the main query plan and then check the main query plan.

### Why are the changes needed?

Revert the breaking change to custom check rules

### Does this PR introduce _any_ user-facing change?

No for most users. This restores the behavior of Spark 3.3 and earlier for custom check rules.

### How was this patch tested?

existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#46442 from cloud-fan/check2.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

* fix

---------

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
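
For context, custom check rules of the kind this change restores are registered through SparkSessionExtensions. A minimal sketch, assuming Spark 3.x; the class name and the rule's condition are hypothetical, chosen only to show a rule that walks the entire plan:
<code>
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical extension that registers a custom check rule. With this fix,
// the rule again sees the whole query plan, because dangling CTE relations
// are put back into the main plan before checking.
class MyCheckExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectCheckRule { _ =>
      (plan: LogicalPlan) =>
        // Walk the whole tree and reject plans containing a node we forbid.
        plan.foreach { node =>
          if (node.nodeName == "Deduplicate") {
            throw new IllegalStateException("Deduplicate is not allowed here")
          }
        }
    }
  }
}
</code>
Such a class would be enabled with the existing spark.sql.extensions configuration, e.g. spark.sql.extensions=MyCheckExtensions.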