
@CrazyJvm
Contributor

Actually, the value 8 is only valid in Mesos fine-grained mode:
<code>
  override def defaultParallelism() = sc.conf.getInt("spark.default.parallelism", 8)
</code>

In coarse-grained mode, including Mesos coarse-grained, the value of the property depends on the number of cores:
<code>
  override def defaultParallelism(): Int = {
    conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
  }
</code>
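
In other words, the effective default depends on the scheduler backend. A job that needs a predictable task count can pin the property explicitly; a minimal sketch (the master URL, app name, and the value 16 are illustrative choices, not from this PR):
<code>
import org.apache.spark.{SparkConf, SparkContext}

object ParallelismExample {
  def main(args: Array[String]): Unit = {
    // Pin spark.default.parallelism so the default task count does not
    // depend on whether the backend is fine- or coarse-grained.
    val conf = new SparkConf()
      .setMaster("local[4]") // illustrative master
      .setAppName("parallelism-example")
      .set("spark.default.parallelism", "16")

    val sc = new SparkContext(conf)
    // Shuffle operations such as groupByKey and reduceByKey fall back to
    // this value when neither a parent partitioner nor an explicit
    // partition count is given.
    println(s"defaultParallelism = ${sc.defaultParallelism}") // 16
    sc.stop()
  }
}
</code>
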
@AmplabJenkins

Can one of the admins verify this patch?

@mateiz
Contributor

mateiz commented Apr 13, 2014

Jenkins, test this please

@mateiz
Contributor

mateiz commented Apr 13, 2014

Good catch

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14084/

@pwendell
Contributor

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14086/


Actually, to have valid HTML, add </li> at the end of these.

Oh yeah, I missed </li>. Fixed it.
CrazyJvm added a commit to CrazyJvm/spark that referenced this pull request Apr 14, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to apache#389 

detail is as following code :
<code>
  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }
</code>
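
As a concrete illustration of the fallback order in that snippet, a minimal sketch (master URL, sample data, and the partition count 12 are arbitrary): with spark.default.parallelism unset and no parent partitioner, groupByKey keeps the largest parent's partition count instead of any fixed constant.
<code>
import org.apache.spark.{SparkConf, SparkContext}

object DefaultPartitionerDemo {
  def main(args: Array[String]): Unit = {
    // spark.default.parallelism is deliberately left unset here.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("default-partitioner-demo"))

    // A pair RDD with 12 partitions and no partitioner.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)), numSlices = 12)

    // defaultPartitioner finds no parent partitioner and no configured
    // parallelism, so it uses the largest parent's partition count.
    val grouped = pairs.groupByKey()
    println(grouped.partitions.length) // 12, not 8

    sc.stop()
  }
}
</code>
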
@rxin
Contributor

rxin commented Apr 15, 2014

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14136/

@CrazyJvm
Contributor Author

The Jenkins test result is OK, but Travis fails. So... what's going on?

@pwendell
Contributor

Built this locally and it looked good, so I'm merging it. Don't worry about Travis - it's currently experimental.

asfgit closed this in 9edd887 Apr 16, 2014
asfgit pushed a commit that referenced this pull request Apr 16, 2014

Author: Chen Chao <[email protected]>

Closes #389 from CrazyJvm/patch-2 and squashes the following commits:

84a7fe4 [Chen Chao] miss </li> at the end of every single line
04a9796 [Chen Chao] change format
ee0fae0 [Chen Chao] update spark.default.parallelism
(cherry picked from commit 9edd887)

Signed-off-by: Patrick Wendell <[email protected]>
asfgit pushed a commit that referenced this pull request Apr 17, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to #389

detail is as following code :

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }

Author: Chen Chao <[email protected]>

Closes #403 from CrazyJvm/patch-4 and squashes the following commits:

42f6c9e [Chen Chao] fix format
829a995 [Chen Chao] fix format
1568336 [Chen Chao] misleading task number of groupByKey
asfgit pushed a commit that referenced this pull request Apr 17, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to #389

detail is as following code :

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }

Author: Chen Chao <[email protected]>

Closes #403 from CrazyJvm/patch-4 and squashes the following commits:

42f6c9e [Chen Chao] fix format
829a995 [Chen Chao] fix format
1568336 [Chen Chao] misleading task number of groupByKey

(cherry picked from commit 9c40b9e)
Signed-off-by: Reynold Xin <[email protected]>
CrazyJvm added a commit to CrazyJvm/spark that referenced this pull request May 14, 2014
<code>
  private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = {
    new HashPartitioner(numPartitions)
  }
</code>

This shows that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is determined by the config property spark.default.parallelism.

The property "spark.default.parallelism" refers to apache#389.
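
For reference, the usual way around this default in a streaming job is the per-operation partition count; a minimal sketch (the host, port, and the count 8 are illustrative, not from this PR):
<code>
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingParallelismDemo {
  def main(args: Array[String]): Unit = {
    // local[2]: one core for the receiver, one for processing.
    val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-demo")
    val ssc = new StreamingContext(conf, Seconds(1))

    val words = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map((_, 1))

    // Without the second argument, reduceByKey uses defaultPartitioner(),
    // i.e. sc.defaultParallelism; passing 8 overrides it for this operation.
    val counts = words.reduceByKey(_ + _, 8)

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
</code>
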
asfgit pushed a commit that referenced this pull request May 15, 2014

Author: Chen Chao <[email protected]>

Closes #766 from CrazyJvm/patch-7 and squashes the following commits:

0b7efba [Chen Chao] Update streaming-programming-guide.md
cc5b66c [Chen Chao] default task number misleading in several places

(cherry picked from commit 2f63995)
Signed-off-by: Reynold Xin <[email protected]>
asfgit pushed a commit that referenced this pull request May 15, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to apache#389

detail is as following code :

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }

Author: Chen Chao <[email protected]>

Closes apache#403 from CrazyJvm/patch-4 and squashes the following commits:

42f6c9e [Chen Chao] fix format
829a995 [Chen Chao] fix format
1568336 [Chen Chao] misleading task number of groupByKey
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
tangzhankun pushed a commit to tangzhankun/spark that referenced this pull request Aug 9, 2017
mccheah pushed a commit to mccheah/spark that referenced this pull request Nov 28, 2018
[SPARK-24767] Propagate MDC to spark-submit thread in InProcessAppHandle (apache#389)
[SPARK-24813][BUILD][FOLLOW-UP][HOTFIX] HiveExternalCatalogVersionsSuite still flaky; fall back to Apache archive
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Apply AS enabled flavor in FusionCloud job
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
* [SPARK-48173][SQL][3.5] CheckAnalysis should see the entire query plan

backport apache#46439 to 3.5

### What changes were proposed in this pull request?

This is a follow-up of apache#38029. Some custom check rules need to see the entire query plan tree to get some context, but apache#38029 breaks this, as it checks the query plans of dangling CTE relations recursively.

This PR fixes it by putting back the dangling CTE relation in the main query plan and then check the main query plan.

### Why are the changes needed?

Revert the breaking change to custom check rules

### Does this PR introduce _any_ user-facing change?

No for most users. This restores the behavior of Spark 3.3 and earlier for custom check rules.

### How was this patch tested?

existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#46442 from cloud-fan/check2.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

* fix

---------

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
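
For context, custom check rules of the kind this change restores are registered through SparkSessionExtensions. A minimal sketch, assuming Spark 3.x; the class name and the rule's condition are hypothetical, chosen only to show a rule that walks the entire plan:
<code>
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical extension that registers a custom check rule. With this fix,
// the rule again sees the whole query plan, because dangling CTE relations
// are put back into the main plan before checking.
class MyCheckExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectCheckRule { _ =>
      (plan: LogicalPlan) =>
        // Walk the whole tree and reject plans containing a node we forbid.
        plan.foreach { node =>
          if (node.nodeName == "Deduplicate") {
            throw new IllegalStateException("Deduplicate is not allowed here")
          }
        }
    }
  }
}
</code>
Such a class would be enabled with the existing spark.sql.extensions configuration, e.g. spark.sql.extensions=MyCheckExtensions.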