Conversation

@marmbrus
Contributor

No description provided.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14354/

@rxin
Contributor

rxin commented Apr 23, 2014

Ok merged. Thanks!

asfgit pushed a commit that referenced this pull request Apr 23, 2014
Author: Michael Armbrust <[email protected]>

Closes #496 from marmbrus/javaBeanBug and squashes the following commits:

644fedd [Michael Armbrust] Bean methods must be public.

(cherry picked from commit 39f85e0)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 39f85e0 Apr 23, 2014
@marmbrus marmbrus deleted the javaBeanBug branch April 23, 2014 22:00
pwendell added a commit to pwendell/spark that referenced this pull request May 12, 2014
Fix bug in worker clean-up in UI

Introduced in d5a96fe (/cc @aarondav).

This should be picked into 0.8 and 0.9 as well. The bug causes old (zombie) workers on a node to not disappear immediately from the UI when a new one registers.
jhartlaub pushed a commit to jhartlaub/spark that referenced this pull request May 27, 2014
Fix bug in worker clean-up in UI

Introduced in d5a96fe (/cc @aarondav).

This should be picked into 0.8 and 0.9 as well. The bug causes old (zombie) workers on a node to not disappear immediately from the UI when a new one registers.
(cherry picked from commit a1cd185)

Signed-off-by: Patrick Wendell <[email protected]>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
Author: Michael Armbrust <[email protected]>

Closes apache#496 from marmbrus/javaBeanBug and squashes the following commits:

644fedd [Michael Armbrust] Bean methods must be public.
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Jan 8, 2015
Fix bug in worker clean-up in UI

Introduced in d5a96fe (/cc @aarondav).

This should be picked into 0.8 and 0.9 as well. The bug causes old (zombie) workers on a node to not disappear immediately from the UI when a new one registers.
(cherry picked from commit a1cd185)

Signed-off-by: Patrick Wendell <[email protected]>
yifeih added a commit to yifeih/spark that referenced this pull request Feb 25, 2019
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Bazel is now packed into the base images for the CNCF project tests, and the tests are running successfully, so we can remove the role that installs it to keep the repo clean and easier to maintain. If the current way of running the tests is not good enough, we can consider other options, such as building Bazel in disk-image-builder or trying to get an official Bazel source in an Ubuntu deb, to make the whole process simpler.
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Aug 15, 2022
…pache#496)

* [SPARK-34980][SQL] Support coalesce partition through union in AQE

### What changes were proposed in this pull request?

- Split plan into several groups, and every child of union is a new group
- Coalesce partitions for every group

### Why are the changes needed?

#### First Issue
The rule `CoalesceShufflePartitions` can only coalesce partitions if
* every leaf node is a ShuffleQueryStage
* all shuffles have the same number of partitions

A `Union` can break these assumptions. Suppose we have the following plan:
```
Union
   HashAggregate
      ShuffleQueryStage
   FileScan
```
`CoalesceShufflePartitions` cannot optimize it, and the resulting number of partitions would be `shuffle partition + FileScan partition`, which can be quite large.

It's better to support partial optimization with `Union`.
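
The grouping idea can be sketched with a toy plan tree. This is only an illustration under stated assumptions: the `Plan` classes, `groups`, `leaves`, and `canCoalesce` below are hypothetical stand-ins, not Spark's actual classes or the code in this patch. It shows that the whole plan fails the two conditions above, while one of the `Union` children passes them on its own.

```scala
// Hypothetical toy model; none of these types are Spark's real classes.
object UnionGroupSketch {
  sealed trait Plan { def children: Seq[Plan] }
  case class Union(children: Seq[Plan]) extends Plan
  case class HashAggregate(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  case class ShuffleQueryStage(numPartitions: Int) extends Plan {
    def children: Seq[Plan] = Nil
  }
  case class FileScan(numPartitions: Int) extends Plan {
    def children: Seq[Plan] = Nil
  }

  // Split at top-level Union nodes: every Union child becomes its own group.
  // (A Union nested below a non-Union node is not split in this sketch.)
  def groups(plan: Plan): Seq[Plan] = plan match {
    case Union(children) => children.flatMap(groups)
    case other           => Seq(other)
  }

  // Collect the leaf nodes of a plan.
  def leaves(plan: Plan): Seq[Plan] =
    if (plan.children.isEmpty) Seq(plan) else plan.children.flatMap(leaves)

  // The two conditions from the text: all leaves are shuffle stages,
  // and all of them have the same number of partitions.
  def canCoalesce(group: Plan): Boolean = {
    val ls = leaves(group)
    ls.forall(_.isInstanceOf[ShuffleQueryStage]) &&
      ls.collect { case s: ShuffleQueryStage => s.numPartitions }.distinct.size <= 1
  }
}
```

With the plan `Union(Seq(HashAggregate(ShuffleQueryStage(200)), FileScan(10)))` (partition counts chosen arbitrarily), `canCoalesce` is false for the whole plan but true for the first group returned by `groups`, which is the partial optimization the text asks for.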

#### Second Issue
The coalesce partition formula uses the **sum value** as the total input size, which is not friendly to unions; see
```
// ShufflePartitionsUtil.coalescePartitions
val totalPostShuffleInputSize = mapOutputStatistics.flatMap(_.map(_.bytesByPartitionId.sum)).sum
```

So for such case:
```
Union
   HashAggregate
      ShuffleQueryStage
   HashAggregate
      ShuffleQueryStage
```
The `CoalesceShufflePartitions` rule will return an unexpected partition number.
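
To make the effect concrete, here is a minimal sketch, assuming a simplified greedy packing rule rather than Spark's actual `ShufflePartitionsUtil` logic. The names `coalescedCount`, `singleGroup`, and `perGroup` and all sizes are hypothetical. It compares summing per-partition sizes across both `Union` children against coalescing each child separately.

```scala
// Simplified, hypothetical model of partition coalescing; not Spark's implementation.
object CoalesceSketch {
  // Greedily pack contiguous partitions; start a new output partition
  // when adding the next input partition would exceed targetSize.
  def coalescedCount(sizes: Seq[Long], targetSize: Long): Int = {
    var count = 0
    var current = 0L
    var open = false // true once the current output partition holds any input
    for (s <- sizes) {
      if (open && current + s > targetSize) {
        count += 1
        current = 0L
      }
      current += s
      open = true
    }
    if (open) count += 1
    count
  }

  // Treat all shuffles as one group: sizes at the same partition index are
  // summed, as if one reducer read that index from every shuffle.
  // Assumes all shuffles have the same number of partitions.
  def singleGroup(shuffles: Seq[Seq[Long]], targetSize: Long): Int =
    coalescedCount(shuffles.transpose.map(_.sum), targetSize)

  // Treat each shuffle (each Union child) as its own group.
  def perGroup(shuffles: Seq[Seq[Long]], targetSize: Long): Seq[Int] =
    shuffles.map(coalescedCount(_, targetSize))
}
```

For two children with four partitions of 10 bytes each and a target size of 20, `singleGroup` yields 4 partitions (applied to each child, so nothing is coalesced), while `perGroup` yields 2 partitions per child. Under this toy model, summing across independent `Union` children makes every child look larger than it is and leaves it under-coalesced.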

### Does this PR introduce _any_ user-facing change?

Probably yes, the resulting number of partitions might change.

### How was this patch tested?

Add test.

Closes apache#32084 from ulysses-you/SPARK-34980.

Lead-authored-by: ulysses-you <[email protected]>
Co-authored-by: ulysses <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

(cherry picked from commit 0e23bd7)

* Remove unused import

Co-authored-by: ulysses-you <[email protected]>
3 participants