@techaddict
Contributor

No description provided.

@AmplabJenkins

Can one of the admins verify this patch?

@mateiz
Contributor

mateiz commented Apr 23, 2014

Jenkins, test this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14406/

@techaddict
Contributor Author

@mateiz retest this please. I think the failure is due to some other PR, as these changes have nothing to do with Streaming.

@rxin
Contributor

rxin commented Apr 24, 2014

Jenkins, retest this please.

Per the Spark style guide, can you change this to include curly braces, e.g.

if (allLeaf) {
  break = true
} else {
  level += 1
}

Thanks.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14424/

@techaddict
Contributor Author

@rxin all suggested changes done, tests passed

@rxin
Contributor

rxin commented Apr 24, 2014

Thanks. I've merged this.

@asfgit asfgit closed this in bb68f47 Apr 24, 2014
asfgit pushed a commit that referenced this pull request Apr 24, 2014
Author: Sandeep <[email protected]>

Closes #503 from techaddict/fix-79 and squashes the following commits:

e3f6746 [Sandeep] Style changes
07a4f6b [Sandeep] for loop to While loop
0a6d8e9 [Sandeep] Breakable for loop to While loop

(cherry picked from commit bb68f47)
Signed-off-by: Reynold Xin <[email protected]>
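
For readers without the diff at hand, the rewrite named in the squashed commits ("Breakable for loop to While loop") can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `LoopRewriteSketch`, `allLeafAt`, `leafLevel`, and the `done` flag are hypothetical names chosen for the example. It assumes the original loop used `scala.util.control.Breaks`, which implements early exit by throwing a control-flow exception.

```scala
import scala.util.control.Breaks.{break, breakable}

object LoopRewriteSketch {
  // Hypothetical stand-in for per-level work: reports whether
  // every node at `level` is a leaf.
  def allLeafAt(level: Int, leafLevel: Int): Boolean = level >= leafLevel

  // Before: early exit through Breaks, which throws and catches
  // a control-flow exception under the hood.
  def levelsWithBreakable(maxDepth: Int, leafLevel: Int): Int = {
    var visited = 0
    breakable {
      for (level <- 0 until maxDepth) {
        visited += 1
        if (allLeafAt(level, leafLevel)) break()
      }
    }
    visited
  }

  // After: a plain while loop with an explicit exit flag, using
  // curly braces on both branches as the review comment asks.
  def levelsWithWhile(maxDepth: Int, leafLevel: Int): Int = {
    var visited = 0
    var level = 0
    var done = false
    while (level < maxDepth && !done) {
      visited += 1
      if (allLeafAt(level, leafLevel)) {
        done = true
      } else {
        level += 1
      }
    }
    visited
  }
}
```

Both versions visit the same number of levels; the while form simply makes the exit condition part of the loop header instead of relying on an exception.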
pwendell added a commit to pwendell/spark that referenced this pull request May 12, 2014
Fix bug on read-side of external sort when using Snappy.

This case wasn't handled correctly and this patch fixes it.
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
Author: Sandeep <[email protected]>

Closes apache#503 from techaddict/fix-79 and squashes the following commits:

e3f6746 [Sandeep] Style changes
07a4f6b [Sandeep] for loop to While loop
0a6d8e9 [Sandeep] Breakable for loop to While loop
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Jan 8, 2015
Fix bug on read-side of external sort when using Snappy.

This case wasn't handled correctly and this patch fixes it.
(cherry picked from commit 3d6e754)

Signed-off-by: Patrick Wendell <[email protected]>
mccheah referenced this pull request in palantir/spark Sep 26, 2017
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 7, 2017
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
* Add support for config-gcc role

You can use the config below to enable gcc-7:
```
  roles:
    - role: config-gcc
      gcc_version: 7
```

Close-issue: theopenlab/openlab#239

* Add periodic job for envoy and containerd

This patch adds periodic jobs for the envoy and containerd arm
builds.

- containerd-build-arm64
- envoy-build-arm64
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
### What changes were proposed in this pull request?

This PR makes `SortMergeJoin` run in batches.

### Why are the changes needed?

1. Improve `SortMergeJoin` performance.
    For example (please download the two files from https://issues.apache.org/jira/browse/SPARK-49627):
    ```scala
    import org.apache.spark.benchmark.Benchmark
    
    sql(
      """
        |CREATE TEMPORARY VIEW t1
        |USING org.apache.spark.sql.parquet
        |OPTIONS (
        |  path "file:///Users/yumwang/Downloads/t1.gz.parquet"
        |)
        |""".stripMargin)
    sql(
      """
        |CREATE TEMPORARY VIEW t2
        |USING org.apache.spark.sql.parquet
        |OPTIONS (
        |  path "file:///Users/yumwang/Downloads/t2.snappy.parquet"
        |)
        |""".stripMargin)

    val benchmark = new Benchmark("Benchmark SortMergeJoin in batch", 121529592L, minNumIters = 1)

    benchmark.addCase("Run benchmark") { _ =>
      sql("select /*+ MERGE(t2) */ t2.id from t1 left join t2 on t1.id = t2.id").write.format("noop").mode("Overwrite").save()
    }
    benchmark.run()
    ```
    The benchmark result before this PR:
    ```
    OpenJDK 64-Bit Server VM 17.0.7+7-LTS on Mac OS X 14.5
    Apple M2 Max
    Benchmark SortMergeJoin in batch:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------------------------------
    Run benchmark                                     32733          32733           0          3.7         269.3       1.0X
    ```
    The benchmark result after this PR:
    ```
    OpenJDK 64-Bit Server VM 17.0.7+7-LTS on Mac OS X 14.5
    Apple M2 Max
    Benchmark SortMergeJoin in batch:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------------------------------
    Run benchmark                                  23191          23191           0          5.2         190.8       1.4X
    ```
2. Avoid Executor OOM. The list of [`currentRows`](https://github.com/apache/spark/blob/e3133f4abf1cd5667abe5f0d05fa0af0df3033ae/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L34) will not become very large. Before this PR, it could grow beyond 27295000 entries; after this PR, it stays around 65535.
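
As a quick sanity check on the two benchmark tables, the derived columns can be recomputed from the best times and the row count passed to `Benchmark` (121529592). A minimal sketch; the object and method names here are illustrative and not part of Spark:

```scala
object BenchmarkArithmetic {
  // Row count passed to Benchmark in the snippet from the PR description.
  val rows: Long = 121529592L

  // Nanoseconds spent per row for a run that took `bestMs` milliseconds.
  def perRowNs(bestMs: Long): Double = bestMs * 1e6 / rows

  // Throughput in millions of rows per second.
  def rateMPerSec(bestMs: Long): Double = (rows / 1e6) / (bestMs / 1e3)

  // Speedup of the `afterMs` run relative to the `beforeMs` run.
  def relative(beforeMs: Long, afterMs: Long): Double =
    beforeMs.toDouble / afterMs

  def main(args: Array[String]): Unit = {
    val before = 32733L // best time (ms) reported before the PR
    val after = 23191L  // best time (ms) reported after the PR
    println(f"before: ${perRowNs(before)}%.1f ns/row, ${rateMPerSec(before)}%.1f M rows/s")
    println(f"after:  ${perRowNs(after)}%.1f ns/row, ${rateMPerSec(after)}%.1f M rows/s")
    println(f"speedup: ${relative(before, after)}%.1fX")
  }
}
```

Up to rounding, these values match the `Rate(M/s)` and `Per Row(ns)` columns above, and the ratio of the two best times gives the reported 1.4X improvement.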

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test and benchmark test.


### Was this patch authored or co-authored using generative AI tooling?

No.