[Fix #79] Replace Breakable For Loops By While Loops #503
Can one of the admins verify this patch?
Jenkins, test this please
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14406/
@mateiz retest this please, I think failure is due to some other PR, as these changes have nothing to do with Streaming.
Jenkins, retest this please.
Per the Spark style guide, can you change this to include curly braces, e.g.
    if (allLeaf) {
      break = true
    } else {
      level += 1
    }
Thanks.
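As a rough illustration of the kind of rewrite this pull request's title describes, the sketch below shows a breakable `for` loop next to a `while` loop with the same result. The object and function names (`LoopStyles`, `firstNegativeBreakable`, `firstNegativeWhile`) are hypothetical and are not taken from the Spark code base; only `scala.util.control.Breaks` is a real library API.

```scala
import scala.util.control.Breaks.{break, breakable}

// Hypothetical example, not code from this patch.
object LoopStyles {
  // Breakable for loop: `break()` exits by throwing a control-flow exception
  // that the enclosing `breakable` block catches.
  def firstNegativeBreakable(xs: Array[Int]): Int = {
    var found = -1
    breakable {
      for (i <- xs.indices) {
        if (xs(i) < 0) {
          found = i
          break()
        }
      }
    }
    found
  }

  // While-loop equivalent: the loop condition replaces the exception,
  // and the if/else uses curly braces as the review comment asks.
  def firstNegativeWhile(xs: Array[Int]): Int = {
    var found = -1
    var i = 0
    while (i < xs.length && found < 0) {
      if (xs(i) < 0) {
        found = i
      } else {
        i += 1
      }
    }
    found
  }
}
```

Both functions return the index of the first negative element, or `-1` when there is none; the `while` version avoids the exception-based control flow of `breakable`.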
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
@rxin all suggested changes done, tests passed
Thanks. I've merged this.
Author: Sandeep <[email protected]>
Closes #503 from techaddict/fix-79 and squashes the following commits:
e3f6746 [Sandeep] Style changes
07a4f6b [Sandeep] for loop to While loop
0a6d8e9 [Sandeep] Breakable for loop to While loop
(cherry picked from commit bb68f47)
Signed-off-by: Reynold Xin <[email protected]>
Fix bug on read-side of external sort when using Snappy. This case wasn't handled correctly and this patch fixes it.
Author: Sandeep <[email protected]>
Closes apache#503 from techaddict/fix-79 and squashes the following commits:
e3f6746 [Sandeep] Style changes
07a4f6b [Sandeep] for loop to While loop
0a6d8e9 [Sandeep] Breakable for loop to While loop
Fix bug on read-side of external sort when using Snappy. This case wasn't handled correctly and this patch fixes it. (cherry picked from commit 3d6e754) Signed-off-by: Patrick Wendell <[email protected]>
* Add support for config-gcc role
You can use the config below to enable gcc-7:
```
roles:
  - role: config-gcc
    gcc_version: 7
```
Close-issue: theopenlab/openlab#239
* Add periodic job for envoy and containerd
This patch adds periodic jobs for the envoy and containerd ARM
builds:
- containerd-build-arm64
- envoy-build-arm64
### What changes were proposed in this pull request?
This PR makes `SortMergeJoin` run in batches.
### Why are the changes needed?
1. Improve `SortMergeJoin` performance.
For example (please download the two files from https://issues.apache.org/jira/browse/SPARK-49627):
```scala
import org.apache.spark.benchmark.Benchmark
sql(
"""
|CREATE TEMPORARY VIEW t1
|USING org.apache.spark.sql.parquet
|OPTIONS (
| path "file:///Users/yumwang/Downloads/t1.gz.parquet"
|)
|""".stripMargin)
sql(
"""
|CREATE TEMPORARY VIEW t2
|USING org.apache.spark.sql.parquet
|OPTIONS (
| path "file:///Users/yumwang/Downloads/t2.snappy.parquet"
|)
|""".stripMargin)
val benchmark = new Benchmark("Benchmark SortMergeJoin in batch", 121529592L, minNumIters = 1)
benchmark.addCase("Run benchmark") { _ =>
sql("select /*+ MERGE(t2) */ t2.id from t1 left join t2 on t1.id = t2.id").write.format("noop").mode("Overwrite").save()
}
benchmark.run()
```
The benchmark result before this PR:
```
OpenJDK 64-Bit Server VM 17.0.7+7-LTS on Mac OS X 14.5
Apple M2 Max
Benchmark SortMergeJoin in batch: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Run benchmark 32733 32733 0 3.7 269.3 1.0X
```
The benchmark result after this PR:
```
OpenJDK 64-Bit Server VM 17.0.7+7-LTS on Mac OS X 14.5
Apple M2 Max
Benchmark SortMergeJoin in batch: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Run benchmark 23191 23191 0 5.2 190.8 1.4X
```
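The derived columns in the two result tables can be cross-checked from the row count passed to `Benchmark` (121529592) and the reported best times. The sketch below is only an arithmetic check; the object and method names are hypothetical and are not part of Spark's `Benchmark` class.

```scala
// Hypothetical helper for re-deriving the reported columns.
object BenchmarkCheck {
  // Row count passed to `new Benchmark(...)` in the example above.
  val rows: Long = 121529592L

  // Throughput in millions of rows per second for a best time in milliseconds.
  def rateMillionsPerSec(bestMs: Long): Double = rows / (bestMs / 1000.0) / 1e6

  // Average time per row in nanoseconds.
  def perRowNs(bestMs: Long): Double = bestMs * 1e6 / rows

  // Speedup of the "after" run relative to the "before" run.
  def relative(beforeMs: Long, afterMs: Long): Double = beforeMs.toDouble / afterMs
}
```

With the reported best times of 32733 ms and 23191 ms, these formulas agree with the table values after rounding: about 269.3 ns and 190.8 ns per row, 3.7 and 5.2 M rows/s, and a relative speedup of about 1.4X.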
2. Avoid Executor OOM. The list of [`currentRows`](https://github.com/apache/spark/blob/e3133f4abf1cd5667abe5f0d05fa0af0df3033ae/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L34) will no longer grow very large. Before this PR, it could hold more than 27295000 rows; after this PR, it holds around 65535.
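
The memory effect described in point 2 can be illustrated with a generic bounded-buffer sketch. This is not the PR's implementation: `BoundedBufferSketch`, `peakUnbatched`, `peakBatched`, and the `consume` callback are hypothetical names, and the batch size is a free parameter here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration of bounded buffering, not Spark code.
object BoundedBufferSketch {
  // Buffers every row before the consumer sees any of them,
  // so the peak buffer size equals the total number of rows.
  def peakUnbatched(rows: Iterator[Long]): Int = {
    val buffer = ArrayBuffer.empty[Long]
    rows.foreach(buffer += _)
    buffer.size
  }

  // Hands the buffer to the consumer as soon as it reaches `batchSize`,
  // so the peak buffer size never exceeds `batchSize`.
  def peakBatched(rows: Iterator[Long], batchSize: Int)(consume: List[Long] => Unit): Int = {
    val buffer = ArrayBuffer.empty[Long]
    var peak = 0
    while (rows.hasNext) {
      buffer += rows.next()
      peak = math.max(peak, buffer.size)
      if (buffer.size >= batchSize) {
        consume(buffer.toList) // copy out before clearing
        buffer.clear()
      }
    }
    if (buffer.nonEmpty) {
      consume(buffer.toList)
    }
    peak
  }
}
```

For the same input, `peakUnbatched` grows with the number of rows while `peakBatched` stays capped at the chosen batch size, which is the general reason a batched producer is less likely to exhaust executor memory.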
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test and benchmark test.
### Was this patch authored or co-authored using generative AI tooling?
No.