fix the head notation of package object dsl #632
Conversation
Can one of the admins verify this patch?
Don't remove this line.
OK, line added.
I guess this got pretty stale. Thanks for updating it! Jenkins, test this please.
Use GOPROXY for faster downloading of Go modules and fix the conformance job.
### What changes were proposed in this pull request?
Previously, `Dataset.isEmpty` judged whether a Dataset is empty with:
```
def isEmpty: Boolean = withAction("isEmpty", limit(1).groupBy().count().queryExecution) { plan =>
  plan.executeCollect().head.getLong(0) == 0
}
```
This adds two shuffles, one from `limit()` and one from `groupBy().count()`, and then collects the result to the driver. It avoids an OOM on the driver side because only the count is collected, but it forces all partitions to be computed and adds extra shuffle stages.
This PR changes it to:
```
def isEmpty: Boolean = withAction("isEmpty", select().queryExecution) { plan =>
  plan.executeTake(1).isEmpty
}
```
After this PR, column pruning is applied to the original `LogicalPlan` (via the empty `select()`) and the `executeTake()` API is used, so no extra shuffles are added and at most one partition is computed in the last stage. This reduces the cost of calling `Dataset.isEmpty()` without introducing memory pressure on the driver side.
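For illustration only (not part of the PR), the user-facing behavior of `isEmpty` is unchanged; only the execution cost differs. A minimal sketch, assuming an existing `SparkSession` named `spark`:
```
// Sketch: user-facing behavior is unchanged; `spark` is an assumed SparkSession.
val nonEmpty = spark.range(0, 1000000)
assert(!nonEmpty.isEmpty) // after this PR: fetches at most one row via executeTake(1)

val empty = spark.emptyDataFrame
assert(empty.isEmpty) // after this PR: no shuffle and no full count
```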
### Why are the changes needed?
Optimize `Dataset.isEmpty()` to avoid extra shuffles and computing all partitions.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Existing unit tests.
Closes apache#26500 from AngersZhuuuu/SPARK-29874.
Authored-by: angerszhu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
KE-42110 Upgrade snappy-java to 1.1.10.1 (apache#632)
[HADP-53083][SPARK-47383][CORE] Support `spark.shutdown.timeout` config (apache#311)
### What changes were proposed in this pull request?
Make the shutdown hook timeout configurable. If it is not defined, the behavior falls back to the existing default of 30 seconds, or whatever is defined in core-site.xml for the hadoop.service.shutdown.timeout property.
### Why are the changes needed?
Spark sometimes times out during the shutdown process. This can result in data left in the queues being dropped, which causes metadata loss (e.g. event logs, anything written by custom listeners). Before this change it is not easily configurable: the underlying `org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30 seconds, and while it can be configured by setting hadoop.service.shutdown.timeout, this must be done in core-site.xml/core-default.xml because a new Hadoop conf object is created and there is no opportunity to modify it.
### Does this PR introduce any user-facing change?
Yes, a new config `spark.shutdown.timeout` is added.
### How was this patch tested?
Manual testing in spark-shell. This behavior is not practical to write a unit test for.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes apache#45504 from robreeves/sc_shutdown_timeout.
Authored-by: Rob Reeves <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Yujie Li <[email protected]>
Co-authored-by: Rob Reeves <[email protected]>
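A minimal sketch of setting the new config, assuming the standard `SparkSession` builder API (the key `spark.shutdown.timeout` comes from the commit above; the `60s` value and app name are illustrative):
```
import org.apache.spark.sql.SparkSession

// Raise the shutdown hook timeout from the 30-second default to 60 seconds.
val spark = SparkSession.builder()
  .appName("shutdown-timeout-demo") // illustrative name
  .config("spark.shutdown.timeout", "60s")
  .getOrCreate()
```
The same key can also be passed on the command line, e.g. `--conf spark.shutdown.timeout=60s` with spark-submit or spark-shell.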
There are some obvious bugs in the head notation of the package object `dsl`; this PR fixes them.