
Conversation

@scwf
Contributor

@scwf scwf commented May 4, 2014

There are some obvious bugs in the head notation; this patch fixes them.

@AmplabJenkins

Can one of the admins verify this patch?

Contributor


Don't remove this line.

Contributor Author


OK, line added.

@marmbrus
Contributor

marmbrus commented May 7, 2014

I guess this got pretty stale. Thanks for updating it!

Jenkins, test this please.

@scwf scwf closed this May 14, 2014
@scwf scwf deleted the dslfix branch August 22, 2014 15:25
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Use GOPROXY for faster downloading of Go modules and fix the conformance job.
johnhany97 pushed a commit to johnhany97/spark that referenced this pull request Jan 15, 2020

### What changes were proposed in this pull request?
The original way to check whether a Dataset is empty is:
```
def isEmpty: Boolean = withAction("isEmpty", limit(1).groupBy().count().queryExecution) { plan =>
  plan.executeCollect().head.getLong(0) == 0
}
```
This adds two shuffles via `limit()` and `groupBy().count()`, and then collects the aggregated count to the driver. Collecting only the count avoids an OOM on the driver, but it still triggers computation of all partitions and adds extra shuffle stages.

This PR changes it to:
```
def isEmpty: Boolean = withAction("isEmpty", select().queryExecution) { plan =>
  plan.executeTake(1).isEmpty
}
```
After this PR, column pruning is applied to the original LogicalPlan and the `executeTake()` API is used, so no extra shuffles are added and only one partition's data needs to be computed in the last stage. This reduces the cost of calling `Dataset.isEmpty()` without introducing memory pressure on the driver side.
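
For illustration only, a minimal usage sketch of how this looks from the caller's side (the `SparkSession` named `spark` and the sample data are assumptions for the example, not part of this patch):

```
// Hypothetical usage sketch: after this change, isEmpty fetches at most one
// row via executeTake(1) instead of running a groupBy().count() aggregation.
import spark.implicits._

val ds = Seq(1, 2, 3).toDS()

ds.isEmpty                 // false: taking a single row is enough to decide
ds.filter(_ > 10).isEmpty  // true: no extra shuffle stages are added
```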

### Why are the changes needed?
Optimize `Dataset.isEmpty()`.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing unit tests.

Closes apache#26500 from AngersZhuuuu/SPARK-29874.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Aug 18, 2023
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Dec 8, 2023
KE-42110 Upgrade snappy-java to 1.1.10.1 (apache#632)
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
[HADP-53083][SPARK-47383][CORE] Support `spark.shutdown.timeout` config (apache#311) (apache#632)

Make the shutdown hook timeout configurable. If it is not defined, Spark
falls back to the existing behavior: a default timeout of 30 seconds, or
whatever is defined in core-site.xml for the
`hadoop.service.shutdown.timeout` property.

Spark sometimes times out during the shutdown process. This can result
in data left in the queues being dropped and causes metadata loss (e.g.
event logs, anything written by custom listeners).

This was not easily configurable before this change. The underlying
`org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30
seconds. It can be configured by setting
`hadoop.service.shutdown.timeout`, but this must be done in
core-site.xml/core-default.xml, because a new Hadoop conf object is
created and there is no opportunity to modify it.

Yes, a new config `spark.shutdown.timeout` is added.
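
For illustration only, a hedged sketch of how an application might set the new key (the config name comes from this patch; the app name and the 60-second value are made up for the example):

```
// Illustrative sketch: any Spark config can be set this way; here we set the
// new spark.shutdown.timeout key. If it is unset, the 30s default (or
// hadoop.service.shutdown.timeout from core-site.xml) still applies.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shutdown-timeout-demo")         // hypothetical app name
  .config("spark.shutdown.timeout", "60s")  // time-string value chosen for the example
  .getOrCreate()
```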

Manual testing in spark-shell. This behavior is not practical to write a
unit test for.

No

Closes apache#45504 from robreeves/sc_shutdown_timeout.

Authored-by: Rob Reeves <[email protected]>

Signed-off-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Yujie Li <[email protected]>
Co-authored-by: Rob Reeves <[email protected]>