Skip to content

Conversation

@ahirreddy
Copy link
Contributor

Changed the Pyrolite dependency to a build which targets Java 6.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@rxin
Copy link
Contributor

rxin commented Apr 22, 2014

lgtm

@ahirreddy
Copy link
Contributor Author

This might fail the build initially, because I uploaded pyrolite to sonatype just a little while ago. It might be a while before it propagates.

@rxin
Copy link
Contributor

rxin commented Apr 22, 2014

do we actually publish our own pyrolite?

@ahirreddy ahirreddy changed the title Updated Pyrolite Dependency to be Java 6 compatible [SPARK-1560]: Updated Pyrolite Dependency to be Java 6 compatible Apr 22, 2014
@ahirreddy
Copy link
Contributor Author

Yeah, we publish our own. The Pyrolite project itself doesn't maintain anything in maven

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14323/

@ahirreddy
Copy link
Contributor Author

@pwendell I'm not sure if this invoked the SparkSQL tests

@pwendell
Copy link
Contributor

That's fine, I'm just gonna merge this. The heuristic didn't detect changes in SQL so it didn't run the tests, but we'll catch any errors in the nightly tests.

asfgit pushed a commit that referenced this pull request Apr 22, 2014
Changed the Pyrolite dependency to a build which targets Java 6.

Author: Ahir Reddy <[email protected]>

Closes #479 from ahirreddy/java6-pyrolite and squashes the following commits:

8ea25d3 [Ahir Reddy] Updated maven build to use java 6 compatible pyrolite
dabc703 [Ahir Reddy] Updated Pyrolite dependency to be Java 6 compatible
(cherry picked from commit 0f87e6a)

Signed-off-by: Patrick Wendell <[email protected]>
@asfgit asfgit closed this in 0f87e6a Apr 22, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
Changed the Pyrolite dependency to a build which targets Java 6.

Author: Ahir Reddy <[email protected]>

Closes apache#479 from ahirreddy/java6-pyrolite and squashes the following commits:

8ea25d3 [Ahir Reddy] Updated maven build to use java 6 compatible pyrolite
dabc703 [Ahir Reddy] Updated Pyrolite dependency to be Java 6 compatible
mccheah referenced this pull request in palantir/spark Sep 26, 2017
…river/executors (#479)

* Added configuration properties to inject arbitrary secrets into the driver/executors

* Addressed comments
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 7, 2017
…river/executors (apache#479)

* Added configuration properties to inject arbitrary secrets into the driver/executors

* Addressed comments
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Move outside periodic jobs back to OpenLab
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Aug 15, 2022
…#479)

* Revert "KE-37052 translate boolean column to V2Predicate (apache#477)"

This reverts commit 7796f19.

* KE-37052 translate boolean column to V2Predicate (apache#476)

* KE-37052 translate boolean column to V2Predicate

* update spark version
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…fespan of `TaskInfo.accumulables()` (apache#479)

### What changes were proposed in this pull request?
`AccumulableInfo` is one of the top heap consumers in driver's heap dumps for stages with many tasks. For a stage with a large number of tasks (**_O(100k)_**), we saw **30%** of the heap usage stemming from `TaskInfo.accumulables()`.

![image](https://github.com/apache/spark/assets/10495099/13ef5d07-abfc-47fd-81b6-705f599db011)

The `TaskSetManager` today keeps around the TaskInfo objects ([ref1](https://github.com/apache/spark/blob/c1ba963e64a22dea28e17b1ed954e6d03d38da1e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L134), [ref2](https://github.com/apache/spark/blob/c1ba963e64a22dea28e17b1ed954e6d03d38da1e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L192))) and in turn the task metrics (`AccumulableInfo`) for every task attempt until the stage is completed. This means that for stages with a large number of tasks, we keep metrics for all the tasks (`AccumulableInfo`) around even when the task has completed and its metrics have been aggregated. Given a task has a large number of metrics, stages with many tasks end up with a large heap usage in the form of task metrics.

This PR is an opt-in change (disabled by default) to reduce the driver's heap usage for stages with many tasks by no longer referencing the task metrics of completed tasks. Once a task is completed in `TaskSetManager`, we no longer keep its metrics around. Upon task completion, we clone the `TaskInfo` object and empty out the metrics for the clone. The cloned `TaskInfo` is retained by the `TaskSetManager` while the original `TaskInfo` object with the metrics is sent over to the `DAGScheduler` where the task metrics are aggregated. Thus for a completed task, `TaskSetManager` holds a `TaskInfo` object with empty metrics. This reduces the memory footprint by ensuring that the number of task metric objects is proportional to the number of active tasks and not to the total number of tasks in the stage.

### Config to gate changes
The changes in the PR are guarded with the Spark conf `spark.scheduler.dropTaskInfoAccumulablesOnTaskCompletion.enabled` which can be used for rollback or staged rollouts.

### Why are the changes disabled by default?
The PR introduces a breaking change wherein the `TaskInfo.accumulables()` are empty for `Resubmitted` tasks upon the loss of an executor. Read apache#44321 (review) for details.

### Why are the changes needed?

Reduce driver's heap usage, especially for stages with many tasks

## Benchmarking
On a cluster running a scan stage with 100k tasks, the TaskSetManager's heap usage dropped from 1.1 GB to 37 MB. This  **reduced the total driver's heap usage by 38%**, down to 2 GB from 3.5 GB.

**BEFORE**

![image](https://github.com/databricks/runtime/assets/10495099/7c1599f3-3587-48a1-b019-84115b1bb90d)
**WITH FIX**
<img width="1386" alt="image" src="https://github.com/databricks/runtime/assets/10495099/b85129c8-dc10-4ee2-898d-61c8e7449616">

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added new tests and did benchmarking on a cluster.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Github Copilot

Closes apache#44321 from utkarsh39/SPARK-46383.

Authored-by: Utkarsh <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

(cherry picked from commit 28da1d8)

Co-authored-by: Utkarsh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants