[SPARK-1560]: Updated Pyrolite Dependency to be Java 6 compatible #479
Conversation
Merged build triggered.

Merged build started.

lgtm

This might fail the build initially, because I uploaded Pyrolite to Sonatype just a little while ago. It might be a while before it propagates.

Do we actually publish our own Pyrolite?

Yeah, we publish our own. The Pyrolite project itself doesn't maintain anything in Maven.

Merged build finished. All automated tests passed.

All automated tests passed.

@pwendell I'm not sure if this invoked the SparkSQL tests.

That's fine, I'm just gonna merge this. The heuristic didn't detect changes in SQL, so it didn't run the tests, but we'll catch any errors in the nightly tests.
Changed the Pyrolite dependency to a build which targets Java 6.

Author: Ahir Reddy <[email protected]>

Closes #479 from ahirreddy/java6-pyrolite and squashes the following commits:

8ea25d3 [Ahir Reddy] Updated maven build to use java 6 compatible pyrolite
dabc703 [Ahir Reddy] Updated Pyrolite dependency to be Java 6 compatible

(cherry picked from commit 0f87e6a)
Signed-off-by: Patrick Wendell <[email protected]>
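For context, swapping in a repackaged Pyrolite in a Maven build looks roughly like the fragment below. The coordinates and version shown are illustrative assumptions, not the exact values from Spark's `pom.xml` at the time of this PR:

```xml
<!-- Hypothetical coordinates: Spark published its own Pyrolite build
     because upstream Pyrolite did not publish artifacts to Maven Central.
     The version here stands in for whichever release was compiled
     with -source/-target 1.6. -->
<dependency>
  <groupId>org.spark-project</groupId>
  <artifactId>pyrolite</artifactId>
  <version>2.0.1</version>
</dependency>
```

Because the artifact had only just been uploaded to Sonatype, builds could fail until the artifact propagated to the mirrors, which is the concern raised in the conversation above.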
…river/executors (#479)

* Added configuration properties to inject arbitrary secrets into the driver/executors
* Addressed comments
Move outside periodic jobs back to OpenLab
…#479)

* Revert "KE-37052 translate boolean column to V2Predicate (apache#477)". This reverts commit 7796f19.
* KE-37052 translate boolean column to V2Predicate (apache#476)
* KE-37052 translate boolean column to V2Predicate
* update spark version
…fespan of `TaskInfo.accumulables()` (apache#479)

### What changes were proposed in this pull request?

`AccumulableInfo` is one of the top heap consumers in the driver's heap dumps for stages with many tasks. For a stage with a large number of tasks (**_O(100k)_**), we saw **30%** of the heap usage stemming from `TaskInfo.accumulables()`.

[image]

The `TaskSetManager` today keeps around the `TaskInfo` objects ([ref1](https://github.com/apache/spark/blob/c1ba963e64a22dea28e17b1ed954e6d03d38da1e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L134), [ref2](https://github.com/apache/spark/blob/c1ba963e64a22dea28e17b1ed954e6d03d38da1e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L192)) and in turn the task metrics (`AccumulableInfo`) for every task attempt until the stage is completed. This means that for stages with a large number of tasks, we keep metrics for all the tasks (`AccumulableInfo`) around even when a task has completed and its metrics have been aggregated. Since each task carries a large number of metrics, stages with many tasks end up with large heap usage in the form of task metrics.

This PR is an opt-in change (disabled by default) that reduces the driver's heap usage for stages with many tasks by no longer referencing the task metrics of completed tasks. Once a task is completed in `TaskSetManager`, we no longer keep its metrics around. Upon task completion, we clone the `TaskInfo` object and empty out the metrics for the clone. The cloned `TaskInfo` is retained by the `TaskSetManager`, while the original `TaskInfo` object with the metrics is sent over to the `DAGScheduler`, where the task metrics are aggregated. Thus, for a completed task, `TaskSetManager` holds a `TaskInfo` object with empty metrics. This reduces the memory footprint by ensuring that the number of task metric objects is proportional to the number of active tasks, not to the total number of tasks in the stage.

### Config to gate changes

The changes in the PR are guarded by the Spark conf `spark.scheduler.dropTaskInfoAccumulablesOnTaskCompletion.enabled`, which can be used for rollback or staged rollouts.

### Why are the changes disabled by default?

The PR introduces a breaking change wherein `TaskInfo.accumulables()` is empty for `Resubmitted` tasks upon the loss of an executor. Read apache#44321 (review) for details.

### Why are the changes needed?

Reduce the driver's heap usage, especially for stages with many tasks.

## Benchmarking

On a cluster running a scan stage with 100k tasks, the `TaskSetManager`'s heap usage dropped from 1.1 GB to 37 MB. This **reduced the total driver's heap usage by 38%**, down to 2 GB from 3.5 GB.

**BEFORE**

[image]

**WITH FIX**

<img width="1386" alt="image" src="https://github.com/databricks/runtime/assets/10495099/b85129c8-dc10-4ee2-898d-61c8e7449616">

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added new tests and did benchmarking on a cluster.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Github Copilot

Closes apache#44321 from utkarsh39/SPARK-46383.

Authored-by: Utkarsh <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 28da1d8)
Co-authored-by: Utkarsh <[email protected]>
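The retention strategy described above can be illustrated with a minimal, hypothetical sketch. The class and method names below are invented for illustration and are not Spark's actual Scala APIs; the point is only the clone-and-drop pattern that bounds retained metrics to active tasks:

```python
from dataclasses import dataclass


@dataclass
class TaskInfo:
    """Per-task bookkeeping; accumulables can dominate heap usage in aggregate."""
    task_id: int
    accumulables: list

    def clone_with_empty_accumulables(self) -> "TaskInfo":
        # Keep the identity fields but drop the metric references, so a
        # completed task no longer pins its metrics in memory.
        return TaskInfo(self.task_id, [])


class TaskSetManager:
    """Toy manager: retains a lightweight clone once a task completes."""

    def __init__(self, drop_on_completion: bool = True):
        # Mirrors an opt-in conf gating the behavior (disabled would keep
        # full metrics for every task until the stage finishes).
        self.drop_on_completion = drop_on_completion
        self.task_infos: dict[int, TaskInfo] = {}

    def register(self, info: TaskInfo) -> None:
        self.task_infos[info.task_id] = info

    def on_task_completed(self, task_id: int) -> TaskInfo:
        info = self.task_infos[task_id]
        if self.drop_on_completion:
            # Retain only the metric-free clone; the original (with
            # metrics) is handed off for aggregation elsewhere.
            self.task_infos[task_id] = info.clone_with_empty_accumulables()
        return info
```

With the flag on, memory held by the manager scales with the number of in-flight tasks rather than the total task count of the stage, which matches the benchmark result quoted above.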