@zsxwing
Member

@zsxwing zsxwing commented Feb 26, 2016

What changes were proposed in this pull request?

Sometimes the network disconnection event is not triggered, for example because of race conditions that we have not thought of. When that happens, the executor keeps trying to send heartbeats to the driver and never exits.

This PR adds a new configuration, `spark.executor.heartbeat.maxFailures`, that kills the executor when it fails to heartbeat to the driver more than `spark.executor.heartbeat.maxFailures` times.
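
To make the intended behavior concrete, here is a minimal, self-contained sketch of a heartbeat failure counter. It is not the code from this patch: the class `HeartbeatMonitor`, its method `report`, and the helper `unreachable_driver` are hypothetical names chosen for illustration. Two details are assumptions rather than facts taken from the description: a successful heartbeat resets the counter, and the comparison is strict ("more than" the limit).

```python
class HeartbeatMonitor:
    """Illustrative failure counter; not Spark's actual implementation."""

    def __init__(self, max_failures):
        # Stands in for the value of spark.executor.heartbeat.maxFailures.
        self.max_failures = max_failures
        self.failures = 0

    def report(self, send_heartbeat):
        """Attempt one heartbeat; return True if the executor should exit."""
        try:
            send_heartbeat()
        except Exception:
            self.failures += 1
            # Assumption: exit only once failures exceed the limit.
            return self.failures > self.max_failures
        # Assumption: a successful heartbeat resets the failure count.
        self.failures = 0
        return False


def unreachable_driver():
    """Hypothetical heartbeat call that always fails."""
    raise ConnectionError("driver unreachable")


monitor = HeartbeatMonitor(max_failures=3)
results = [monitor.report(unreachable_driver) for _ in range(4)]
print(results)  # [False, False, False, True]
```

With a limit of 3, the first three failed heartbeats are tolerated and the fourth signals that the process should exit. A real executor would terminate the process at that point instead of returning a flag.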

How was this patch tested?

unit tests

@SparkQA

SparkQA commented Feb 26, 2016

Test build #52078 has finished for PR 11401 at commit e8ad9dd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Feb 26, 2016

cc @andrewor14

@SparkQA

SparkQA commented Feb 27, 2016

Test build #52088 has finished for PR 11401 at commit 1a1e746.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

m

@asfgit asfgit closed this in 17a253c Feb 29, 2016
@zsxwing zsxwing deleted the SPARK-13522 branch February 29, 2016 19:24
asfgit pushed a commit that referenced this pull request Feb 29, 2016
## What changes were proposed in this pull request?

Fixes the placement of the log message introduced by #11401.

## How was this patch tested?

unit tests.

Author: Shixiong Zhu <[email protected]>

Closes #11432 from zsxwing/SPARK-13522-follow-up.
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
…eartbeat to driver more than N times

Author: Shixiong Zhu <[email protected]>

Closes apache#11401 from zsxwing/SPARK-13522.
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
Author: Shixiong Zhu <[email protected]>

Closes apache#11432 from zsxwing/SPARK-13522-follow-up.
asfgit pushed a commit that referenced this pull request May 11, 2016
…eartbeat to driver more than N times

Author: Shixiong Zhu <[email protected]>

Closes #11401 from zsxwing/SPARK-13522.
asfgit pushed a commit that referenced this pull request May 11, 2016
Author: Shixiong Zhu <[email protected]>

Closes #11432 from zsxwing/SPARK-13522-follow-up.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request May 12, 2016
…eartbeat to driver more than N times

Author: Shixiong Zhu <[email protected]>

Closes apache#11401 from zsxwing/SPARK-13522.

(cherry picked from commit 86bf93e)
zzcclp pushed a commit to zzcclp/spark that referenced this pull request May 12, 2016
Author: Shixiong Zhu <[email protected]>

Closes apache#11432 from zsxwing/SPARK-13522-follow-up.

(cherry picked from commit ced71d3)