@zsxwing
Member

@zsxwing zsxwing commented Feb 26, 2016

What changes were proposed in this pull request?

When the driver removes an executor's state, the connection between the driver and the executor may still be alive, so the executor does not exit on its own (e.g., the Master sends RemoveExecutor when a worker is lost even though the executor is still alive). The driver should therefore tell the executor to stop itself. Otherwise, we leak an executor.

This PR modifies the driver to send StopExecutor to an executor when that executor is removed.
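
To illustrate the idea, here is a minimal, self-contained sketch of the behavior described above. It is not the code from this patch: `Driver`, `ExecutorEndpoint`, `register`, and `isRegistered` are hypothetical stand-ins, and only the message names `RemoveExecutor` and `StopExecutor` come from the description. The point it shows is the ordering: look up the executor's endpoint and send `StopExecutor` first, then drop the executor's state.

```scala
import scala.collection.mutable

// Hypothetical, simplified model. Only the message names mirror the PR description.
object StopOnRemoveSketch {
  sealed trait Message
  case object StopExecutor extends Message
  final case class RemoveExecutor(executorId: String, reason: String) extends Message

  // Stand-in for an executor's RPC endpoint: it only records what it was sent.
  final class ExecutorEndpoint {
    val received = mutable.ArrayBuffer.empty[Message]
    def send(msg: Message): Unit = received += msg
  }

  // Stand-in for the driver-side bookkeeping of registered executors.
  final class Driver {
    private val executors = mutable.Map.empty[String, ExecutorEndpoint]

    def register(executorId: String, endpoint: ExecutorEndpoint): Unit =
      executors(executorId) = endpoint

    def isRegistered(executorId: String): Boolean = executors.contains(executorId)

    def receive(msg: Message): Unit = msg match {
      case RemoveExecutor(executorId, _) =>
        // The connection may still be alive, so ask the executor to stop itself
        // before its state is dropped. Unknown executor ids are ignored.
        executors.get(executorId).foreach(_.send(StopExecutor))
        executors.remove(executorId)
      case StopExecutor =>
        () // not addressed to the driver in this sketch
    }
  }
}
```

In this sketch, removing a registered executor leaves exactly one `StopExecutor` in its endpoint's `received` buffer, and removing an unknown id sends nothing.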

How was this patch tested?

Manual test: increased the worker heartbeat interval to force the worker to always time out, and confirmed that the leaked executors are gone.

case RemoveExecutor(executorId, reason) =>
// We will remove the executor's state and cannot restore it. However, the connection
// between the driver and the executor may be still alive so that the executor won't exit
// automatically (E.g., Master will send RemoveExecutor when a work is lost but the executor
Contributor

This is the general class, so I don't think we should mention standalone-mode-specific things.

Member Author

Removed it

@SparkQA

SparkQA commented Feb 26, 2016

Test build #52074 has finished for PR 11399 at commit a42a43d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 26, 2016

Test build #52072 has finished for PR 11399 at commit 435f020.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 26, 2016

Test build #52075 has finished for PR 11399 at commit 5d2fc40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 26, 2016

Test build #52077 has finished for PR 11399 at commit a46e015.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

Merged into master. I'm a bit hesitant to backport this into 1.6 because it may change behavior in ways we did not anticipate. If there is interest in backporting, we should discuss it further.

@asfgit asfgit closed this in ad61529 Feb 26, 2016
@zsxwing zsxwing deleted the SPARK-13519 branch February 26, 2016 23:18
asfgit pushed a commit that referenced this pull request May 11, 2016
…leaning executor's state

Author: Shixiong Zhu <[email protected]>

Closes #11399 from zsxwing/SPARK-13519.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request May 12, 2016
…leaning executor's state

Author: Shixiong Zhu <[email protected]>

Closes apache#11399 from zsxwing/SPARK-13519.

(cherry picked from commit c433c0a)