@witgo
Contributor

@witgo witgo commented May 30, 2014

No description provided.

@witgo witgo changed the title In some cases, yarn does not automatically restart the container [WIP] In some cases, yarn does not automatically restart the container May 30, 2014
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@sryza
Contributor

sryza commented May 30, 2014

This is already handled in ExecutorLauncher.launchReporterThread and ApplicationMaster.launchReporterThread, no?

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15306/

@witgo witgo changed the title [WIP] In some cases, yarn does not automatically restart the container In some cases, yarn does not automatically restart the container May 31, 2014
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@witgo
Contributor Author

witgo commented May 31, 2014

@sryza
When yarnAllocator.getNumExecutorsFailed returns a value greater than zero, yarnAllocator.getNumExecutorsRunning < args.numExecutors stays true forever.
That is, in this case ExecutorLauncher.launchReporterThread or ApplicationMaster.launchReporterThread only executes once the expression userThread.isAlive (or !driverClosed) becomes false.
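To make the failure mode concrete, here is a minimal, hypothetical Scala simulation. `FakeAllocator`, `BuggyWait`, and their methods are illustrative stand-ins, not Spark's actual YARN allocator API, and the iteration cap `userThreadAliveFor` is an assumption standing in for `userThread.isAlive`. The sketch assumes a wait loop whose only exit conditions are "enough executors running" or "user thread stopped", with nothing inside the loop re-requesting failed containers.

```scala
// Hypothetical model of the wait loop described above; not Spark's real API.
final class FakeAllocator(failures: Int) {
  private var pending = 0            // container requests not yet granted
  private var failuresLeft = failures
  var numRunning = 0
  var numFailed = 0

  def addResourceRequests(n: Int): Unit = pending += n

  // Grants every pending request; the first `failures` containers die
  // immediately and are counted as failed, never replaced.
  def allocateResources(): Unit =
    while (pending > 0) {
      pending -= 1
      if (failuresLeft > 0) { failuresLeft -= 1; numFailed += 1 }
      else numRunning += 1
    }
}

object BuggyWait {
  // Returns (iterations taken, executors running). `userThreadAliveFor` is a
  // hypothetical stand-in for how long userThread.isAlive stays true.
  def waitForExecutors(numExecutors: Int, failures: Int, userThreadAliveFor: Int): (Int, Int) = {
    val alloc = new FakeAllocator(failures)
    alloc.addResourceRequests(numExecutors)
    var iters = 0
    // Nothing here re-requests failed containers, so once one fails the
    // first condition can never become false.
    while (alloc.numRunning < numExecutors && iters < userThreadAliveFor) {
      alloc.allocateResources()
      iters += 1
    }
    (iters, alloc.numRunning)
  }
}
```

With `waitForExecutors(4, 1, 100)` the loop spins until the simulated user thread stops and only three executors ever run; with zero failures it exits after the first iteration.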

@witgo witgo changed the title In some cases, yarn does not automatically restart the container In some cases, spark-yarn does not automatically restart the container May 31, 2014
@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15314/

@witgo witgo changed the title In some cases, spark-yarn does not automatically restart the container In some cases, spark-yarn does not automatically restart the failed container May 31, 2014
@witgo witgo changed the title In some cases, spark-yarn does not automatically restart the failed container [SPARK-1978] In some cases, spark-yarn does not automatically restart the failed container May 31, 2014
Contributor

I believe this is what this TODO refers to, so you can remove that TODO.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15476/

Contributor

Similarly here, can you move this up above allocateResources?

Contributor

Can you change this to use similar logic: allocate outside the loop, then inside the loop add the missing executors and allocate?
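One way to read the suggested ordering, sketched as a hypothetical Scala simulation: request and allocate once before the loop, then on each iteration re-request whatever is missing (target minus running minus pending) and allocate again. `SimAllocator`, `FixedWait`, and `getNumPendingAllocate` are illustrative assumptions; the method names only echo those in this discussion and are not Spark's real signatures.

```scala
// Hypothetical allocator model; not Spark's real YARN allocator.
final class SimAllocator(failures: Int) {
  private var pending = 0
  private var failuresLeft = failures
  private var running = 0

  def addResourceRequests(n: Int): Unit = pending += n
  def getNumPendingAllocate: Int = pending      // assumed helper
  def getNumExecutorsRunning: Int = running

  // Grants every pending request; the first `failures` containers fail.
  def allocateResources(): Unit =
    while (pending > 0) {
      pending -= 1
      if (failuresLeft > 0) failuresLeft -= 1 else running += 1
    }
}

object FixedWait {
  // Returns (loop iterations, executors running).
  def allocateExecutors(numExecutors: Int, failures: Int, maxIters: Int): (Int, Int) = {
    val alloc = new SimAllocator(failures)
    // Allocate once outside the loop.
    alloc.addResourceRequests(numExecutors)
    alloc.allocateResources()
    var iters = 0
    while (alloc.getNumExecutorsRunning < numExecutors && iters < maxIters) {
      // Inside the loop: re-request whatever is missing, then allocate.
      val missing = numExecutors - alloc.getNumExecutorsRunning - alloc.getNumPendingAllocate
      if (missing > 0) alloc.addResourceRequests(missing)
      alloc.allocateResources()
      iters += 1
    }
    (iters, alloc.getNumExecutorsRunning)
  }
}
```

Because failed containers are re-requested on each pass, the loop reaches the target even when more containers fail than were originally requested; `maxIters` plays the role of a time limit on the wait.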

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15551/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15568/

@tgravescs
Contributor

Thanks @witgo. If you can change the order of the logic in ExecutorLauncher to match, this looks good.

@AmplabJenkins

Merged build triggered.

@witgo
Contributor Author

witgo commented Jun 10, 2014

Done

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15593/

@tgravescs
Contributor

Looks good, +1. Thanks @witgo

@asfgit asfgit closed this in 884ca71 Jun 10, 2014
@witgo witgo deleted the allocateExecutors branch June 10, 2014 15:38
asfgit pushed a commit that referenced this pull request Jun 10, 2014
… the failed container

Author: witgo <[email protected]>

Closes #921 from witgo/allocateExecutors and squashes the following commits:

bc3aa66 [witgo] review commit
8800eba [witgo] Merge branch 'master' of https://github.com/apache/spark into allocateExecutors
32ac7af [witgo] review commit
056b8c7 [witgo] Merge branch 'master' of https://github.com/apache/spark into allocateExecutors
04c6f7e [witgo] Merge branch 'master' into allocateExecutors
aff827c [witgo] review commit
5c376e0 [witgo] Merge branch 'master' of https://github.com/apache/spark into allocateExecutors
1faf4f4 [witgo] Merge branch 'master' into allocateExecutors
3c464bd [witgo] add time limit to allocateExecutors
e00b656 [witgo] In some cases, yarn does not automatically restart the container
@tgravescs
Contributor

I also merged this into branch-1.0.

pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
agirish pushed a commit to HPEEzmeral/apache-spark that referenced this pull request May 5, 2022
udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024
mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025
4 participants