nongli commented Dec 1, 2015

The issue is that the output committer is not idempotent, so retry attempts
fail because the output file already exists. It is not safe to clean up the file,
as this output committer is by design not retryable. Currently, the job fails
with a confusing "file exists" error. This patch is a stopgap that tells the user
to look at the top of the error log for the proper message.

This is difficult to test locally because Spark is hardcoded not to retry.
Verified manually by raising the number of retry attempts.
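
To illustrate the failure mode described above, here is a minimal sketch in plain Python. It is not Spark code and not the code in this patch. The names `write_task_output`, `flaky_task`, `run_with_retries`, and `NonRetryableWriteError` are hypothetical. The sketch assumes the first attempt creates the output file and then fails for an unrelated reason, so the retry hits the leftover file and gets an error that points back to the original failure.

```python
import os
import tempfile


class NonRetryableWriteError(RuntimeError):
    """Hypothetical error raised when a retry finds output from an earlier attempt."""


def write_task_output(path, data, attempt):
    """Write output with create-only semantics (hypothetical helper)."""
    try:
        # Mode "x" fails if the file already exists, mimicking a writer
        # that is not idempotent across attempts.
        with open(path, "x") as f:
            f.write(data)
    except FileExistsError as e:
        if attempt > 0:
            # A retry hit the leftover file. Deleting it is assumed unsafe,
            # so fail with a message that points at the original error.
            raise NonRetryableWriteError(
                f"Attempt {attempt} found existing output at {path}. "
                "This writer is not retryable; the original failure is "
                "likely reported earlier in the error log."
            ) from e
        raise


def flaky_task(path, attempt):
    """Simulated task: the first attempt fails after creating the file."""
    write_task_output(path, "rows", attempt)
    if attempt == 0:
        raise IOError("simulated failure after the output file was created")


def run_with_retries(path, max_attempts):
    """Run the task up to max_attempts times and collect each failure."""
    errors = []
    for attempt in range(max_attempts):
        try:
            flaky_task(path, attempt)
            break
        except Exception as e:  # collect failures instead of aborting the demo
            errors.append(e)
    return errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for err in run_with_retries(os.path.join(tmp, "part-00000"), 2):
            print(type(err).__name__, "-", err)
```

With a single attempt only the original failure is seen, which is why the retry path is hard to exercise when retries are disabled.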

nongli and others added 4 commits November 30, 2015 12:48

yhuai commented Dec 1, 2015

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46989/consoleFull is the test build. The original PR (#9942) was somehow closed automatically...

yhuai commented Dec 1, 2015

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46989/consoleFull passed. I am merging this into master and branch 1.6.

asfgit pushed a commit that referenced this pull request Dec 1, 2015
Author: Nong Li <[email protected]>
Author: Nong Li <[email protected]>

Closes #10080 from nongli/spark-11328.

(cherry picked from commit 47a0abc)
Signed-off-by: Yin Huai <[email protected]>
asfgit closed this in 47a0abc Dec 1, 2015

yhuai commented Dec 2, 2015

Also merged into branch 1.5.

SparkQA commented Dec 2, 2015

Test build #47011 has finished for PR 10080 at commit c4375ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

nongli deleted the spark-11328 branch December 2, 2015 19:30