@pzzs
Contributor

@pzzs pzzs commented Apr 26, 2016

I found that some tasks were recomputed before FAILED_TO_UNCOMPRESS happened, and I think the retry operation corrupted the shuffle file, which caused this problem. I debugged the code and corrupted the shuffle file before it was read, and the problem happened every time. Maybe we can regenerate the shuffle file when it is corrupted.
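
For illustration only, the "regenerate when corrupted" idea might be sketched as below. This is not the code in this pull request and not Spark's API: `read_with_regeneration`, `load`, and `regenerate` are hypothetical names, and Python's `zlib` stands in for the Snappy codec that raises FAILED_TO_UNCOMPRESS.

```python
import zlib
from typing import Callable


def read_with_regeneration(
    load: Callable[[], bytes],
    regenerate: Callable[[], None],
    max_regenerations: int = 1,
) -> bytes:
    """Decompress a block; if it is corrupt, rebuild its source and retry.

    `load` returns the compressed bytes and `regenerate` rewrites them.
    Both are hypothetical callbacks supplied by the caller.
    """
    regenerations = 0
    while True:
        try:
            return zlib.decompress(load())
        except zlib.error:
            # Give up once the regeneration budget is spent, so a block
            # that is always corrupt cannot cause an endless loop.
            if regenerations >= max_regenerations:
                raise
            regenerations += 1
            regenerate()
```

As the rest of this thread points out, a retry of this kind only hides the corruption; it does not explain why the file was damaged in the first place.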

@AmplabJenkins

Can one of the admins verify this patch?

@pzzs pzzs changed the title from "[SPARK-4105][Core] regenerate the shuffle file when it is corrupted" to "[SPARK-4105][CORE] regenerate the shuffle file when it is corrupted" Apr 26, 2016
@srowen
Member

srowen commented Apr 26, 2016

I get that it's just a band-aid, but it isn't solving the underlying problem, right?

@jerryshao
Contributor

"and I think that retry operation Corrupted shuffle file that caused this problem"

Can you explain more about the problem you encountered?

@pzzs
Contributor Author

pzzs commented Apr 27, 2016

Yeah, I haven't found the root cause yet, and I have been troubled by this problem for a long time. Any ideas about this problem, @srowen?

@pzzs pzzs closed this Apr 27, 2016
@pzzs pzzs reopened this Apr 27, 2016
@pzzs
Contributor Author

pzzs commented Apr 27, 2016

I found that some tasks were recomputed before FAILED_TO_UNCOMPRESS happened, and I think that something like #9610 caused this problem. @jerryshao

@jerryshao
Contributor

"some task recompute before FAILED_TO_UNCOMPRESS happened"

What does this mean? From the code you changed, it looks like the corrupted file is encountered during shuffle fetch, so what does "task recompute" refer to: a map task or a reduce task?

Also, it would be better to have a simple reproducible case to narrow down the problem and fix it. Otherwise, I don't think the current fix is very solid.

@viper-kun
Contributor

@jerryshao @srowen
We hit this problem in Spark 1.4, Spark 1.5, and Spark 1.6, and all we know is that the shuffle file is broken. We can reproduce the problem by modifying the shuffle file, but we don't know the root cause. Any ideas?
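
A reproduction of this kind, where a compressed file is damaged by hand and then read back, could look roughly like the sketch below. It is an assumption-laden stand-in rather than the actual steps used here: it writes a temporary zlib-compressed file instead of a real Spark shuffle file, and `flip_byte`, `is_readable`, and `reproduce` are hypothetical helper names.

```python
import os
import tempfile
import zlib
from typing import Tuple


def flip_byte(path: str, offset: int) -> None:
    """Invert every bit of one byte; negative offsets count from the end."""
    with open(path, "rb") as f:
        data = bytearray(f.read())
    data[offset] ^= 0xFF
    with open(path, "wb") as f:
        f.write(data)


def is_readable(path: str) -> bool:
    """Return True if the whole file decompresses without error."""
    with open(path, "rb") as f:
        payload = f.read()
    try:
        zlib.decompress(payload)
        return True
    except zlib.error:
        return False


def reproduce() -> Tuple[bool, bool]:
    """Write a compressed file, damage it, and report readability before and after."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(zlib.compress(b"record" * 1000))
        before = is_readable(path)
        # Damage the trailing Adler-32 checksum so decompression must fail.
        flip_byte(path, -1)
        after = is_readable(path)
        return before, after
    finally:
        os.remove(path)
```

Damaging the file by hand confirms that a corrupted file produces the decompression failure, but it says nothing about what corrupts the file in a real job.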

@jerryshao
Contributor

I haven't run into this problem recently, so I can't tell exactly what causes it; maybe a race condition, maybe a flush problem.

Since you already have a reproducible case, why not dig into it in more detail?

@pzzs
Contributor Author

pzzs commented Apr 27, 2016

For now, I only know that a corrupted shuffle file can cause this problem, but I do not know why the shuffle file is corrupted. @jerryshao @viper-kun

@pzzs pzzs closed this Jun 27, 2016
@quanter007

Can one of the admins verify this patch?
