[SPARK-12598][Core] bug in setMinPartitions #10546
|
Agree, compare to the impl in But @datafarmer please see https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark for how we suggest changes first. @kmader WDYT? |
|
@srowen I guess that I should have created a JIRA ticket first. I just created one: SPARK-12598 |
|
@datafarmer go ahead and update the title here and consider updating the PR itself per above. |
|
@srowen I'll update the PR per your changes. BTW, the FileStatus method isDir is deprecated. Should I change it to isDirectory, or is that something for another PR? |
|
@datafarmer I've just seconds ago merged a change that replaces these deprecated calls, since we can assume Hadoop 2.2+ now. Yes, isDirectory is correct now. |
|
@datafarmer are you able to update this? |
|
@srowen It should already be updated per your request. Let me know if there is something else that needs to be done. |
|
@datafarmer this still shows a merge conflict though. That's what needs to be resolved with a rebase. |
2e21d1b to 73b0e0b
|
@srowen Should be OK now. |
|
Test build #2347 has finished for PR 10546 at commit
|
There is a bug in the calculation of `maxSplitSize`. The `totalLen` should be divided by `minPartitions` and not by `files.size`. Author: Darek Blasiak <[email protected]> Closes #10546 from datafarmer/setminpartitionsbug. (cherry picked from commit 8346518) Signed-off-by: Sean Owen <[email protected]>
|
Merged to master/1.6 |
There is a bug in the calculation of `maxSplitSize`. The `totalLen` should be divided by `minPartitions` and not by `files.size`.
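
To make the difference concrete, here is a minimal, self-contained sketch of the two calculations. It is not the Spark source: `FileInfo`, `SplitSizeSketch`, and both method names are hypothetical stand-ins (`FileInfo` replaces Hadoop's `FileStatus`), and the guard against a zero `minPartitions` is an assumption added to keep the example safe to run.

```scala
// Hypothetical stand-in for Hadoop's FileStatus: only what the calculation needs.
final case class FileInfo(length: Long, isDirectory: Boolean)

object SplitSizeSketch {
  // Sum of file lengths; directories contribute no bytes.
  def totalLen(files: Seq[FileInfo]): Long =
    files.map(f => if (f.isDirectory) 0L else f.length).sum

  // The calculation described as buggy: divides by the number of files,
  // so the requested minPartitions has no effect on the split size.
  def maxSplitSizeByFileCount(files: Seq[FileInfo]): Long =
    math.ceil(totalLen(files).toDouble / files.size).toLong

  // The calculation described as the fix: divides by minPartitions.
  // Assumption: treat a non-positive minPartitions as 1 to avoid dividing by zero.
  def maxSplitSizeByMinPartitions(files: Seq[FileInfo], minPartitions: Int): Long = {
    val parts = math.max(minPartitions, 1)
    math.ceil(totalLen(files).toDouble / parts).toLong
  }
}
```

With four 100-byte files and `minPartitions = 2`, the file-count version gives a maximum split size of 100 bytes regardless of the request, while the `minPartitions` version gives 200 bytes, which is what lets the input be packed into roughly two splits.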