[SPARK-17259] [build] Hadoop 2.7 profile to depend on Hadoop 2.7.3 #14827
…is is to see if the Jenkins builds pick this up; I'm not proposing it as part of the final patch.
This patch tries to set the default version to 2.7; I'll see if SBT picks it up. This is not something I'm proposing for the final merge; there I expect people to still go
I'd rather jump straight to the question: is there much value in separately supporting Hadoop 2.2 -> 2.5? And then, if we're on 2.6+, is there even any difference with 2.7 that requires a separate profile? These Hadoop profiles are an annoyance; they were needed when Hadoop 1.x was in the picture, but they're barely needed now.
Test build #64464 has finished for PR 14827 at commit
```xml
<profile>
  <id>hadoop-2.7</id>
  <activation>
```
IIRC as soon as you set any profile at all, the active by default ones are disabled. This could be problematic.
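To illustrate the activation pitfall: in Maven, a profile marked `activeByDefault` is deactivated as soon as any other profile in the same POM is activated, whether on the command line (for example `-Pyarn`) or through its own activation rules. A common workaround is to activate the profile on the *absence* of a system property instead. The sketch below assumes a hypothetical property named `hadoop.profile`; it is an illustration of the technique, not the Spark POM.

```xml
<!-- With <activeByDefault>true</activeByDefault>, this profile would be
     silently dropped whenever another profile in this POM is activated.
     Property-negation activation avoids that: the profile is active
     whenever the (hypothetical) system property "hadoop.profile" is NOT
     defined, regardless of which other profiles are enabled. -->
<profile>
  <id>hadoop-2.7</id>
  <activation>
    <property>
      <!-- "!" means: activate when this property is not set at all -->
      <name>!hadoop.profile</name>
    </property>
  </activation>
  <properties>
    <hadoop.version>2.7.3</hadoop.version>
  </properties>
</profile>
```

Passing `-Dhadoop.profile=...` on the command line would then switch this default off so that a different Hadoop profile can take over.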
How can I set this profile for an SBT build/test run? I just want SBT to do it, as otherwise the patch is unverified by the machinery. And nobody wants that, do they?
Sean, the reason for a 2.7 profile is more significant with SPARK-7481 and cloud support, as it can explicitly pull in hadoop-azure (2.7+ only) and hadoop-aws (2.6+ only).
OK, that makes sense as a reason to have 2.7 vs <2.6. We already have a profile for 2.7 anyway. I don't know if it will help to make it active by default here, given how profile activation works. Bumping to 2.7.3 is fine. But do you perhaps mean to suggest we bump the default Hadoop profile up, to 2.6 or even 2.7? That would really be a change to how the release is built and how the PR builders run. If we're doing that, it's worth asking: what's the cost/benefit of supporting <2.6 anyway? I think all the major distros have been on at least 2.6 for about two years. EMR is on 2.7. CDH is the laggard, if anything, being on 2.6 plus a large number of patches towards 2.7. Dropping <2.6 would let us undo a mild bit of reflection hackery in the code and use Hadoop APIs more freely. We'd get rid of loads of build profiles too. I would not mind hijacking your issue and turning it into this question instead. Whether it makes sense to start pulling in hadoop-aws etc. is a different question.
I don't know what the default Hadoop version should be; that's the kind of thing to discuss on the mailing lists. Personally, I'd rush to make 2.6 the bare minimum version; nobody should be using anything below it, especially given that JVM requirements mean you can't easily go below that. (Twitter are still using 2.6 and leading the 2.6.5 release, BTW; they are the main 2.6 user that I know of.) One thing that would be good would be for Jenkins to test on a later profile alongside the bare minimum version considered supportable. Testing with the old version ensures that you don't accidentally code against later APIs. Testing with a newer version ensures that any modules built only for the later versions work, and catches regressions in Hadoop itself.
Regarding pulling in AWS etc., the WiP patch pulls things in automatically on the 2.7 profile. I could add a … Anyway, how about you start the discussion on Hadoop versions, this profile goes in, and I make Spark cloud a specific profile which only compiles/runs if the Hadoop version is > 2.7. (you'd need to set both; already you need
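A rough sketch of the separate-profile idea, assuming a hypothetical profile id `cloud` that is enabled together with the Hadoop profile (for example `-Phadoop-2.7 -Pcloud`). `org.apache.hadoop:hadoop-aws` and `org.apache.hadoop:hadoop-azure` are the real Hadoop artifacts mentioned earlier in the thread; the profile name and the reliance on `${hadoop.version}` are assumptions, not the contents of the WiP patch.

```xml
<!-- Hypothetical "cloud" profile: only meaningful when combined with a
     Hadoop 2.7.x profile, because hadoop-azure does not exist before 2.7. -->
<profile>
  <id>cloud</id>
  <dependencies>
    <!-- S3 filesystem connectors; hadoop-aws exists from Hadoop 2.6 -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <!-- Azure blob storage (wasb://) connector; exists from Hadoop 2.7 -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-azure</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</profile>
```

Nothing in this sketch enforces the version constraint itself; it relies on whoever builds passing a compatible Hadoop profile alongside `cloud`.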
Go ahead and close this one, but I think you deserve 'credit' for the JIRA change, if that makes any difference.
What changes were proposed in this pull request?
Increment the `hadoop.version` value in the `hadoop-2.7` profile from 2.7.2 to 2.7.3. This switches to the latest release in the 2.7.x line: bug fixes, continued compatibility with Java 7.
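The change amounts to editing one property inside the profile. A minimal sketch of the resulting profile follows; any other properties the real `hadoop-2.7` profile in Spark's POM may set are omitted here.

```xml
<profile>
  <id>hadoop-2.7</id>
  <properties>
    <!-- was 2.7.2; 2.7.3 is the latest release in the 2.7.x line -->
    <hadoop.version>2.7.3</hadoop.version>
  </properties>
</profile>
```

The profile is selected on the Maven command line with `-Phadoop-2.7`, so builds that already use it pick up 2.7.3 with no other changes.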
How was this patch tested?
Spark unit tests; system tests performed on the version of Spark built with this profile enabled.