
Conversation

@steveloughran (Contributor) commented Aug 26, 2016

What changes were proposed in this pull request?

Increment the hadoop.version value in the hadoop-2.7 profile from 2.7.2 to 2.7.3.

This switches to the latest release in the 2.7.x line, picking up bug fixes while keeping compatibility with Java 7.
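For reference, a minimal sketch of what the change amounts to in the root pom.xml, assuming the usual layout of Spark's hadoop-2.7 profile (other properties the profile carries are omitted here):

    <!-- Sketch only; Spark's actual hadoop-2.7 profile defines more than this. -->
    <profile>
      <id>hadoop-2.7</id>
      <properties>
        <!-- previously 2.7.2; this PR bumps it to the latest 2.7.x release -->
        <hadoop.version>2.7.3</hadoop.version>
      </properties>
    </profile>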

How was this patch tested?

Spark unit tests, plus system tests performed on the version of Spark built with this profile enabled.

…is is to see if the jenkins builds pick this up; I'm not proposing it as part of the final patch
@steveloughran (Contributor, Author) commented:

This patch tries to set the default version to 2.7; I'll see if SBT picks it up.

This is not something I'm proposing for the final merge; there I expect people to still go -Phadoop-2.7. What I'm trying to do is get SBT to run all its tests against Hadoop 2.7.3, so that Jenkins can assess the validity of the proposal.

@srowen (Member) commented Aug 26, 2016

I'd rather jump straight to the question: is there much value in separately supporting Hadoop 2.2 -> 2.5? And then, if we're on 2.6+, is there even any difference with 2.7 that requires a separate profile? These Hadoop profiles are an annoyance, but they were needed when Hadoop 1.x was in the picture. They're barely needed now.

@SparkQA commented Aug 26, 2016

Test build #64464 has finished for PR 14827 at commit 515b9ce.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


Review comment on pom.xml (the hadoop-2.7 profile):

    <profile>
      <id>hadoop-2.7</id>
      <activation>
(Member) commented:

IIRC as soon as you set any profile at all, the active by default ones are disabled. This could be problematic.
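To illustrate the concern, a sketch of the pattern under discussion (not necessarily the exact diff): Maven drops activeByDefault profiles as soon as any other profile in the same POM is enabled explicitly, e.g. with -P, so a default-active hadoop-2.7 profile only helps builds that name no profile at all.

    <profile>
      <id>hadoop-2.7</id>
      <activation>
        <!-- ignored the moment any other profile is activated on the command line -->
        <activeByDefault>true</activeByDefault>
      </activation>
    </profile>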

(Contributor, Author) replied:

How can I set this profile for an SBT build/test run? I just want to have SBT doing it, as otherwise the patch is unverified by the machinery, and nobody wants that, do they?

@steveloughran (Contributor, Author) commented:

Sean, the reason for a 2.7 profile is more significant with SPARK-7481 and cloud support, as it can explicitly pull in hadoop-azure (2.7+ only) and hadoop-aws (2.6+ only).
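A hedged sketch of what that could look like inside the hadoop-2.7 profile; the exact coordinates, versions, and any exclusions in the SPARK-7481 patch may differ:

    <profile>
      <id>hadoop-2.7</id>
      <dependencies>
        <!-- S3A support; the artifact exists from Hadoop 2.6 onwards -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-aws</artifactId>
          <version>${hadoop.version}</version>
        </dependency>
        <!-- WASB support; the artifact exists from Hadoop 2.7 onwards -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-azure</artifactId>
          <version>${hadoop.version}</version>
        </dependency>
      </dependencies>
    </profile>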

@srowen (Member) commented Sep 2, 2016

OK that makes sense as a reason to have 2.7 vs <2.6. We already have a profile for 2.7 anyway. I don't know if it will help to make it active by default here given how profile activation works.

Bumping to 2.7.3 is fine. But do you perhaps mean to suggest we bump the default Hadoop profile up? To 2.6, or even 2.7? That would really be a change to how the release is built, and how the PR builders run.

If we're doing that, it's worth asking: what's the cost/benefit of supporting <2.6 anyway? I think all the major distros have been on at least 2.6 for about two years. EMR is on 2.7. CDH is the laggard, if anything, being on 2.6 plus a large number of patches towards 2.7.

It would let us undo a mild bit of reflection hackery in the code and more freely use Hadoop APIs. We'd get rid of loads of build profiles too, hey.

I would not mind hijacking your issue and turning it into this question instead.

Whether it makes sense to start pulling in hadoop-aws etc is a different question.

@steveloughran (Contributor, Author) commented Sep 15, 2016

I don't know what the default Hadoop version should be; that's the kind of thing to discuss on the mailing lists.

Personally, I'd rush to make 2.6 the bare minimum version; nobody should be using anything below it, especially given that the JVM requirements mean you can't easily go below that. (Twitter are still using 2.6 and leading the 2.6.5 release, BTW; they are the main 2.6 user that I know of.)

One thing that would be good would be for Jenkins to test on a later profile alongside the bare minimum version considered supportable. Testing with the old version ensures that you don't accidentally code against later APIs; testing with the newer version ensures that any modules built only for the later versions work, and catches regressions in Hadoop itself.

  1. I don't know what Hadoop APIs MapR codes against.
  2. Yeah, reflection is bad. It makes it hard to identify when methods are being used and when things change.

Regarding pulling in hadoop-aws &c, the WiP patch pulls things in automatically in the 2.7 profile. I could add a cloud option which would only build the module if set, and only then include the JARs in the spark-assembly. I had had the module pull in the Hadoop cloud JARs but not any of their dependent JARs; this would keep the spark-assembly JAR small, but on Hadoop < 2.7.3 it would cause problems at service load.

Anyway, how about this: you start the discussion on Hadoop versions, this profile goes in, and I make the Spark cloud module a specific profile which only compiles/runs if the Hadoop version is 2.7+. (You'd need to set both; you already need -Phive for the dataframe-on-cloud tests anyway.)
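A rough sketch of the kind of opt-in profile being proposed; the profile and module names here are placeholders for illustration, not what any final patch uses:

    <!-- built only when explicitly requested, e.g. alongside -Phadoop-2.7 -->
    <profile>
      <id>cloud</id>
      <modules>
        <module>cloud</module>
      </modules>
    </profile>

It would then be enabled with something like -Phadoop-2.7 -Pcloud, mirroring the way -Phive is already required for the dataframe-on-cloud tests.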

@steveloughran changed the title from "[SPARK-17259] [build] [WiP] Hadoop 2.7 profile to depend on Hadoop 2.7.3" to "[SPARK-17259] [build] Hadoop 2.7 profile to depend on Hadoop 2.7.3" on Sep 16, 2016
@srowen (Member) commented Sep 19, 2016

Go ahead and close this one but I think you deserve 'credit' for the JIRA change, if that makes any difference.
