@a-roberts
Contributor

What changes were proposed in this pull request?

Use Hadoop 2.6.5 for the Hadoop 2.6 profile. The release notes list a number of fixes, including security fixes, that we should pick up.
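
For context, the proposed change is essentially a one-line version bump in the build. A minimal sketch of what the hadoop-2.6 profile in Spark's root pom.xml might look like after the change (the `hadoop.version` property name is assumed here, not quoted from the diff):

```xml
<!-- Sketch only: assumes Spark's usual hadoop.version property -->
<profile>
  <id>hadoop-2.6</id>
  <properties>
    <hadoop.version>2.6.5</hadoop.version>
  </properties>
</profile>
```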

How was this patch tested?

Running the unit tests now with IBM's SDK for Java; we'll see what happens with OpenJDK in the community builder. I'm expecting no trouble as it is only a maintenance release.

@SparkQA

SparkQA commented Dec 8, 2016

Test build #69858 has finished for PR 16212 at commit d777246.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Dec 8, 2016

Most fixes won't matter for Spark because Spark only uses some client-side APIs. Is there a specific fix that's important?

It's generally OK to update through maintenance releases regularly, though tackling them one by one is probably too much overhead. At least, are there other maintenance releases of other Hadoop versions we should update?

I usually consider updating a bunch of things at once at each minor release of Spark.

@a-roberts
Contributor Author

http://www.openwall.com/lists/oss-security/2016/11/29/1 mentions that Hadoop 2.7.x users should upgrade to 2.7.3 and Hadoop 2.6.x users should upgrade to 2.6.5. If our Hadoop users are moving up to 2.6.5 for 1.6.x, can we be certain Spark will work if we mix Hadoop 2.6.4 classes with Hadoop 2.6.5 ones? I'm thinking specifically of auto-generated serialVersionUID mismatches that may occur.
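
The serialVersionUID concern can be illustrated with a small stand-alone Java sketch (the `Token` class below is hypothetical, not a Hadoop class). When a `Serializable` class declares no explicit `serialVersionUID`, the JVM derives one from the class's structure, so a release that changes a class's shape can change the derived UID and make previously serialized bytes fail to deserialize with `InvalidClassException`:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class with NO explicit serialVersionUID: the JVM
// computes one from the class name, fields, and method signatures.
class Token implements Serializable {
    String owner;
}

public class UidCheck {
    public static void main(String[] args) {
        long uid = ObjectStreamClass.lookup(Token.class).getSerialVersionUID();
        System.out.println("auto-generated serialVersionUID = " + uid);
        // If an upgrade changes the class (adds a field or method, etc.),
        // this derived UID can change, and deserializing bytes written by
        // the old class version throws java.io.InvalidClassException:
        // "local class incompatible".
    }
}
```

Declaring an explicit `private static final long serialVersionUID` is the usual defense, which is why mixed 2.6.4/2.6.5 classpaths are only a risk for classes that rely on the auto-generated value.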

@srowen
Member

srowen commented Dec 8, 2016

It sounds like that affects the HDFS NameNode, but Spark has nothing to do with that.

It shouldn't be incompatible across maintenance releases in any event. Occasionally bumping the version is a decent practice, such as at minor releases of Spark, but otherwise I might not bother actively updating versions unless it buys us something.

@srowen
Member

srowen commented Dec 10, 2016

Let's close this if there's no pressing need to update. Spark actually doesn't touch most of Hadoop, just client APIs.

srowen added a commit to srowen/spark that referenced this pull request Jan 1, 2017
@srowen srowen mentioned this pull request Jan 1, 2017
@asfgit asfgit closed this in ba48812 Jan 2, 2017
@a-roberts
Contributor Author

@srowen There's a mention here that the YARN NodeManager and CredentialProvider classes present a risk; we bundle and provide the latter (org.apache.hadoop.security.alias.CredentialProvider). I see no direct uses in the Spark code, but somebody could use the CredentialProvider we bundle and be impacted.

Bumping up to Hadoop 2.6.5 now would shield us from potentially relevant CVEs that keep popping up (and save us time investigating them), since they only impact classes in Hadoop 2.6.4 and below.

@srowen
Member

srowen commented Jan 16, 2017

OK, if there's any reasonable case for updating it, that's fine. It won't matter much to Spark, in any event. Go ahead again.

@a-roberts
Contributor Author

a-roberts commented Jan 17, 2017

Created again at #16616, as I can't reopen this myself or push to the branch without making changes.
