
Conversation

yhuai (Contributor) commented Nov 25, 2015

If we need to download Hive/Hadoop artifacts, we first try to download a Hadoop version that matches the Hadoop used by Spark. If that Hadoop artifact cannot be resolved (e.g. the Hadoop version is a vendor-specific version like 2.0.0-cdh4.1.1), we fall back to Hadoop 2.4.0 (the version we used to hard-code as the Hadoop to download from maven) and we will not share Hadoop classes.

I tested this change on my laptop with the following configurations (these are the configurations used by our builds). All tests passed.

```
build/sbt -Phadoop-1 -Dhadoop.version=1.2.1 -Pkinesis-asl -Phive-thriftserver -Phive
build/sbt -Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1 -Pkinesis-asl -Phive-thriftserver -Phive
build/sbt -Pyarn -Phadoop-2.2 -Pkinesis-asl -Phive-thriftserver -Phive
build/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive-thriftserver -Phive
```
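
For context, here is a minimal Scala sketch of the resolve-then-fall-back behavior described above. The object and method names and the exact exception handling are illustrative assumptions, not Spark's actual implementation:

```scala
// Illustrative sketch only: mirrors the fallback described in this PR, not Spark's real code.
object HadoopVersionFallback {
  // Version we fall back to when the Hadoop version used by Spark cannot be
  // resolved from maven (e.g. a vendor-specific version like 2.0.0-cdh4.1.1).
  val fallbackHadoopVersion = "2.4.0"

  // `resolve` stands in for a hypothetical artifact resolver that throws when
  // the requested Hadoop artifact cannot be downloaded.
  // Returns (version actually used, resolved jars, whether Hadoop classes are shared).
  def chooseHadoopVersion(
      sparkHadoopVersion: String,
      resolve: String => Seq[java.io.File]): (String, Seq[java.io.File], Boolean) = {
    try {
      // First, try the Hadoop version that Spark itself was built with.
      (sparkHadoopVersion, resolve(sparkHadoopVersion), true)
    } catch {
      case _: Exception =>
        // Fall back to 2.4.0 and stop sharing Hadoop classes with Spark.
        (fallbackHadoopVersion, resolve(fallbackHadoopVersion), false)
    }
  }
}
```

Under this scheme, a build against 2.0.0-cdh4.1.1 would end up on the 2.4.0 fallback without shared Hadoop classes, while builds against stock 2.2.0 or 2.3.0 would download the matching version.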

yhuai (Contributor Author) commented:
@marmbrus This should be a val, right?
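
The diff line this question refers to isn't visible in this transcript; as a generic Scala illustration of the distinction behind it (a `val` is evaluated once and cached, a `def` is re-evaluated on every access), under that assumption:

```scala
class VersionHolder {
  // Evaluated once, when the instance is constructed, then reused.
  val hadoopVersionOnce: String = sys.props.getOrElse("hadoop.version", "2.4.0")

  // Re-evaluated on every access; repeated work, and possibly different results
  // if the underlying lookup changes between calls.
  def hadoopVersionEachTime: String = sys.props.getOrElse("hadoop.version", "2.4.0")
}
```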

yhuai (Contributor Author) commented Nov 25, 2015

cc @marmbrus @JoshRosen for review.

yhuai (Contributor Author) commented Nov 26, 2015

what is the pic?

yhuai changed the title from "[SPARK-11998] [SQL] When downloading Hadoop artifacts from maven, we need to try to download the version that is used by Spark" to "[SPARK-11998] [SQL] [test-hadoop2.2] When downloading Hadoop artifacts from maven, we need to try to download the version that is used by Spark" on Nov 26, 2015
yhuai (Contributor Author) commented Nov 26, 2015

test this please

SparkQA commented Nov 26, 2015

Test build #46716 has finished for PR 9979 at commit b3317a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 26, 2015

Test build #46717 has finished for PR 9979 at commit 83e92ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yhuai (Contributor Author) commented Nov 26, 2015

JoshRosen (Contributor) commented:
LGTM pending tests, per our offline discussion; this seems fine given that this auto-downloading of Hive classes isn't the recommended approach for production deployments, so the risks here don't seem huge.

A reviewer (Contributor) commented:
I don't think you need this. It's a partial function

yhuai (Contributor Author) replied:
Done
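
The line the two comments above discuss isn't shown here; as a generic Scala illustration of the reviewer's point (a `{ case ... }` block is already a partial function, so it can be passed directly where a `PartialFunction` is expected, with no extra wrapping or catch-all case), under that assumption:

```scala
object PartialFunctionExample {
  // A { case ... } block is itself a PartialFunction literal, so it can be
  // passed directly wherever a PartialFunction is expected.
  val versions = Seq("2.4.0", "2.0.0-cdh4.1.1", "not-a-version")

  // collect takes a PartialFunction; elements no case matches are simply skipped.
  val releaseLines: Seq[String] = versions.collect {
    case v if v.matches("""\d+\.\d+.*""") => v.split("\\.").take(2).mkString(".")
  }
  // releaseLines == Seq("2.4", "2.0")
}
```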

SparkQA commented Nov 26, 2015

Test build #46722 has finished for PR 9979 at commit 83e92ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yhuai changed the title from "[SPARK-11998] [SQL] [test-hadoop2.2] When downloading Hadoop artifacts from maven, we need to try to download the version that is used by Spark" to "[SPARK-11998] [SQL] [test-hadoop2.0] When downloading Hadoop artifacts from maven, we need to try to download the version that is used by Spark" on Nov 26, 2015
yhuai (Contributor Author) commented Nov 26, 2015

SparkQA commented Nov 26, 2015

Test build #46778 has finished for PR 9979 at commit 1f4605e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yhuai (Contributor Author) commented Nov 27, 2015

OK. I am merging this to master and branch-1.6. I will watch the builds and see if there are any new issues.

yhuai (Contributor Author) commented Nov 27, 2015

Hmm... not sure why https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46778/consoleFull used the hadoop-2.3 profile.

yhuai (Contributor Author) commented Nov 27, 2015

asfgit pushed a commit that referenced this pull request Nov 27, 2015
…from maven, we need to try to download the version that is used by Spark

If we need to download Hive/Hadoop artifacts, we first try to download a Hadoop version that matches the Hadoop used by Spark. If that Hadoop artifact cannot be resolved (e.g. the Hadoop version is a vendor-specific version like 2.0.0-cdh4.1.1), we fall back to Hadoop 2.4.0 (the version we used to hard-code as the Hadoop to download from maven) and we will not share Hadoop classes.

I tested this change on my laptop with the following configurations (these are the configurations used by our builds). All tests passed.
```
build/sbt -Phadoop-1 -Dhadoop.version=1.2.1 -Pkinesis-asl -Phive-thriftserver -Phive
build/sbt -Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1 -Pkinesis-asl -Phive-thriftserver -Phive
build/sbt -Pyarn -Phadoop-2.2 -Pkinesis-asl -Phive-thriftserver -Phive
build/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive-thriftserver -Phive
```

Author: Yin Huai <[email protected]>

Closes #9979 from yhuai/versionsSuite.

(cherry picked from commit ad76562)
Signed-off-by: Yin Huai <[email protected]>
asfgit closed this in ad76562 on Nov 27, 2015