@JoshRosen
Contributor

This patch modifies Spark's SBT build so that it no longer uses retrieveManaged / lib_managed to store its dependencies. The motivations for this change are nicely described on the JIRA ticket (SPARK-7841); my personal interest in doing this stems from the fact that lib_managed has caused me some pain while debugging dependency issues in another PR of mine.

Removing our use of lib_managed would be trivial except for one snag: the Datanucleus JARs, required by Spark SQL's Hive integration, cannot be included in assembly JARs due to problems with merging OSGI plugin.xml files. As a result, several places in the packaging and deployment pipeline assume that these Datanucleus JARs are copied to lib_managed/jars. In the interest of maintaining compatibility, I have chosen to retain the lib_managed/jars directory only for these Datanucleus JARs and have added custom code to SparkBuild.scala to automatically copy those JARs to that folder as part of the assembly task.
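For readers who want a concrete picture of the copy step described above, here is a minimal sketch in plain Scala. It is not the code from this patch: the object name `DatanucleusJars`, the method `deploy`, and the rule of matching `org.datanucleus` in the resolved path are all assumptions made for illustration.

```scala
import java.io.File
import java.nio.file.Files

object DatanucleusJars {
  /** Hypothetical helper: copies every classpath entry whose path contains
    * "org.datanucleus" into destDir and returns the destination files.
    * JARs that already exist in destDir are left untouched. */
  def deploy(classpath: Seq[File], destDir: File): Seq[File] = {
    // Assumption: Datanucleus artifacts can be recognised by the organization
    // name appearing in their resolved path (as in an Ivy cache layout).
    val jars = classpath.filter(_.getPath.contains("org.datanucleus"))

    // Create the target directory (e.g. lib_managed/jars) if it is missing.
    Files.createDirectories(destDir.toPath)

    jars.map { jar =>
      val dest = new File(destDir, jar.getName)
      // Skip the copy when a JAR with the same name is already in place.
      if (!dest.exists()) Files.copy(jar.toPath, dest.toPath)
      dest
    }
  }
}
```

In an SBT build, the `classpath` argument would presumably come from the assembly project's resolved classpath, and the helper would be invoked from a task that runs alongside `assembly`. That wiring is left out here because it depends on the SBT version in use.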

dev/mima also depended on lib_managed in a hacky way in order to set classpaths when generating MiMa excludes; I've updated this to obtain the classpaths directly from SBT instead.

/cc @dragos @marmbrus @pwendell @srowen

@JoshRosen
Contributor Author

By the way, this change stands a good chance of significantly speeding up our Jenkins builds since they'll no longer waste time re-downloading JARs (it may also reduce the net flakiness).

@JoshRosen JoshRosen changed the title [SPARK-7841] Stop using retrieveManaged to retrieve dependencies in SBT [SPARK-7841][BUILD] Stop using retrieveManaged to retrieve dependencies in SBT Nov 9, 2015
@SparkQA

SparkQA commented Nov 10, 2015

Test build #45428 has finished for PR 9575 at commit 828c3b4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Nov 10, 2015

Test build #45444 has finished for PR 9575 at commit 828c3b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * class MasterWebUI(
      * public class JavaAFTSurvivalRegressionExample

@pwendell
Contributor

It's hard for me to rule out that lib_managed is used for some other reason at present, but I audited all the uses of it I could find in the codebase, and they all appear to relate to the DataNucleus JARs. So LGTM.

@srowen
Member

srowen commented Nov 10, 2015

Removing or minimizing use of this directory sounds good. I'm not an expert on how it's used, but the reasoning and the change sound consistent. It would also be great to avoid a small number of network-related test failures, if possible.

@dragos
Contributor

dragos commented Nov 10, 2015

Great patch. I've been applying a simplified version on my local clone for a while now. Glad to see this going in.

@andrewor14
Contributor

+1!

@marmbrus
Contributor

Great, I'm going to merge this to master and 1.6.

asfgit pushed a commit that referenced this pull request Nov 10, 2015
…es in SBT

Author: Josh Rosen <[email protected]>

Closes #9575 from JoshRosen/SPARK-7841.

(cherry picked from commit 689386b)
Signed-off-by: Michael Armbrust <[email protected]>
@asfgit asfgit closed this in 689386b Nov 10, 2015
@JoshRosen JoshRosen deleted the SPARK-7841 branch November 10, 2015 18:21