-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-7841][BUILD] Stop using retrieveManaged to retrieve dependencies in SBT #9575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
By the way, this change stands a good chance of significantly speeding up our Jenkins builds since they'll no longer waste time re-downloading JARs (it may also reduce the net flakiness). |
|
Test build #45428 has finished for PR 9575 at commit
|
|
Jenkins, retest this please. |
|
Test build #45444 has finished for PR 9575 at commit
|
|
It's hard for me to rule out that there is no other reason lib_managed is used at present. I audited all the uses of it I could find in the codebase and it appears they all relate to the DataNucleus jars. So LGTM. |
|
Removing or minimizing use of this directory sounds good. I'm not an expert on how it's used but the reasoning and change sound consistent. Also, would be great to avoid a small number of network-related test failures if possible |
|
Great patch, I've been applying a simplified version on my local clone for a while now.. Glad to see this going in. |
|
+1! |
|
Great, I'm going to merge this to master and 1.6. |
…es in SBT This patch modifies Spark's SBT build so that it no longer uses `retrieveManaged` / `lib_managed` to store its dependencies. The motivations for this change are nicely described on the JIRA ticket ([SPARK-7841](https://issues.apache.org/jira/browse/SPARK-7841)); my personal interest in doing this stems from the fact that `lib_managed` has caused me some pain while debugging dependency issues in another PR of mine. Removing our use of `lib_managed` would be trivial except for one snag: the Datanucleus JARs, required by Spark SQL's Hive integration, cannot be included in assembly JARs due to problems with merging OSGI `plugin.xml` files. As a result, several places in the packaging and deployment pipeline assume that these Datanucleus JARs are copied to `lib_managed/jars`. In the interest of maintaining compatibility, I have chosen to retain the `lib_managed/jars` directory _only_ for these Datanucleus JARs and have added custom code to `SparkBuild.scala` to automatically copy those JARs to that folder as part of the `assembly` task. `dev/mima` also depended on `lib_managed` in a hacky way in order to set classpaths when generating MiMa excludes; I've updated this to obtain the classpaths directly from SBT instead. /cc dragos marmbrus pwendell srowen Author: Josh Rosen <[email protected]> Closes #9575 from JoshRosen/SPARK-7841. (cherry picked from commit 689386b) Signed-off-by: Michael Armbrust <[email protected]>
This patch modifies Spark's SBT build so that it no longer uses
retrieveManaged/lib_managedto store its dependencies. The motivations for this change are nicely described on the JIRA ticket (SPARK-7841); my personal interest in doing this stems from the fact thatlib_managedhas caused me some pain while debugging dependency issues in another PR of mine.Removing our use of
lib_managedwould be trivial except for one snag: the Datanucleus JARs, required by Spark SQL's Hive integration, cannot be included in assembly JARs due to problems with merging OSGIplugin.xmlfiles. As a result, several places in the packaging and deployment pipeline assume that these Datanucleus JARs are copied tolib_managed/jars. In the interest of maintaining compatibility, I have chosen to retain thelib_managed/jarsdirectory only for these Datanucleus JARs and have added custom code toSparkBuild.scalato automatically copy those JARs to that folder as part of theassemblytask.dev/mimaalso depended onlib_managedin a hacky way in order to set classpaths when generating MiMa excludes; I've updated this to obtain the classpaths directly from SBT instead./cc @dragos @marmbrus @pwendell @srowen