-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-24287][Core] Spark -packages option should support classifier, no-transitive, and custom conf #21339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
hi there, we also found the same problem, do we have any progress on this patch? |
|
Hoo boy, I remember trying to make this work a long time ago. See #17416 . I wasn't able to. Let me run tests to see how this goes. |
|
Test build #4223 has finished for PR 21339 at commit
|
|
thanks for the follow up. i will rebase this patch to latest master |
|
ping @fangshil |
|
Lets start the review progress. Rebase to latest master. @srowen could you trigger a new test? |
… no-transitive, and custom conf
30f378f to
3e409f5
Compare
|
I'm interested in this issue, and happy to see there's existing PR to evaluate. |
|
I've pulled this patch and played with functionalities, and works well. I've used hive-exec which has default classifier as well as
Not able to load since it's not in default Spark classpath. (expected)
With classifier it properly loads it as transitive dependencies. (works well)
It ignores whole transitive dependencies, hence
It excludes
The jar itself contains
Even we exclude FYI I had to fix scalastyle before building. I'll start reviewing the code and leave comments. cc. @srowen @gaborgsomogyi You might be interested on this PR. |
HeartSaVioR
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left major comments first. Let me continue taking a look later, maybe in couple of days.
| import java.security.PrivilegedExceptionAction | ||
| import java.text.ParseException | ||
| import java.util.UUID | ||
| import java.util.Collections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be placed before import java.util.UUID. Please run ./dev/scalastyle before pushing to branch.
| // Exclude dependencies(name separated by #) for this artifact | ||
| case "exclude" => pvalue.split("#").foreach { ex => | ||
| dd.addExcludeRule(ivyConfName, | ||
| createExclusion("*:*" + ex + "*:*", ivySettings, ivyConfName))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, if we want to exclude org.json:json we should give json as parameter and it will exclude everything which artifact name contains json. This behaves as very misleading: could we just let end users specify details for groupId and artifactId, and let end users explicitly use * if really needed?
|
Also ping @fangshil |
| def params: String = if (extraParams.isEmpty) { | ||
| "" | ||
| } else { | ||
| "?" + extraParams.map{ case (k, v) => (k + "=" + v) }.mkString(":") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't pass the style check I think. "?" + extraParams.map { case (k, v) => k + "=" + v }.mkString(":")
But shouldn't it be mkString("&")?
| * @param coordinates Comma-delimited string of maven coordinates | ||
| * @return Sequence of Maven coordinates | ||
| * Extracts artifact coordinates from a comma-delimited string. Coordinates should be provided | ||
| * in the format `groupId:artifactId:version?param1=value1\¶m2\&value2:..` or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of thought the coordinate was groupId:artifcatId:classifier:version or something but I can't find a reference for that. OK, I guess this is as much as we can do here to make up a new syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on this, I would do something like groupId:artifcatId:version:classifier. Having classifier as a param is weird.
| * in the format `groupId:artifactId:version?param1=value1\¶m2\&value2:..` or | ||
| * `groupId/artifactId:version?param1=value1\¶m2=value2:..` | ||
| * | ||
| * Param splitter & is the background process char in cli, so when multiple params is used, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this comment is clear. "Shells usually use & as a reserved character, so escape these characters with \& or double-quote the whole argument if used"?
| * Optional params are 'classifier', 'transitive', 'exclude', 'conf': | ||
| * classifier: classifier of the artifact | ||
| * transitive: whether to resolve transitive deps for the artifact | ||
| * exlude: exclude list of transitive artifacts for this artifact(e.g. "a#b#c") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exlude -> exclude
I think this is all becoming overkill. Just add support for classifier. If someone needs something more complex, they shouldn't be using this mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen In our production environment, we also used 'transitive', 'exclude' and 'conf' options quite often. The most commonly used one is 'transitive'. We think these options are flexible and important to support complex scenarios. Why do you think packages option should not support complex scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the first glance I think the additional params can serve testing purposes and tend to agree with Sean. I would personally use the added params (except classifier) only when jars are not properly built (different versions in classpath, etc...).
|
@HeartSaVioR thanks for your review! I will address your comments shortly. I saw you tested 'classifier', 'transitive' and 'exclude' options. @srowen suggested we should only add 'classifier' instead of making it more complex, but in our prod env(using this patch for over 2 years now), we do see a lot of use cases using other options. What do you think? |
|
That really sounds like a case where you want to have a proper build expressing dependencies of your project, and package it with your app |
|
I'm not sure about IMHO, once we support |
|
This was not meant to replicate the type of dependency handling you get in a build tool. It's a convenience mostly for
Anything more though I'm really not sure about. |
|
Have been thinking this more, and while I still feel supporting Suppose the case there's conflict between Spark libs and transitive dependencies which would make some problems:
So |
|
BTW I'm curious there was a specific reason/decision around picking the library (Aether vs Ivy), because we tend to support around Maven coordinate (classifier in this case) which is just natively supported in Aether, maybe less painful to struggle about. Maybe I'm missing what Ivy only supports while Aether does not. |
|
We have been supporting --packages internally for more than 2 years, and restricted --packages for ad-hoc testing in shell and notebook. As a typical use case, use starts spark-shell as part of the development process for easy debugging on pieces of the job logic, before assembling a whole prod job through a build tool. In this scenario, supporting a powerful --package similar using to a build tool were a requirement especially for us(a java company). In our experience, the four options we added have covered all the corner case so far. My thoughts are mostly aligned with @HeartSaVioR, while in addition, I think both the 4 options we proposed are 'good for completeness of this feature'. |
|
I'm not strongly against this, as the extra complexity isn't high. Mostly I'm concerned that we'll continue to find corner cases and exceptions in this 'support', when this was never meant to be a full package manager. Why Ivy? I am guessing because that's what Maven uses, or used, and we wanted to align the behavior as much as possible. |
Why I said Aether vs Ivy is that Aether has been much closer for Maven, and Maven pulls Aether into its codebase: it was EPLv1 and now it renamed to https://github.com/apache/maven/tree/master/maven-artifact |
|
Maven used Ivy in the past, right? probably at the time this was implemented. |
|
Honestly don't know about the past, and looks like Maven has been providing both as plugins. Maybe not that important. I just feel that we don't have to know how to represent Maven things in Ivy API, like special care for |
|
Can one of the admins verify this patch? |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
We should extend Spark's -package option to support:
New artifact spec to be reviewed:
Coordinates are split by ','
Each coordinate should be provided in the format
groupId:artifactId:version?param1=value1\¶m2\&value2:..orgroupId/artifactId:version?param1=value1\¶m2=value2:..Param splitter needs to be escaped as & , or entire coordinates string needs to be enclosed in double quotes, since & is the background process char in cli.
Optional params are 'classifier', 'transitive', 'exclude', 'conf':
classifier: classifier of the artifact
transitive: whether to resolve transitive deps for the artifact
exlude: exclude list of transitive artifacts for this artifact(e.g. "a#b#c")
conf: the ivy conf of the artifact
We have tested this patch internally and it greatly increases the flexibility when user uses -packages option
How was this patch tested?
added unit test