Reuse shadowJar for spark client bundle jar maven publish #1857
Conversation
testJar??
Sorry for the confusion, that is a bad name. I am just referring to the original default jar task; I updated the classifier to "defaultJar" to be clearer.
What is the value when we do not override it to null?
The original name is something like polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT-bundle.jar. We did this because the name without a classifier was taken by the default jar task, which produces polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT.jar. However, Spark does not support classifiers in the package config, so we make this jar the project's main jar. Since it is the jar Spark actually needs, I think it should be the project jar without any classifier.
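For illustration, a minimal Gradle Kotlin DSL sketch of the clash described here; the "defaultJar" classifier follows the earlier comment, and the rest is illustrative rather than the actual Polaris build script:

```kotlin
// The plain `jar` task would otherwise own the classifier-less name
// polaris-spark-3.5_2.12-<version>.jar, so it is given a classifier here,
// freeing the classifier-less slot for the shaded bundle jar.
// `spark-submit --packages` resolves plain Maven coordinates only
// (group:artifact:version), with no way to request a classifier.
tasks.named<Jar>("jar") {
    archiveClassifier.set("defaultJar")
}
```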
Yes, I understand the intent :) My question is about the need to set archiveClassifier to null... Do we have to use null here?
Oh, sorry, we don't have to; the default is null. I put it there to be clear, and I can remove it if preferred, but I think it might be better to be explicit in the code.
From my POV, removing the assignment is preferable since the value is the same as the default.
I'd prefer to have a comment about adding a classifier to the jar task instead.
sg! I removed the classifier specification and added a comment at the place where the classifier is added for the jar task.
@dimas-b After switching to the shadowJar plugin, I need to specify the classifier here; otherwise it is configured to generate a jar with the classifier "all". I was also able to get rid of the other jar change.
Actually, sorry, it is still needed; my previous gradlew build result seems to have come from the cache. Added it back!
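For reference, the Shadow plugin defaults the shaded jar's classifier to "all", so keeping the classifier-less name needs an explicit override; a minimal sketch using the standard plugin defaults:

```kotlin
// Without an override, the Shadow plugin produces <name>-<version>-all.jar.
tasks.named<Jar>("shadowJar") {
    // Clear the default "all" classifier so the shaded jar takes the plain
    // <name>-<version>.jar name that `--packages` can resolve.
    archiveClassifier.set("")
}
```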
sgtm
Why not remove the plain jar artifact from this module completely?
I tried that before, but the test task depends on the jar task in the default configuration. I tried to make the test task depend on createPolarisSparkJar instead, but because that jar task relocates the com.fasterxml module, one of our deserialization tests fails: it now looks for the shaded class, not the original one.
So far I haven't found a good solution, so I kept the original jar. Do you have a better solution for this problem?
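A sketch of the failure mode; the task name follows this thread, and the shaded package prefix is an assumption for illustration:

```kotlin
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

tasks.named<ShadowJar>("createPolarisSparkJar") {
    // Relocation rewrites Jackson class names inside the shaded jar, e.g.
    //   com.fasterxml.jackson.databind.ObjectMapper
    //     -> org.apache.polaris.shaded.com.fasterxml.jackson.databind.ObjectMapper
    // (the shaded prefix here is illustrative). A test that runs against this
    // jar but was written against the unshaded names then fails to deserialize.
    relocate("com.fasterxml", "org.apache.polaris.shaded.com.fasterxml")
}
```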
Thanks for the detailed analysis, @gh-yzou ! Unfortunately, I do not have a better solution off the top of my head.
How about using the internal classifier for this jar? I suppose it is not meant for reuse.
Yes, it is not intended for reuse. The name "internal" makes sense to me; updated.
Force-pushed 00ca1b7 to b6f25a7.
dimas-b left a comment:
LGTM 👍 Thanks, @gh-yzou !
@dimas-b I think you asked a question somewhere, but it doesn't show up in the PR for some reason. For the artifact, I don't think we have "client" in the artifact name; the Iceberg one is called iceberg-spark-runtime-xxxx.jar, and our Polaris one is called polaris-spark-xxx.jar. For Iceberg, I guess the reason is that iceberg-spark was already taken by another project, but I don't think we need to be exactly the same as Iceberg.
Re: I value short jar names, but at the same time it might be worth clarifying whether this jar applies to the whole of the Polaris integration with Spark or just to Generic Tables. In other words, do we foresee any other Polaris jars being put on the Spark class path? If no, the current name is fine from my POV; if yes, let's discuss that naming convention on the …
Force-pushed 4fecc92 to 841bcc4.
dimas-b left a comment:
Changes LGTM, but I believe the PR description is a bit off WRT actual changes now 🤔 WDYT?
@dimas-b Sorry, I updated the title but forgot to update the description; the description is updated now too.
Linked commits and follow-ups:

- Squashed commit messages for this PR: fix spark client; fix test failure and address feedback; fix error; update regression test; update classifier name; address comment; add change; update doc; update build and readme; add back jar; update dependency; update tests; remove merge service file; update readme.
- Revert (#1921): "This reverts commit 1f7f127. The shadowJar plugin actually stops publishing the original jar, which is not what the Spark client intends to publish for --packages usage. Revert it for now; will follow up with a better way to reuse the shadow jar plugin, likely with a separate bundle project."
- Follow-up (#1908, "…untime to avoid spark compatibilities issue"): reapplied this change and then reverted it again ("This reverts commit 40f4d36") alongside checkstyle updates and review feedback.
We previously added a special check in PublishingHelperPlugin.kt specifically for the jar task of the polaris-spark project, publishing the artifact output of the added ShadowJar task. However, we already have shadowJar infrastructure that takes care of the Maven publish.
In this PR, we switch to reusing the shadowJar infrastructure and revert the change we added before.
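As a rough sketch of what reusing that infrastructure can look like with the standard Shadow plugin; the publication name is hypothetical, and the actual wiring lives in the shared build plugins:

```kotlin
publishing {
    publications {
        // "maven" is a hypothetical publication name for this sketch.
        create<MavenPublication>("maven") {
            // The Shadow plugin registers a "shadow" software component whose
            // primary artifact is the shaded jar, so no per-project special
            // case (as previously added to PublishingHelperPlugin.kt) is needed.
            from(components["shadow"])
        }
    }
}
```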