
Conversation

Contributor

@gh-yzou gh-yzou commented Jun 11, 2025

We previously added a special check in PublishingHelperPlugin.kt specifically for the jar task of the polaris-spark project, to publish the artifact output of the added ShadowJar task. However, we already have shadowJar infrastructure that takes care of the Maven publish.
In this PR, we switch to reusing the shadowJar infrastructure and revert the change we added before.

Member

testJar??

Contributor Author

Sorry for the confusion, that is a bad name. I am just referring to the original default jar task; I updated the classifier to "defaultJar" to make it clearer.

Contributor

What is it when we do not override it to null?

Contributor Author

The original name is something like polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT-bundle.jar. We did that because the name without a classifier was taken by the default jar task, which produces polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT.jar. However, Spark does not support using a classifier in the package config, so we make this jar the project's jar. Since this is the jar Spark actually needs, I think it should be the project's jar without any classifier.
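For context, the arrangement described here can be sketched in Gradle Kotlin DSL roughly as follows (a sketch, not the actual Polaris build file; the plugin version is illustrative):

```kotlin
// build.gradle.kts (sketch, not the actual Polaris build)
plugins {
    `java-library`
    id("com.github.johnrengelman.shadow") version "8.1.1" // version illustrative
}

// Give the plain jar a classifier so the shaded jar can take over the
// unclassified artifact name, which is what Spark's --packages resolves.
tasks.jar {
    archiveClassifier.set("defaultJar")
}

tasks.shadowJar {
    // No classifier: output becomes polaris-spark-3.5_2.12-<version>.jar
    archiveClassifier.set("")
}
```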

Contributor

Yes, I understand the intent :) My question is about the need to set archiveClassifier to null... Do we have to use null here?

Contributor Author

Oh, sorry, we don't have to; the default is null. I was putting it there to be clear. I can remove it if preferred, but I think it might be better to be explicit in the code.

Contributor

From my POV removing the assignment is preferable since the value is the same as default.

Contributor

I'd prefer to have a comment about adding a classifier to the jar task instead.

Contributor Author

sg! I removed the specification of the classifier and added a comment at the place where I add the classifier for the jar task.

Contributor Author

@gh-yzou gh-yzou Jun 12, 2025

@dimas-b after switching to use the shadowJarPlugin, I need to specify the classifier here; otherwise it is configured to generate a jar with the classifier "all". I was also able to get rid of the other jar change.
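For reference, the Shadow plugin defaults the shadowJar task's archiveClassifier to "all", so without an override the artifact would come out as ...-all.jar. A minimal sketch of the override being discussed:

```kotlin
// Sketch: override the Shadow plugin's default "all" classifier so the
// shaded jar is published without a classifier.
tasks.shadowJar {
    archiveClassifier.set("")
}
```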

Contributor Author

Actually, sorry, it seems it is still needed; my previous gradlew build result seems to have come from the cache. Added it back!

Contributor

sgtm

Contributor

Why not remove the plain jar artifact from this module completely?

Contributor Author

I tried that before; however, the test task depends on the jar task in the default configuration. I tried to switch the test task to depend on createPolarisSparkJar, but because that jar task relocates the com.fasterxml module, one of our deserialization tests fails, since it now looks for the shaded classes, not the original ones.
So far I haven't found a good solution, so I kept the original jar. Wondering if you have a better solution for this problem?
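The relocation mentioned above would look roughly like this in the Shadow plugin DSL (the shaded package name is an assumption for illustration). Once com.fasterxml classes are rewritten into the shaded package, any test that deserializes against the original package names fails to find them:

```kotlin
tasks.shadowJar {
    // Rewrite com.fasterxml.* into a shaded package to avoid conflicts
    // with Spark's own Jackson on the runtime classpath. Tests compiled
    // against the original com.fasterxml classes break on this jar.
    // (Destination package name is illustrative.)
    relocate("com.fasterxml", "org.apache.polaris.shaded.com.fasterxml")
}
```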

Contributor

Thanks for the detailed analysis, @gh-yzou ! Unfortunately, I do not have a better solution off the top of my head.

Contributor

@dimas-b dimas-b Jun 11, 2025

How about using the internal classifier for this jar? I suppose it is not meant for reuse.

Contributor Author

Yes, it is not intended for reuse. The name "internal" makes sense to me, updated.
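The agreed naming could be expressed as (a sketch under the assumptions above):

```kotlin
// Sketch: mark the plain jar as internal-only; the shaded jar keeps the
// unclassified name that Spark consumes via --packages.
tasks.jar {
    archiveClassifier.set("internal")
}
```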

@gh-yzou gh-yzou force-pushed the yzou-test-plugin branch from 00ca1b7 to b6f25a7 Compare June 11, 2025 23:09
dimas-b
dimas-b previously approved these changes Jun 11, 2025
Contributor

@dimas-b dimas-b left a comment

LGTM 👍 Thanks, @gh-yzou !

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jun 11, 2025
Contributor Author

gh-yzou commented Jun 11, 2025

@dimas-b I think you asked a question somewhere, but it doesn't show up in the PR for some reason. For the artifact, I don't think we have "client" in the artifact name; the Iceberg one is called iceberg-spark-runtime-xxxx.jar, and our Polaris one is called polaris-spark-xxx.jar. For Iceberg, I guess the reason is that iceberg-spark was already taken by another project, but I don't think we need to be exactly the same as Iceberg.
Some of the doc descriptions might introduce confusion; I went over them one more time to make sure the descriptions are consistent.

Contributor

dimas-b commented Jun 11, 2025

Re: polaris-spark-xxx.jar: it is not really related to this PR :)

I value short jar names, but at the same time it might be worth clarifying whether this jar applies to the whole of Polaris integration with Spark or just to Generic Tables.

In other words, do we foresee making any other Polaris jars to be put on the Spark class path?

If no, the current name is fine from my POV, if yes, let's discuss that naming convention on the dev ML (since it's not about this build change really).

@gh-yzou gh-yzou force-pushed the yzou-test-plugin branch from 4fecc92 to 841bcc4 Compare June 18, 2025 18:30
Contributor

@dimas-b dimas-b left a comment

Changes LGTM, but I believe the PR description is a bit off WRT actual changes now 🤔 WDYT?

Contributor Author

gh-yzou commented Jun 18, 2025

@dimas-b sorry, I updated the title but forgot to update the description; the description is updated now too.

@gh-yzou gh-yzou merged commit 1f7f127 into apache:main Jun 18, 2025
12 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Jun 18, 2025
flyrain pushed a commit that referenced this pull request Jun 18, 2025
* fix spark client

* fix test failure and address feedback

* fix error

* update regression test

* update classifier name

* address comment

* add change

* update doc

* update build and readme

* add back jr

* udpate dependency

* add change

* update

* update tests

* remove merge service file

* update readme

* update readme
gh-yzou added a commit to gh-yzou/polaris that referenced this pull request Jun 21, 2025
eric-maynard pushed a commit that referenced this pull request Jun 22, 2025
Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" (#1921)

This reverts commit 1f7f127.

The shadowJar plugin actually stops publishing the original jar, which is not what the spark client intends to publish for the --packages usage.

Revert it for now; will follow up with a better way to reuse the shadow jar plugin, likely with a separate bundle project.
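For illustration, the --packages usage the revert note refers to: Spark resolves plain group:artifact:version Maven coordinates with no field for a classifier, which is why the unclassified jar is the one that matters (the version below is illustrative):

```shell
# Sketch: --packages accepts group:artifact:version coordinates only;
# there is no way to request a Maven classifier, so Spark downloads the
# unclassified jar. (Version shown is illustrative.)
spark-shell --packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0
```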
gh-yzou added a commit to gh-yzou/polaris that referenced this pull request Jun 23, 2025
gh-yzou added a commit to gh-yzou/polaris that referenced this pull request Jun 23, 2025
gh-yzou added a commit to gh-yzou/polaris that referenced this pull request Jun 23, 2025
gh-yzou added a commit that referenced this pull request Jun 23, 2025
…untime to avoid spark compatibilities issue (#1908)

* add change

* add comment

* update change

* add comment

* add change

* add tests

* add comment

* clean up style check

* update build

* Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)"

This reverts commit 1f7f127.

* Reuse shadowJar for spark client bundle jar maven publish (#1857)

* fix spark client

* fix test failure and address feedback

* fix error

* update regression test

* update classifier name

* address comment

* add change

* update doc

* update build and readme

* add back jr

* udpate dependency

* add change

* update

* update tests

* remove merge service file

* update readme

* update readme

* update checkstyl

* rebase with main

* Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)"

This reverts commit 40f4d36.

* update checkstyle

* revert change

* address comments

* trigger tests
flyrain pushed a commit that referenced this pull request Jun 23, 2025
flyrain pushed a commit that referenced this pull request Jun 23, 2025
6 participants