Update spark client to use the shaded iceberg-core in iceberg-spark-runtime to avoid spark compatibilities issue #1908
Conversation
This change is fine from my POV, but it does not seem to match the PR title and description. Please mention this refactoring there.
Because the classes used by the client and the server now differ in their imports (the server uses the regular imports, while the client uses the shaded ones coming from Iceberg), the client-side implementation of the class is now different from the server-side one, and the client is supposed to use the class provided on the client side. This change switches to that client-side class, which will be auto-generated in an upcoming task.
Going forward, the only contract the client has with the server will be the API spec, which I think is the correct thing to do.
I added a comment here, and also updated the description to contain more details.
+1 to the TODO, but as of now this class is checked in to the source repo, not generated at build time 🤔
Yes, this class requires a different code-generation script because the generated code differs from the server-side version in its imports. The server side uses
import com.fasterxml.jackson.annotation.JsonCreator;
but we will need to use
import org.apache.iceberg.shaded.com.fasterxml.jackson.annotation.JsonCreator;
on the client side to reuse the Iceberg REST client correctly, because the RESTClient shipped in iceberg-spark-runtime has the Jackson library shaded. I will have to add a new generation script and build file to make sure things can be generated properly; I have an issue to track the improvement: #1909
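For illustration, a client-side model class built this way might look like the following minimal sketch (the CreateCatalogRequest name and its single field are hypothetical; the point is the shaded import paths):

```java
// Hypothetical client-side model: same shape as the server-side class, but
// compiled against the Jackson packages relocated inside iceberg-spark-runtime.
import org.apache.iceberg.shaded.com.fasterxml.jackson.annotation.JsonCreator;
import org.apache.iceberg.shaded.com.fasterxml.jackson.annotation.JsonProperty;

public class CreateCatalogRequest {
  private final String name;

  @JsonCreator
  public CreateCatalogRequest(@JsonProperty("name") String name) {
    this.name = name;
  }

  @JsonProperty("name")
  public String getName() {
    return name;
  }
}
```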
I mean: do we need the @Generated annotation here? The file appears to be modified after generation and manually committed to git 🤔
I see. Yes, we do not need the @Generated annotation here; removed.
Nit: the current Javadoc reads:

/**
 * That class provides rest client that is can be used to talk to Polaris Management service and
 * auth token endpoint. This class is currently used by Spark Client tests for commands that can not
 * be issued through spark command, such as createCatalog etc.
 */

Suggested replacement:

/**
 * This class provides a REST client for the Polaris Management service endpoints and its
 * auth-token endpoint, which is used in Spark client tests to run commands that Spark SQL
 * can't issue directly (e.g., createCatalog).
 */
Updated.
+1, this provides more flexibility on the client side by minimizing dependencies, as it only depends on the Polaris REST spec now.
flyrain left a comment:
Thanks a lot @gh-yzou for the fix! LGTM!
A minor suggestion: would you mind sharing the error message/stack trace of the issue the PR is trying to fix? This will provide more context for people who track this issue.
I raised some concerns about this PR on the dev ML: https://lists.apache.org/thread/0z30f3cfvm41hxlbxgp4fqdpv7mfgnv8. Let's resolve that discussion before merging.
I'm not sure this approach is ideal for making exceptions to Checkstyle rules.
@adutra: WDYT?
Switched to use a suppression rule now.
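For context, a Checkstyle suppressions file generally takes this shape; the check name and path pattern below are illustrative assumptions, not the actual rule added in this PR:

```xml
<?xml version="1.0"?>
<!DOCTYPE suppressions PUBLIC
    "-//Checkstyle//DTD SuppressionFilter Configuration 1.2//EN"
    "https://checkstyle.org/dtds/suppressions_1_2.dtd">
<suppressions>
  <!-- Hypothetical example: exempt the Spark client sources from an import check. -->
  <suppress checks="IllegalImport" files=".*[/\\]plugins[/\\]spark[/\\].*"/>
</suppressions>
```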
cc @dimas-b, it doesn't depend on iceberg-core anymore, so there is no leaking of transitive dependencies.
@flyrain I added the stack trace in the PR description.
codestyle/checkstyle.xml (outdated)
Is this not the same as lines 41-42?
It is not the same; lines 41-42 are actually not doing anything. That config says to look for the property org.checkstyle.google.suppressionfilter.config, which I don't think we configure anywhere in our project, and, if it is not configured, to fall back to the file checkstyle-suppressions.xml, which we also do not have anywhere in the project. The checkstyle job doesn't fail because it sets optional to true.
Now that we do have a suppressions file, I made it non-optional and removed the other configuration. One thing I did notice is that if I just use the path relative to the root project, the gradlew build seems unable to find the file; the recommended way seems to be setting the config property to the absolute path, which is the approach I am going with now.
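For reference, the stanza being discussed looks roughly like this in the stock Google Checkstyle config, matching the behavior described above (property-based file lookup, a fallback file name, and optional set to true):

```xml
<module name="SuppressionFilter">
  <property name="file" value="${org.checkstyle.google.suppressionfilter.config}"
            default="checkstyle-suppressions.xml"/>
  <property name="optional" value="true"/>
</module>
```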
I moved the suppressions file to the polaris-spark project and reused the original configuration.
Is it possible to keep this file under the Spark module as opposed to the global codestyle dir?
Actually, I think it makes sense to have it at the top level. In the future, if there are other suppression rules we want to add, they can all go in this file, and we will have a centralized place to manage all checkstyle and suppression rules, which I think makes things much easier to manage.
Agree with @dimas-b, let's not add module specifics here globally - that does affect all modules.
Moved the suppression config to the Spark client side.
thx!
Please do not add a project-specific change to all projects.
Moved to a project-specific config.
Why have two Jacksons in tests - the relocated one and this one?
That one is used by a Polaris client test utility that talks to the Polaris Management API for test setup, such as createCatalog. This test utility has nothing to do with Iceberg and is not supposed to rely on the Iceberg Spark client library.
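As a sketch of that distinction (the class and endpoint names here are hypothetical): the test utility can serialize its payloads with the regular, unshaded Jackson, since its requests never go through the Iceberg RESTClient:

```java
// Hypothetical test-setup helper: serializes a management-API request with the
// regular (unshaded) Jackson ObjectMapper, independent of the shaded classes
// used on the Iceberg RESTClient path.
import com.fasterxml.jackson.databind.ObjectMapper;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class PolarisManagementTestClient {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  private PolarisManagementTestClient() {}

  // Sends a JSON payload to a management endpoint, e.g. to create a catalog.
  public static int post(String endpoint, Object body) throws Exception {
    String json = MAPPER.writeValueAsString(body);
    HttpRequest request =
        HttpRequest.newBuilder(URI.create(endpoint))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
    HttpClient client = HttpClient.newHttpClient();
    return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
  }
}
```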
flyrain left a comment:
Thanks for keeping working on it, @gh-yzou!
I still think that compiling against shaded jackson classes is sub-optimal and can be avoided, but I will not hold this PR because of that. Other changes LGTM.
Merged commit: Update spark client to use the shaded iceberg-core in iceberg-spark-runtime to avoid spark compatibilities issue (#1908)

* add change
* add comment
* update change
* add comment
* add change
* add tests
* add comment
* clean up style check
* update build
* Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" (reverts commit 1f7f127)
* Reuse shadowJar for spark client bundle jar maven publish (#1857)
* fix spark client
* fix test failure and address feedback
* fix error
* update regression test
* update classifier name
* address comment
* add change
* update doc
* update build and readme
* add back jr
* udpate dependency
* add change
* update
* update tests
* remove merge service file
* update readme
* update readme
* update checkstyl
* rebase with main
* Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" (reverts commit 40f4d36)
* update checkstyle
* revert change
* address comments
* trigger tests
PR description:
We ran into an issue when testing the Spark client using --packages (the jar path works correctly): Iceberg requires Avro 1.12.0, but the version provided by Spark is 1.11.4. Even though the dependency is downloaded, the one used is still the one provided by Spark, so running select * from iceberg_tb fails with an Avro compatibility error.
A quick mitigation is to remove the default Avro jar used by Spark. However, to avoid future conflicts with other libraries, we switched to using the shaded libraries shipped in iceberg-spark-runtime.
The major change is that instead of using com.fasterxml.jackson.annotation, we use org.apache.iceberg.shaded.com.fasterxml.jackson.annotation for all requests and responses, to ensure they can be serialized and deserialized correctly by the Iceberg RESTClient.
Since these classes now look different from the ones used on the server side, the client uses its own request and response classes. Those classes are added manually for the moment; a follow-up will auto-generate them directly from the spec: #1909
Since we are using the libraries shipped along with iceberg-spark-runtime, this simplifies the build for the client significantly.
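A minimal sketch of what that simplification can look like in the build, assuming Gradle Kotlin DSL; the artifact version and dependency scope are illustrative, not the ones pinned in the repository:

```kotlin
// build.gradle.kts sketch: the client compiles against the single bundled
// runtime instead of iceberg-core plus its transitive Jackson/Avro jars.
dependencies {
  compileOnly("org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1")
}
```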