Package polaris client as python package #2049
@eric-maynard / @HonahX may I have a review here? This will give us proper packaging for the Polaris client and help us remove the dependency on the bash script (keeping it only for repair or initial setup).
Thanks @MonkeyCanCode for working on this! This is a great step towards packaging and publishing the Polaris client. One small question to confirm: at this stage, will users still need to run …? I once had an experimental PR that used Poetry's custom build file to auto-generate the OpenAPI client code when building the package, but I did not have time to fix all the CI for that PR. Do you think that could be the next step for the Python packaging?
The bash script `polaris` already does that. I also added those steps to the new Makefile PR, which adds the Python-client-related operations. But yes, if that is preferred, I can update Poetry to do it instead of relying on an external bash script. Ideally, though, people should `pip install` Polaris from a Python package repository. With the changes introduced in this PR, those generated files will be included in the package. Let me know which route you prefer. @HonahX
I also chatted with Eric offline, and we both think it would be ideal to auto-generate those files as part of packaging or installation rather than relying on a separate script. As you mentioned, the bash script isn't very helpful for users who "just want to use the CLI" without downloading the entire codebase. Ideally, users should be able to run `pip install` on a plain clone of the repo without first running a one-time setup script. I think we can merge this first, as it is already a huge step toward making the CLI the entrypoint.
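Generating the OpenAPI client during packaging, instead of in a separate script, could be structured as a guard that runs the generator only when the generated code is absent. This is a minimal sketch, not the code from the experimental PR mentioned above: `ensure_generated` is a hypothetical helper, and `generate` stands in for whatever generator invocation the `polaris` bash script performs today.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_generated(
    target_dir: Path, generate: Callable[[Path], None], force: bool = False
) -> bool:
    """Run the client generator only when generated code is missing (or force=True).

    Returns True if the generator ran, False if existing code was reused.
    """
    # Treat the package as present if it already contains any Python module.
    has_code = target_dir.is_dir() and any(target_dir.rglob("*.py"))
    if has_code and not force:
        return False
    target_dir.mkdir(parents=True, exist_ok=True)
    generate(target_dir)
    # Fail loudly if the generator ran but produced nothing packageable.
    if not any(target_dir.rglob("*.py")):
        raise RuntimeError(f"generator produced no Python files in {target_dir}")
    return True


if __name__ == "__main__":
    def fake_generator(out: Path) -> None:
        # Stand-in for the real OpenAPI generator invocation.
        (out / "__init__.py").write_text("# generated\n")

    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "polaris"
        print(ensure_generated(pkg, fake_generator))  # True: nothing there yet
        print(ensure_generated(pkg, fake_generator))  # False: existing code reused
```

Passing the generator in as a callable keeps the check testable without the generator installed, and `force=True` covers the "refresh" case the bash script currently handles.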
* chore(deps): update dependency mypy to >=1.17, <=1.17.0 (apache#2114)
* Spark 3.5.6 and Iceberg 1.9.1 (apache#1960)
  * Spark 3.5.6 and Iceberg 1.9.1
  * Cleanup
* Add `pathStyleAccess` to AwsStorageConfigInfo (apache#2012)
  * Add `pathStyleAccess` to AwsStorageConfigInfo
  This change allows configuring the "path-style" access mode in S3 clients (both in Polaris Servers and Iceberg REST Catalog API clients). This change is applicable both to AWS storage and to non-AWS S3-compatible storage (apache#1530).
* Add TestFileIOFactory helper (apache#2105)
  * Add FileIOFactory.wrapExisting helper
* fix(deps): update dependency gradle.plugin.org.jetbrains.gradle.plugin.idea-ext:gradle-idea-ext to v1.2 (apache#2125)
* fix(deps): update dependency boto3 to v1.39.7 (apache#2124)
* Abstract polaris-runtime-service tests for all persistence implementations (apache#2106)
  The NoSQL persistence implementation has to run the Iceberg table & view catalog plus the Polaris-specific tests as well. Reusing existing tests is beneficial to avoid a lot of code duplication. This change moves the actual tests to `Abstract*` classes and refactors the existing tests to extend those. The NoSQL persistence work extends the same `Abstract*` classes but runs with different Quarkus test profiles.
* Add IMPLICIT authentication support to the CLI (apache#2121)
  PRs apache#1925 and apache#1912 were merged around the same time. This PR connects the two changes and enables the CLI to accept the IMPLICIT authentication type. Since Hadoop federated catalogs rely purely on IMPLICIT authentication, the CLI parsing test has been updated to reflect the same.
* feat(helm): Add support for external authentication (apache#2104)
* fix(deps): update dependency org.apache.iceberg:iceberg-bom to v1.9.2 (apache#2126)
* fix(deps): update quarkus platform and group to v3.24.4 (apache#2128)
* fix(deps): update dependency boto3 to v1.39.8 (apache#2129)
* fix(deps): update dependency io.smallrye.config:smallrye-config-core to v3.13.3 (apache#2130)
* Add newIcebergCatalog helper (apache#2134)
  Creation of `IcebergCatalog` instances was quite redundant, as tests mostly use the same parameters. Also remove an unused field in 2 other tests.
* Add server and client support for the new generic table `baseLocation` field (apache#2122)
* Use Makefile to simplify setup and commands (apache#2027)
  * Use Makefile to simplify setup and commands
  * Add targets for minikube state management
  * Add podman support and spark plugin build
  * Add version target
  * Update README.md for Makefile usage and relation to the project
  * Fix nit
* Package polaris client as python package (apache#2049)
  * Package polaris client as python package
  * Package polaris client as python package
  * Change owner to spark when copying files from local into Dockerfile
* CI: Address failure from accessing GH API (apache#2132)
  CI sometimes fails with this failure:
  ```
  * What went wrong:
  Execution failed for task ':generatePomFileForMavenPublication'.
  > Unable to process url: https://api.github.com/repos/apache/polaris/contributors?per_page=1000
  ```
  The sometimes-failing request fetches the list of contributors to be published in the "root" POM. Unauthorized GH API requests have an hourly(?) limit of 60 requests per source IP. Authorized requests have a much higher rate limit. We do have a GitHub token available in every CI run, which can be used in GH API requests. This change adds the `Authorization` header for the failing GH API request to leverage the higher rate limit and let CI not fail (that often).
* fix(deps): update dependency com.nimbusds:nimbus-jose-jwt to v10.4 (apache#2139)
* fix(deps): update dependency com.diffplug.spotless:spotless-plugin-gradle to v7.2.0 (apache#2142)
* fix(deps): update dependency software.amazon.awssdk:bom to v2.32.4 (apache#2146)
* fix(deps): update dependency org.xerial.snappy:snappy-java to v1.1.10.8 (apache#2138)
* fix(deps): update dependency org.junit:junit-bom to v5.13.4 (apache#2147)
* fix(deps): update dependency boto3 to v1.39.9 (apache#2137)
* fix(deps): update dependency com.fasterxml.jackson:jackson-bom to v2.19.2 (apache#2136)
* Python client: add support for endpoint, sts-endpoint, path-style-access (apache#2127)
  This change adds support for endpoint, sts-endpoint, and path-style-access to the Polaris Python client. Amends apache#1913 and apache#2012.
* Remove PolarisEntityManager.getCredentialCache (apache#2133)
  `PolarisEntityManager` itself is not using the `StorageCredentialCache` but just hands it out via `getCredentialCache`. The only caller of `getCredentialCache` is `FileIOUtil.refreshAccessConfig`, which in turn is only called by `DefaultFileIOFactory` and `IcebergCatalog`. Note that in a follow-up we will likely be able to remove `PolarisEntityManager` usage completely from `IcebergCatalog`.
  Additional cleanups:
  - use `StorageCredentialCache` injection in tests (but we need to invalidate all entries on test start)
  - remove unused `UserSecretsManagerFactory` from `PolarisCallContextCatalogFactory`
* chore(deps): update registry.access.redhat.com/ubi9/openjdk-21-runtime docker tag to v1.22-1.1752676419 (apache#2150)
* fix(deps): update dependency com.diffplug.spotless:spotless-plugin-gradle to v7.2.1 (apache#2152)
* fix(deps): update dependency boto3 to v1.39.10 (apache#2151)
* chore: fix class reference in the javadoc of TableLikeEntity (apache#2157)
* fix(deps): update dependency commons-codec:commons-codec to v1.19.0 (apache#2160)
* fix(deps): update dependency boto3 to v1.39.11 (apache#2159)
* Last merged commit 395459f

---------

Co-authored-by: Mend Renovate
Co-authored-by: Yong Zheng
Co-authored-by: Dmitri Bourlatchkov
Co-authored-by: Christopher Lambert
Co-authored-by: Pooja Nilangekar
Co-authored-by: Alexandre Dutra
Co-authored-by: Yun Zou
Currently we are using `<git_root_dir>/polaris` as the entrypoint to interact with the Polaris CLI. This is okay for local usage, but it is not ideal for users who don't want to download the entire code base just to use the CLI, and it prevents us from distributing the Polaris client as part of a release. To address this, we need to publish Polaris as a CLI that doesn't depend on the bash script. This PR addresses the problem with the following changes:
* `polaris_cli.py`, as that will be the entrypoint for us to use
* `client/python/polaris` is auto-generated from OpenAPI, and `poetry.core.masonry.api` is the build backend. With `poetry.core.masonry.api` as the build backend, only files tracked by git are included, so the auto-generated files won't be packaged unless an include clause is added.
* `polaris` within bash script `polaris`

Here is a quick demo of the packaging:
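Making `polaris_cli.py` the entrypoint means the pip-installed `polaris` command needs a function it can call. A minimal sketch of that shape, assuming an argparse-based CLI; the `main` function, the `--host`/`--port` options, and the `catalogs` subcommand are placeholders for illustration, not the real Polaris argument parser:

```python
import argparse
from typing import Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Hypothetical console-script entrypoint (placeholder for polaris_cli.py)."""
    parser = argparse.ArgumentParser(prog="polaris", description="Polaris CLI (sketch)")
    # Illustrative connection options; the real CLI defines its own set.
    parser.add_argument("--host", default="localhost")
    parser.add_argument("--port", type=int, default=8181)
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("catalogs", help="catalog operations (placeholder)")
    # argv=None makes argparse read sys.argv[1:], which is what the installed script passes.
    args = parser.parse_args(argv)
    # A real CLI would dispatch to the generated OpenAPI client here.
    print(f"would run {args.command!r} against {args.host}:{args.port}")
    return 0
```

In `pyproject.toml`, Poetry maps such a function to a command through a scripts table entry of the form `polaris = "<module>:main"` (the module path is left as a placeholder). The wrapper that pip generates calls `sys.exit(main())`, so the function only needs to return an exit code.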
Set up Poetry, then run `poetry build` (which we can later use to distribute the client code and publish it to the Python Package Index):
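Because the poetry-core backend can silently leave out generated code that git does not track, it may be worth checking the built wheel before publishing it. A small standard-library sketch; the `missing_from_wheel` helper and the package paths passed to it are hypothetical:

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def missing_from_wheel(wheel: Path, required_dirs: Iterable[str]) -> List[str]:
    """Return each required package directory that has no .py file inside the wheel."""
    # A wheel is a zip archive; its entry names always use forward slashes.
    with zipfile.ZipFile(wheel) as zf:
        names = zf.namelist()
    missing = []
    for directory in required_dirs:
        prefix = directory.rstrip("/") + "/"
        if not any(n.startswith(prefix) and n.endswith(".py") for n in names):
            missing.append(prefix)
    return missing
```

For example, after `poetry build`, `missing_from_wheel(Path("dist/<wheel>.whl"), ["polaris/management", "polaris/catalog"])` would return an empty list if the include clause picked up both (assumed) generated packages, and the names of any directories that were dropped otherwise.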
Now, assuming I am a user who doesn't want to download the code base but wants to use the CLI:
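For a user who only runs `pip install`, the `polaris` command comes from the package's `console_scripts` entry point rather than from the repository's bash script. One way to confirm what got installed is to look the entry point up with `importlib.metadata`; the `find_console_script` helper below is a hypothetical convenience, not part of the Polaris client:

```python
from importlib.metadata import EntryPoint, entry_points
from typing import Optional


def find_console_script(name: str) -> Optional[EntryPoint]:
    """Return the installed console_scripts entry point called `name`, or None."""
    eps = entry_points()
    # Python 3.10+ exposes .select(); older versions return a plain dict of groups.
    if hasattr(eps, "select"):
        scripts = eps.select(group="console_scripts")
    else:
        scripts = eps.get("console_scripts", [])
    for ep in scripts:
        if ep.name == name:
            return ep
    return None
```

After installing the built wheel (or, once published, the package from PyPI), `find_console_script("polaris")` would be expected to return an entry point whose `value` names the CLI's main function; `None` means no `polaris` script is registered in the current environment.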
Here is a quick demo of the updated bash script:
With this change, we don't really need the bash script `polaris` (its name is also confusing now; it should be something like `client_setup.sh`), and end users can do a regular pip install to set up the Polaris CLI. To avoid too much change, I will keep the bash script `polaris` for now: it is still useful for initial setup and for refreshing dependencies, but once the client is published to the Python Package Index, users should just run `pip install --upgrade` instead.
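The script's remaining duties, initial setup and refreshing dependencies, roughly amount to creating a virtual environment and installing (or upgrading) the client package into it. A rough Python sketch of that flow, assuming a local client directory and a venv path chosen by the caller; `install_command` and `setup_client` are hypothetical names, and this is not a port of the actual `polaris` script:

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import List


def install_command(venv_dir: Path, client_dir: Path, upgrade: bool = False) -> List[str]:
    """Build the pip command that installs the client into venv_dir (sketch)."""
    # venv puts executables under Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        python = venv_dir / "Scripts" / "python.exe"
    else:
        python = venv_dir / "bin" / "python"
    cmd = [str(python), "-m", "pip", "install"]
    if upgrade:
        # Upgrade the package (and, where needed, its dependencies).
        cmd.append("--upgrade")
    cmd.append(str(client_dir))
    return cmd


def setup_client(venv_dir: Path, client_dir: Path, upgrade: bool = False) -> None:
    """Create the venv on first use, then install or upgrade the client into it."""
    if not venv_dir.exists():
        venv.EnvBuilder(with_pip=True).create(venv_dir)
    subprocess.run(install_command(venv_dir, client_dir, upgrade), check=True)
```

Under these assumptions, `setup_client(Path(".venv"), Path("client/python"), upgrade=True)` would correspond to the refresh case; once the package is on PyPI, the same `pip install --upgrade` step would take the package name instead of a local path.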