diff --git a/1.2.0/_index.md b/1.2.0/_index.md new file mode 100644 index 0000000000..a55fc149e6 --- /dev/null +++ b/1.2.0/_index.md @@ -0,0 +1,186 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +linkTitle: 'In Development' +title: 'Overview' +type: docs +weight: 200 +params: + top_hidden: true + show_page_toc: false +cascade: + type: docs + params: + show_page_toc: true +# This file will NOT be copied into a new release's versioned docs folder. +--- + +{{< alert warning >}} +These pages refer to the current state of the main branch, which is still under active development. + +Functionalities can be changed, removed or added without prior notice. +{{< /alert >}} + +Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol. + +With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines. + +![Conceptual diagram of Apache Polaris (Incubating).](/img/overview.svg "Apache Polaris (Incubating) overview") + +## Key concepts + +This section introduces key concepts associated with using Apache Polaris (Incubating). + +In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables +or namespaces have been created yet for Catalog2 or Catalog3. + +![Diagram that shows an example Apache Polaris (Incubating) structure.](/img/sample-catalog-structure.svg "Sample Apache Polaris (Incubating) structure") + +### Catalog + +In Polaris, you can create one or more catalog resources to organize Iceberg tables. + +Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a +query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks: + +- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's + current metadata file. + +- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of + the table. + +To learn more about Iceberg REST catalogs, see the [Apache Iceberg™ REST catalog specification](https://iceberg.apache.org/rest-catalog-spec/). + +#### Catalog types + +A catalog can be one of the following two types: + +- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris. 
+ +- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from + this catalog are synced to Polaris. These tables are read-only in Polaris. + +A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS. + +### Namespace + +You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create +nested namespaces. Iceberg tables belong to namespaces. + +{{< alert important >}} +For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met: + +- The directory only contains the data files that belong to a single table. +- The directory hierarchy matches the namespace hierarchy for the catalog. + +For example, if a catalog includes the following items: + +- Top-level namespace namespace1 +- Nested namespace namespace1a +- A customers table, which is grouped under nested namespace namespace1a +- An orders table, which is grouped under nested namespace namespace1a + +The directory hierarchy for the catalog must follow this structure: + +- /namespace1/namespace1a/customers/ +- /namespace1/namespace1a/orders/ +{{< /alert >}} + +### Storage configuration + +A storage configuration stores a generated identity and access management (IAM) entity for your cloud storage and is created +when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the +catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris +Catalog. + +When you create a catalog, you supply the following information about your cloud storage: + +| Cloud storage provider | Information | +| -----------------------| ----------- | +| Amazon S3 | | +| Google Cloud Storage (GCS) | | +| Azure | | + +## Example workflow + +In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1. + +1. Bob uses Apache Spark™ to create the Table1 table under the + Namespace1 namespace in the Catalog1 catalog and insert values into + Table1. + + Bob can create Table1 and insert data into it because he is using a + service connection with a service principal that has + the privileges to perform these actions. + +2. Alice uses Snowflake to read data from Table1. + + Alice can read data from Table1 because she is using a service + connection with a service principal with a catalog integration that + has the privileges to perform this action. Alice + creates an unmanaged table in Snowflake to read data from Table1. + +![Diagram that shows an example workflow for Apache Polaris (Incubating)](/img/example-workflow.svg "Example workflow for Apache Polaris (Incubating)") + +## Security and access control + +### Credential vending + +To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query +execution. These credentials allow the query engine to run the query without requiring access to your cloud storage for +Iceberg tables. This process is called credential vending. + +As of now, the following limitation is known regarding Apache Iceberg support: + +- **remove_orphan_files:** Apache Spark can't use credential vending + for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details. 
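+
+From the engine side, opting in to credential vending is typically a matter of sending the
+`X-Iceberg-Access-Delegation` header defined by the Iceberg REST protocol. As a sketch, a Spark
+client might be configured as follows (the catalog name, URI, and credentials are placeholders to
+adapt to your deployment):
+
+```shell
+spark-sql \
+  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
+  --conf spark.sql.catalog.polaris.type=rest \
+  --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
+  --conf spark.sql.catalog.polaris.warehouse=my_catalog \
+  --conf spark.sql.catalog.polaris.credential=${CLIENT_ID}:${CLIENT_SECRET} \
+  --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials
+```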
+
+### Identity and access management (IAM)
+
+Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg
+metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your
+storage location.
+
+### Access control
+
+Polaris enforces the access control that you configure across all tables registered with the service and governs security for all
+queries from query engines in a consistent manner.
+
+Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs,
+namespaces, and tables.
+
+Polaris RBAC uses two different role types to delegate privileges:
+
+- **Principal roles:** Granted to Polaris service principals and
+  analogous to roles in other access control systems that you grant to
+  service principals.
+
+- **Catalog roles:** Configured with certain privileges on Polaris
+  catalog resources and granted to principal roles.
+
+For more information, see [Access control]({{% ref "managing-security/access-control" %}}).
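+
+To make the two role types concrete, the following sketch chains them together using the Polaris
+CLI (documented separately): a privilege is granted to a catalog role, the catalog role is granted
+to a principal role, and the principal role is granted to a principal. The names used here
+(`alice`, `analyst`, `read_only`, `my_catalog`) are illustrative only:
+
+```shell
+# Catalog role: holds privileges on catalog resources
+polaris catalog-roles create --catalog my_catalog read_only
+polaris privileges catalog grant --catalog my_catalog --catalog-role read_only TABLE_READ_DATA
+
+# Principal role: the bridge between principals and catalog roles
+polaris principal-roles create analyst
+polaris catalog-roles grant --catalog my_catalog --principal-role analyst read_only
+polaris principal-roles grant --principal alice analyst
+```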
+
+## Legal Notices
+
+Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
+
diff --git a/1.2.0/admin-tool.md b/1.2.0/admin-tool.md
new file mode 100644
index 0000000000..accfdcd525
--- /dev/null
+++ b/1.2.0/admin-tool.md
@@ -0,0 +1,143 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Admin Tool
+type: docs
+weight: 300
+---
+
+Polaris includes a tool for administrators to manage the metastore.
+
+The tool must be built with the necessary JDBC drivers to access the metastore database. For
+example, to build the tool with support for Postgres, run the following:
+
+```shell
+./gradlew \
+  :polaris-admin:assemble \
+  :polaris-admin:quarkusAppPartsBuild --rerun \
+  -Dquarkus.container-image.build=true
+```
+
+The above command will generate:
+
+- One Fast-JAR in `runtime/admin/build/quarkus-app/quarkus-run.jar`
+- Two Docker images named `apache/polaris-admin-tool:latest` and `apache/polaris-admin-tool:<version>`
+
+## Usage
+
+Make sure the admin tool and the Polaris server are the same version before using the tool.
+To run the standalone JAR, use the following command:
+
+```shell
+java -jar runtime/admin/build/quarkus-app/quarkus-run.jar --help
+```
+
+To run the Docker image, use the following command:
+
+```shell
+docker run apache/polaris-admin-tool:latest --help
+```
+
+The basic usage of the Polaris Admin Tool is outlined below:
+
+```
+Usage: polaris-admin-runner.jar [-hV] [COMMAND]
+Polaris Admin Tool
+  -h, --help      Show this help message and exit.
+  -V, --version   Print version information and exit.
+Commands:
+  help       Display help information about the specified command.
+  bootstrap  Bootstraps realms and principal credentials.
+  purge      Purge principal credentials.
+```
+
+## Configuration
+
+The Polaris Admin Tool must be executed with the same configuration as the Polaris server. The
+configuration can be done via environment variables or system properties.
+
+At a minimum, it is necessary to configure the Polaris Admin Tool to connect to the same database
+used by the Polaris server.
+
+See the [metastore documentation]({{% ref "metastores" %}}) for more information on configuring the
+database connection.
+
+Note: during bootstrap, Polaris always creates a schema named `polaris_schema` in the configured database.
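+
+For example, to point the admin tool at the same PostgreSQL database as a Polaris server using the
+JDBC metastore, the relevant settings can be passed as environment variables. This is a minimal
+sketch of the bootstrap command described in the next section; the host, database name, and
+credentials below are placeholders to adapt to your deployment:
+
+```shell
+docker run \
+  -e POLARIS_PERSISTENCE_TYPE=relational-jdbc \
+  -e QUARKUS_DATASOURCE_DB_KIND=postgresql \
+  -e QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://postgres:5432/polaris \
+  -e QUARKUS_DATASOURCE_USERNAME=polaris \
+  -e QUARKUS_DATASOURCE_PASSWORD=polaris \
+  apache/polaris-admin-tool:latest bootstrap -r realm1 -c realm1,admin,admin
+```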
+
+## Bootstrapping Realms and Principal Credentials
+
+The `bootstrap` command is used to bootstrap realms and create the necessary principal credentials
+for the Polaris server. This command is idempotent and can be run multiple times without causing any
+issues. If a realm is already bootstrapped, running the `bootstrap` command again will not have any
+effect on that realm.
+
+```shell
+java -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap --help
+```
+
+The basic usage of the `bootstrap` command is outlined below:
+
+```
+Usage: polaris-admin-runner.jar bootstrap [-hV] [-c=<credential>]... -r=<realm> [-r=<realm>]...
+Bootstraps realms and root principal credentials.
+  -c, --credential=<credential>
+                  Root principal credentials to bootstrap. Must be of the form
+                  'realm,clientId,clientSecret'.
+  -h, --help      Show this help message and exit.
+  -r, --realm=<realm>   The name of a realm to bootstrap.
+  -V, --version   Print version information and exit.
+```
+
+For example, to bootstrap the `realm1` realm and create its root principal credential with the
+client ID `admin` and client secret `admin`, you can run the following command:
+
+```shell
+java -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap -r realm1 -c realm1,admin,admin
+```
+
+## Purging Realms and Principal Credentials
+
+The `purge` command is used to remove realms and principal credentials from the Polaris server.
+
+{{< alert warning >}}
+Running the `purge` command will remove all data associated with the specified realms!
+This includes all entities (catalogs, namespaces, tables, views, roles), all principal
+credentials, grants, and any other data associated with the realms.
+{{< /alert >}}
+
+```shell
+java -jar runtime/admin/build/quarkus-app/quarkus-run.jar purge --help
+```
+
+The basic usage of the `purge` command is outlined below:
+
+```
+Usage: polaris-admin-runner.jar purge [-hV] -r=<realm> [-r=<realm>]...
+Purge realms and all associated entities.
+  -h, --help      Show this help message and exit.
+  -r, --realm=<realm>   The name of a realm to purge.
+  -V, --version   Print version information and exit.
+```
+
+For example, to purge the `realm1` realm, you can run the following command:
+
+```shell
+java -jar runtime/admin/build/quarkus-app/quarkus-run.jar purge -r realm1
+```
diff --git a/1.2.0/command-line-interface.md b/1.2.0/command-line-interface.md
new file mode 100644
index 0000000000..c455ec80f2
--- /dev/null
+++ b/1.2.0/command-line-interface.md
@@ -0,0 +1,1477 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Command Line Interface
+type: docs
+weight: 300
+---
+
+To help administrators quickly set up and manage their Polaris server, Polaris provides a simple command-line interface (CLI) for common tasks.
+
+The basic syntax of the Polaris CLI is outlined below:
+
+```
+polaris [options] COMMAND ...
+
+options:
+--host
+--port
+--base-url
+--client-id
+--client-secret
+--access-token
+--realm
+--header
+--profile
+--proxy
+```
+
+`COMMAND` must be one of the following:
+1. catalogs
+2. principals
+3. principal-roles
+4. catalog-roles
+5. namespaces
+6. privileges
+7. profiles
+8. policies
+9. repair
+
+Each _command_ supports several _subcommands_, and some _subcommands_ have _actions_ that come after the subcommand in turn. Finally, _arguments_ follow to form a full invocation. Within a set of named arguments at the end of an invocation, ordering is generally not important. Many invocations also have a required positional argument of the type that the _command_ refers to. Again, the ordering of this positional argument relative to named arguments is not important.
+
+Some example full invocations:
+
+```
+polaris principals list
+polaris catalogs delete some_catalog_name
+polaris catalogs update --property foo=bar some_other_catalog
+polaris catalogs update another_catalog --property k=v
+polaris privileges namespace grant --namespace some.schema --catalog fourth_catalog --catalog-role some_catalog_role TABLE_READ_DATA
+polaris profiles list
+polaris policies list --catalog some_catalog --namespace some.schema
+polaris repair
+```
+
+### Authentication
+
+As outlined above, the Polaris CLI may take credentials using the `--client-id` and `--client-secret` options. For example:
+
+```
+polaris --client-id 4b5ed1ca908c3cc2 --client-secret 07ea8e4edefb9a9e57c247e8d1a4f51c principals ...
+```
+
+If `--client-id` and `--client-secret` are not provided, the Polaris CLI will try to read the client ID and client secret from environment variables called `CLIENT_ID` and `CLIENT_SECRET` respectively. If these flags are not provided and the environment variables are not set, the CLI will fail.
+
+Alternatively, the `--access-token` option can be used instead of `--client-id` and `--client-secret`, but both authentication methods cannot be used simultaneously.
+
+Additionally, the `--profile` option can be used to specify a saved profile instead of providing authentication details directly. If `--profile` is not provided, the CLI will check the `CLIENT_PROFILE` environment variable. Profiles store authentication details and connection settings, simplifying repeated CLI usage.
+
+If the `--host` and `--port` options are not provided, the CLI will default to communicating with `localhost:8181`.
+
+Alternatively, the `--base-url` option can be used instead of `--host` and `--port`, but both options cannot be used simultaneously. This allows specifying arbitrary Polaris URLs, including HTTPS ones, that have additional base prefixes before the `/api/*/v1` subpaths.
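+
+For example, assuming the credentials shown above have been exported as environment variables, the
+following invocation targets a hypothetical Polaris server behind an HTTPS base URL:
+
+```shell
+export CLIENT_ID=4b5ed1ca908c3cc2
+export CLIENT_SECRET=07ea8e4edefb9a9e57c247e8d1a4f51c
+
+polaris --base-url https://polaris.example.com/polaris principals list
+```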
+
+If your Polaris server is configured to use a realm other than the default, you can use the `--realm` option to specify a realm. If `--realm` is not provided, the CLI will check the `REALM` environment variable. If neither is provided, the CLI will not send the realm context header.
+Also, if your Polaris server uses a custom realm header name, you can use the `--header` option to specify it. If `--header` is not provided, the CLI will check the `HEADER` environment variable. If neither is provided, the CLI will use the default header name `Polaris-Realm`.
+
+Read more about configuring the Polaris server to work with multiple realms [here]({{% ref "./configuration.md" %}}).
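+
+For example, to target a realm named `realm2` (assuming the server is configured with it) while
+keeping the default header name:
+
+```shell
+polaris --realm realm2 principals list
+```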
+
+### PATH
+
+These examples assume the Polaris CLI is on the PATH and so can be invoked just by the command `polaris`. You can add the CLI to your PATH environment variable with a command like the following:
+
+```
+export PATH="~/polaris:$PATH"
+```
+
+Alternatively, you can run the CLI by providing a path to it, such as with the following invocation:
+
+```
+~/polaris principals list
+```
+
+## Commands
+
+Each of the commands `catalogs`, `principals`, `principal-roles`, `catalog-roles`, and `privileges` is used to manage a different type of entity within Polaris.
+
+In addition to these, the `profiles` command is available for managing stored authentication profiles, allowing login credentials to be configured for reuse. This provides an alternative to passing authentication details with every command. By default, profiles are stored in a `.polaris.json` file within the `~/.polaris` directory. The location of this directory can be overridden by setting the `POLARIS_HOME` environment variable.
+
+To find details on the options that can be provided to a particular command or subcommand ad-hoc, you may wish to use the `--help` flag. For example:
+
+```
+polaris catalogs --help
+polaris principals create --help
+polaris profiles --help
+```
+
+### catalogs
+
+The `catalogs` command is used to create, discover, and otherwise manage catalogs within Polaris.
+
+`catalogs` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+
+#### create
+
+The `create` subcommand is used to create a catalog.
+
+```
+input: polaris catalogs create --help
+options:
+  create
+    Named arguments:
+      --type  The type of catalog to create in [INTERNAL, EXTERNAL]. INTERNAL by default.
+      --storage-type  (Required) The type of storage to use for the catalog
+      --default-base-location  (Required) Default base location of the catalog
+      --endpoint  (Only for S3) The S3 endpoint to use when connecting to S3
+      --endpoint-internal  (Only for S3) The S3 endpoint used by Polaris when connecting to S3, if different from the one that clients use
+      --sts-endpoint  (Only for S3) The STS endpoint to use when connecting to STS
+      --path-style-access  (Only for S3) Whether to use path-style access for S3
+      --allowed-location  An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once.
+      --role-arn  (Only for AWS S3) A role ARN to use when connecting to S3
+      --region  (Only for S3) The region to use when connecting to S3
+      --external-id  (Only for S3) The external ID to use when connecting to S3
+      --tenant-id  (Required for Azure) A tenant ID to use when connecting to Azure Storage
+      --multi-tenant-app-name  (Only for Azure) The app name to use when connecting to Azure Storage
+      --consent-url  (Only for Azure) A consent URL granting permissions for the Azure Storage location
+      --service-account  (Only for GCS) The service account to use when connecting to GCS
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+      --catalog-connection-type  The type of external catalog in [iceberg-rest, hadoop].
+      --iceberg-remote-catalog-name  The remote catalog name when federating to an Iceberg REST catalog
+      --hadoop-warehouse  The warehouse to use when federating to a HADOOP catalog
+      --catalog-authentication-type  The type of authentication in [OAUTH, BEARER, SIGV4, IMPLICIT]
+      --catalog-service-identity-type  The type of service identity in [AWS_IAM]
+      --catalog-service-identity-iam-arn  When using the AWS_IAM service identity type, this is the ARN of the IAM user or IAM role Polaris uses to assume roles and then access external resources.
+      --catalog-uri  The URI of the external catalog
+      --catalog-token-uri  (For authentication type OAUTH) Token server URI
+      --catalog-client-id  (For authentication type OAUTH) oauth client id
+      --catalog-client-secret  (For authentication type OAUTH) oauth client secret (input-only)
+      --catalog-client-scope  (For authentication type OAUTH) oauth scopes to specify when exchanging for a short-lived access token. Multiple can be provided by specifying this option more than once
+      --catalog-bearer-token  (For authentication type BEARER) Bearer token (input-only)
+      --catalog-role-arn  (For authentication type SIGV4) The AWS IAM role ARN assumed by the Polaris userArn when signing requests
+      --catalog-role-session-name  (For authentication type SIGV4) The role session name to be used by the SigV4 protocol for signing requests
+      --catalog-external-id  (For authentication type SIGV4) An optional external id used to establish a trust relationship with AWS in the trust policy
+      --catalog-signing-region  (For authentication type SIGV4) Region to be used by the SigV4 protocol for signing requests
+      --catalog-signing-name  (For authentication type SIGV4) The service name to be used by the SigV4 protocol for signing requests; the default signing name "execute-api" is used if not provided
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalogs create \
+  --storage-type s3 \
+  --default-base-location s3://example-bucket/my_data \
+  --role-arn ${ROLE_ARN} \
+  my_catalog
+
+polaris catalogs create \
+  --storage-type s3 \
+  --default-base-location s3://example-bucket/my_other_data \
+  --allowed-location s3://example-bucket/second_location \
+  --allowed-location s3://other-bucket/third_location \
+  --role-arn ${ROLE_ARN} \
+  my_other_catalog
+
+polaris catalogs create \
+  --storage-type file \
+  --default-base-location file:///example/tmp \
+  quickstart_catalog
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a catalog.
+
+```
+input: polaris catalogs delete --help
+options:
+  delete
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalogs delete some_catalog
+```
+
+#### get
+
+The `get` subcommand is used to retrieve details about a catalog.
+
+```
+input: polaris catalogs get --help
+options:
+  get
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalogs get some_catalog
+
+polaris catalogs get another_catalog
+```
+
+#### list
+
+The `list` subcommand is used to show details about all catalogs, or those that a certain principal role has access to. The principal used to perform this operation must have the `CATALOG_LIST` privilege.
+
+```
+input: polaris catalogs list --help
+options:
+  list
+    Named arguments:
+      --principal-role  The name of a principal role
+```
+
+##### Examples
+
+```
+polaris catalogs list
+
+polaris catalogs list --principal-role some_user
+```
+
+#### update
+
+The `update` subcommand is used to update a catalog. Currently, this command supports changing the properties of a catalog or updating its storage configuration.
+
+```
+input: polaris catalogs update --help
+options:
+  update
+    Named arguments:
+      --default-base-location  A new default base location for the catalog
+      --allowed-location  An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once.
+      --region  (Only for S3) The region to use when connecting to S3
+      --set-property  A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
+      --remove-property  A key to remove from a properties map. If the key does not exist, then no action is taken for the specified key. If properties are also being set in the same update command, then the list of removals is applied last. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalogs update --property tag=new_value my_catalog
+
+polaris catalogs update --default-base-location s3://new-bucket/my_data my_catalog
+```
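+
+Based on the options above, property merges and removals can also be combined in a single update,
+with removals applied last (the property keys here are illustrative):
+
+```
+polaris catalogs update \
+  --set-property tag=new_value \
+  --remove-property deprecated_key \
+  my_catalog
+```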
+
+### Principals
+
+The `principals` command is used to manage principals within Polaris.
+
+`principals` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. rotate-credentials
+6. update
+7. access
+8. reset
+
+#### create
+
+The `create` subcommand is used to create a new principal.
+
+```
+input: polaris principals create --help
+options:
+  create
+    Named arguments:
+      --type  The type of principal to create in [SERVICE]
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals create some_user
+
+polaris principals create --client-id ${CLIENT_ID} --property admin=true some_admin_user
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a principal.
+
+```
+input: polaris principals delete --help
+options:
+  delete
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals delete some_user
+
+polaris principals delete some_admin_user
+```
+
+#### get
+
+The `get` subcommand retrieves details about a principal.
+
+```
+input: polaris principals get --help
+options:
+  get
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals get some_user
+
+polaris principals get some_admin_user
+```
+
+#### list
+
+The `list` subcommand shows details about all principals.
+
+##### Examples
+
+```
+polaris principals list
+```
+
+#### rotate-credentials
+
+The `rotate-credentials` subcommand is used to update the credentials used by a principal. After this command runs successfully, the new credentials will be printed to stdout.
+
+```
+input: polaris principals rotate-credentials --help
+options:
+  rotate-credentials
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals rotate-credentials some_user
+
+polaris principals rotate-credentials some_admin_user
+```
+
+#### update
+
+The `update` subcommand is used to update a principal. Currently, this supports rewriting the properties associated with a principal.
+
+```
+input: polaris principals update --help
+options:
+  update
+    Named arguments:
+      --set-property  A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
+      --remove-property  A key to remove from a properties map. If the key does not exist, then no action is taken for the specified key. If properties are also being set in the same update command, then the list of removals is applied last. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals update --property key=value --property other_key=other_value some_user
+
+polaris principals update --property are_other_keys_removed=yes some_user
+```
+
+#### access
+
+The `access` subcommand retrieves the entity relations of a principal, i.e. the entities that the principal has access to.
+
+```
+input: polaris principals access --help
+options:
+  access
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals access quickstart_user
+```
+
+#### reset
+
+The `reset` subcommand is used to reset principal credentials.
+
+```
+input: polaris principals reset --help
+options:
+  reset
+    Named arguments:
+      --new-client-id  The new client ID for the principal
+      --new-client-secret  The new client secret for the principal
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals create some_user
+
+polaris principals reset some_user
+polaris principals reset --new-client-id ${NEW_CLIENT_ID} some_user
+polaris principals reset --new-client-secret ${NEW_CLIENT_SECRET} some_user
+polaris principals reset --new-client-id ${NEW_CLIENT_ID} --new-client-secret ${NEW_CLIENT_SECRET} some_user
+```
+
+### Principal Roles
+
+The `principal-roles` command is used to create, discover, and manage principal roles within Polaris. Additionally, this command can identify principals or catalog roles associated with a principal role, and can be used to grant a principal role to a principal.
+
+`principal-roles` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+6. grant
+7. revoke
+
+#### create
+
+The `create` subcommand is used to create a new principal role.
+
+```
+input: polaris principal-roles create --help
+options:
+  create
+    Named arguments:
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles create data_engineer
+
+polaris principal-roles create --property key=value data_analyst
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a principal role.
+
+```
+input: polaris principal-roles delete --help
+options:
+  delete
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles delete data_engineer
+
+polaris principal-roles delete data_analyst
+```
+
+#### get
+
+The `get` subcommand retrieves details about a principal role.
+
+```
+input: polaris principal-roles get --help
+options:
+  get
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles get data_engineer
+
+polaris principal-roles get data_analyst
+```
+
+#### list
+
+The `list` subcommand is used to print out all principal roles or, alternatively, to list all principal roles associated with a given principal or with a given catalog role.
+
+```
+input: polaris principal-roles list --help
+options:
+  list
+    Named arguments:
+      --catalog-role  The name of a catalog role. If provided, show only principal roles assigned to this catalog role.
+      --principal  The name of a principal. If provided, show only principal roles assigned to this principal.
+```
+
+##### Examples
+
+```
+polaris principal-roles list
+
+polaris principal-roles list --principal d.knuth
+
+polaris principal-roles list --catalog-role super_secret_data
+```
+
+#### update
+
+The `update` subcommand is used to update a principal role. Currently, this supports updating the properties tied to a principal role.
+
+```
+input: polaris principal-roles update --help
+options:
+  update
+    Named arguments:
+      --set-property  A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
+      --remove-property  A key to remove from a properties map. If the key does not exist, then no action is taken for the specified key. If properties are also being set in the same update command, then the list of removals is applied last. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles update --property key=value2 data_engineer
+
+polaris principal-roles update data_analyst --property key=value3
+```
+
+#### grant
+
+The `grant` subcommand is used to grant a principal role to a principal.
+
+```
+input: polaris principal-roles grant --help
+options:
+  grant
+    Named arguments:
+      --principal  A principal to grant this principal role to
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles grant --principal d.knuth data_engineer
+
+polaris principal-roles grant data_scientist --principal a.ng
+```
+
+#### revoke
+
+The `revoke` subcommand is used to revoke a principal role from a principal.
+
+```
+input: polaris principal-roles revoke --help
+options:
+  revoke
+    Named arguments:
+      --principal  A principal to revoke this principal role from
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles revoke --principal former.employee data_engineer
+
+polaris principal-roles revoke data_scientist --principal changed.role
+```
+
+### Catalog Roles
+
+The `catalog-roles` command is used to create, discover, and manage catalog roles within Polaris. Additionally, this command can be used to grant a catalog role to a principal role.
+
+`catalog-roles` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+6. grant
+7. revoke
+
+#### create
+
+The `create` subcommand is used to create a new catalog role.
+
+```
+input: polaris catalog-roles create --help
+options:
+  create
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles create --property key=value --catalog some_catalog sales_data
+
+polaris catalog-roles create --catalog other_catalog sales_data
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a catalog role.
+
+```
+input: polaris catalog-roles delete --help
+options:
+  delete
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles delete --catalog some_catalog sales_data
+
+polaris catalog-roles delete --catalog other_catalog sales_data
+```
+
+#### get
+
+The `get` subcommand retrieves details about a catalog role.
+
+```
+input: polaris catalog-roles get --help
+options:
+  get
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles get --catalog some_catalog inventory_data
+
+polaris catalog-roles get --catalog other_catalog inventory_data
+```
+
+#### list
+
+The `list` subcommand is used to print all catalog roles. Alternatively, if a principal role is provided, only catalog roles associated with that principal role are shown.
+
+```
+input: polaris catalog-roles list --help
+options:
+  list
+    Named arguments:
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalog-roles list
+
+polaris catalog-roles list --principal-role data_engineer
+```
+
+#### update
+
+The `update` subcommand is used to update a catalog role. Currently, only updating properties associated with the catalog role is supported.
+
+```
+input: polaris catalog-roles update --help
+options:
+  update
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --set-property  A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
+      --remove-property  A key to remove from a properties map. If the key does not exist, then no action is taken for the specified key. If properties are also being set in the same update command, then the list of removals is applied last. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles update --property contains_pii=true --catalog some_catalog sales_data
+
+polaris catalog-roles update sales_data --catalog some_catalog --property key=value
+```
+
+#### grant
+
+The `grant` subcommand is used to grant a catalog role to a principal role.
+
+```
+input: polaris catalog-roles grant --help
+options:
+  grant
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles grant sensitive_data --catalog some_catalog --principal-role power_user
+
+polaris catalog-roles grant --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
+```
+
+#### revoke
+
+The `revoke` subcommand is used to revoke a catalog role from a principal role.
+
+```
+input: polaris catalog-roles revoke --help
+options:
+  revoke
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles revoke sensitive_data --catalog some_catalog --principal-role power_user
+
+polaris catalog-roles revoke --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
+```
+
+### Namespaces
+
+The `namespaces` command is used to manage namespaces within Polaris.
+
+`namespaces` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+
+#### create
+
+The `create` subcommand is used to create a new namespace.
+
+When creating a namespace with an explicit location, that location must reside within the parent catalog or namespace.
+
+```
+input: polaris namespaces create --help
+options:
+  create
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --location  If specified, the location at which to store the namespace and entities inside it
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      namespace
+```
+
+##### Examples
+
+```
+polaris namespaces create --catalog my_catalog outer
+
+polaris namespaces create --catalog my_catalog --location 's3://bucket/outer/inner_SUFFIX' outer.inner
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a namespace.
+
+```
+input: polaris namespaces delete --help
+options:
+  delete
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      namespace
+```
+
+##### Examples
+
+```
+polaris namespaces delete outer_namespace.inner_namespace --catalog my_catalog
+
+polaris namespaces delete --catalog my_catalog outer_namespace
+```
+
+#### get
+
+The `get` subcommand retrieves details about a namespace.
+
+```
+input: polaris namespaces get --help
+options:
+  get
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      namespace
+```
+
+##### Examples
+
+```
+polaris namespaces get --catalog some_catalog a.b
+
+polaris namespaces get a.b.c --catalog some_catalog
+```
+
+#### list
+
+The `list` subcommand shows details about all namespaces directly within a catalog or, optionally, within some parent prefix in that catalog.
+
+```
+input: polaris namespaces list --help
+options:
+  list
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --parent  If specified, list namespaces inside this parent namespace
+```
+
+##### Examples
+
+```
+polaris namespaces list --catalog my_catalog
+
+polaris namespaces list --catalog my_catalog --parent a
+
+polaris namespaces list --catalog my_catalog --parent a.b
+```
+
+### Privileges
+
+The `privileges` command is used to grant various privileges to a catalog role, or to revoke those privileges. Privileges can be on the level of a catalog, a namespace, a table, or a view.
For more information on privileges, please refer to the [docs]({{% ref "entities#privilege" %}}).
+
+Note that when using the `privileges` command, the user specifies the relevant catalog and catalog role before selecting a subcommand.
+
+`privileges` supports the following subcommands:
+
+1. list
+2. catalog
+3. namespace
+4. table
+5. view
+
+Each of these subcommands, except `list`, supports the `grant` and `revoke` actions and requires an action to be specified.
+
+Note that each subcommand's `revoke` action always accepts the same options that the corresponding `grant` action does, but with the addition of the `cascade` option. `cascade` is used to revoke all other privileges that depend on the specified privilege.
+
+#### list
+
+The `list` subcommand shows details about all privileges for a catalog role.
+
+```
+input: polaris privileges list --help
+options:
+  list
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --catalog-role  The name of a catalog role
+```
+
+##### Examples
+
+```
+polaris privileges list --catalog my_catalog --catalog-role my_role
+
+polaris privileges list --catalog-role my_other_role --catalog my_catalog
+```
+
+#### catalog
+
+The `catalog` subcommand manages privileges at the catalog level. `grant` is used to grant catalog privileges to the specified catalog role, and `revoke` is used to revoke them.
+
+```
+input: polaris privileges catalog --help
+options:
+  catalog
+    grant
+      Named arguments:
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+    revoke
+      Named arguments:
+        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+```
+
+##### Examples
+
+```
+polaris privileges \
+  catalog \
+  grant \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  TABLE_CREATE
+
+polaris privileges \
+  catalog \
+  revoke \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --cascade \
+  TABLE_CREATE
+```
+
+#### namespace
+
+The `namespace` subcommand manages privileges at the namespace level.
+
+```
+input: polaris privileges namespace --help
+options:
+  namespace
+    grant
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+    revoke
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+```
+
+##### Examples
+
+```
+polaris privileges \
+  namespace \
+  grant \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b \
+  TABLE_LIST
+
+polaris privileges \
+  namespace \
+  revoke \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b \
+  TABLE_LIST
+```
+
+#### table
+
+The `table` subcommand manages privileges at the table level.
+
+```
+input: polaris privileges table --help
+options:
+  table
+    grant
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --table  The name of a table
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+    revoke
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --table  The name of a table
+        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+```
+
+##### Examples
+
+```
+polaris privileges \
+  table \
+  grant \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b \
+  --table t \
+  TABLE_DROP
+
+polaris privileges \
+  table \
+  revoke \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b \
+  --table t \
+  --cascade \
+  TABLE_DROP
+```
+
+#### view
+
+The `view` subcommand manages privileges at the view level.
+
+```
+input: polaris privileges view --help
+options:
+  view
+    grant
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --view  The name of a view
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+    revoke
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --view  The name of a view
+        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+```
+
+##### Examples
+
+```
+polaris privileges \
+  view \
+  grant \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b.c \
+  --view v \
+  VIEW_FULL_METADATA
+
+polaris privileges \
+  view \
+  revoke \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b.c \
+  --view v \
+  --cascade \
+  VIEW_FULL_METADATA
+```
+
+### profiles
+
+The `profiles` command is used to manage stored authentication profiles in Polaris. Profiles allow authentication credentials to be saved and reused, eliminating the need to pass credentials with every command.
+
+`profiles` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+
+#### create
+
+The `create` subcommand is used to create a new authentication profile.
+
+```
+input: polaris profiles create --help
+options:
+  create
+    Positional arguments:
+      profile
+```
+
+##### Examples
+
+```
+polaris profiles create dev
+```
+
+#### delete
+
+The `delete` subcommand removes a stored profile.
+
+```
+input: polaris profiles delete --help
+options:
+  delete
+    Positional arguments:
+      profile
+```
+
+##### Examples
+
+```
+polaris profiles delete dev
+```
+
+#### get
+
+The `get` subcommand retrieves details about a stored profile.
+
+```
+input: polaris profiles get --help
+options:
+  get
+    Positional arguments:
+      profile
+```
+
+##### Examples
+
+```
+polaris profiles get dev
+```
+
+#### list
+
+The `list` subcommand displays all stored profiles.
+
+```
+input: polaris profiles list --help
+options:
+  list
+```
+
+##### Examples
+
+```
+polaris profiles list
+```
+
+#### update
+
+The `update` subcommand modifies an existing profile.
+
+```
+input: polaris profiles update --help
+options:
+  update
+    Positional arguments:
+      profile
+```
+
+##### Examples
+
+```
+polaris profiles update dev
+```
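+
+Once a profile has been created, it can be selected per invocation with `--profile` or via the
+`CLIENT_PROFILE` environment variable described in the Authentication section, so credentials no
+longer need to be passed explicitly (sketch, assuming a profile named `dev` exists):
+
+```
+polaris --profile dev principals list
+
+export CLIENT_PROFILE=dev
+polaris principals list
+```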
+
+### Policies
+
+The `policies` command is used to manage policies within Polaris.
+
+`policies` supports the following subcommands:
+
+1. attach
+2. create
+3. delete
+4. detach
+5. get
+6. list
+7. update
+
+#### attach
+
+The `attach` subcommand is used to create a mapping between a policy and a resource entity.
+
+```
+input: polaris policies attach --help
+options:
+  attach
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --namespace  A period-delimited namespace
+      --attachment-type  The type of entity to attach the policy to, e.g., 'catalog', 'namespace', or table-like.
+      --attachment-path  The path of the entity to attach the policy to, e.g., 'ns1.tb1'. Not required for catalog-level attachment.
+      --parameters  Optional key-value pairs for the attachment/detachment, e.g., key=value. Can be specified multiple times.
+    Positional arguments:
+      policy
+```
+
+##### Examples
+
+```
+polaris policies attach --catalog some_catalog --namespace some.schema --attachment-type namespace --attachment-path some.schema some_policy
+
+polaris policies attach --catalog some_catalog --namespace some.schema --attachment-type table-like --attachment-path some.schema.t some_table_policy
+```
+
+#### create
+
+The `create` subcommand is used to create a policy.
+
+```
+input: polaris policies create --help
+options:
+  create
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --namespace  A period-delimited namespace
+      --policy-file  The path to a JSON file containing the policy definition
+      --policy-type  The type of the policy, e.g., 'system.data-compaction'
+      --policy-description  An optional description for the policy.
+    Positional arguments:
+      policy
+```
+
+##### Examples
+
+```
+polaris policies create --catalog some_catalog --namespace some.schema --policy-file some_policy.json --policy-type system.data-compaction some_policy
+
+polaris policies create --catalog some_catalog --namespace some.schema --policy-file some_snapshot_expiry_policy.json --policy-type system.snapshot-expiry some_snapshot_expiry_policy
+```
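+
+The policy definition itself lives in the JSON file passed via `--policy-file`. The exact content
+schema depends on the policy type; the snippet below is only an illustrative sketch, and the
+`enable` flag and `target_file_size_bytes` key are assumptions rather than confirmed parameters:
+
+```
+cat > some_policy.json <<'EOF'
+{
+  "enable": true,
+  "config": {
+    "target_file_size_bytes": 134217728
+  }
+}
+EOF
+
+polaris policies create --catalog some_catalog --namespace some.schema \
+  --policy-file some_policy.json --policy-type system.data-compaction some_policy
+```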
+
+#### delete
+
+The `delete` subcommand is used to delete a policy.
+
+```
+input: polaris policies delete --help
+options:
+  delete
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --namespace  A period-delimited namespace
+      --detach-all  When set to true, the policy will be deleted along with all its attached mappings.
+    Positional arguments:
+      policy
+```
+
+##### Examples
+
+```
+polaris policies delete --catalog some_catalog --namespace some.schema some_policy
+
+polaris policies delete --catalog some_catalog --namespace some.schema --detach-all some_policy
+```
+
+#### detach
+
+The `detach` subcommand is used to remove a mapping between a policy and a target entity.
+
+```
+input: polaris policies detach --help
+options:
+  detach
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --namespace  A period-delimited namespace
+      --attachment-type  The type of entity to attach the policy to, e.g., 'catalog', 'namespace', or table-like.
+      --attachment-path  The path of the entity to attach the policy to, e.g., 'ns1.tb1'. Not required for catalog-level attachment.
+      --parameters  Optional key-value pairs for the attachment/detachment, e.g., key=value. Can be specified multiple times.
+    Positional arguments:
+      policy
+```
+
+##### Examples
+
+```
+polaris policies detach --catalog some_catalog --namespace some.schema --attachment-type namespace --attachment-path some.schema some_policy
+
+polaris policies detach --catalog some_catalog --namespace some.schema --attachment-type catalog --attachment-path some_catalog some_policy
+```
+
+#### get
+
+The `get` subcommand is used to load a policy from the catalog.
+
+```
+input: polaris policies get --help
+options:
+  get
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --namespace  A period-delimited namespace
+    Positional arguments:
+      policy
+```
+
+##### Examples
+
+```
+polaris policies get --catalog some_catalog --namespace some.schema some_policy
+```
+
+#### list
+
+The `list` subcommand is used to list all policy identifiers under a given namespace or, with `--applicable`, all policies applicable to a specified entity.
+
+```
+input: polaris policies list --help
+options:
+  list
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --namespace  A period-delimited namespace
+      --target-name  The name of the target entity (e.g., table name, namespace name).
+      --applicable  When set, lists policies applicable to the target entity (considering inheritance) instead of policies defined directly in the target.
+      --policy-type  The type of the policy, e.g., 'system.data-compaction'
+```
+
+##### Examples
+
+```
+polaris policies list --catalog some_catalog
+
+polaris policies list --catalog some_catalog --applicable
+```
+
+#### update
+
+The `update` subcommand is used to update a policy.
+
+```
+input: polaris policies update --help
+options:
+  update
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --namespace  A period-delimited namespace
+      --policy-file  The path to a JSON file containing the policy definition
+      --policy-description  An optional description for the policy.
+    Positional arguments:
+      policy
+```
+
+##### Examples
+
+```
+polaris policies update --catalog some_catalog --namespace some.schema --policy-file my_updated_policy.json my_policy
+
+polaris policies update --catalog some_catalog --namespace some.schema --policy-file my_updated_policy.json --policy-description "Updated policy description" my_policy
+```
+
+### repair
+
+The `repair` command is a bash script wrapper used to regenerate Python client code and update necessary dependencies, ensuring the Polaris client remains up-to-date and functional. **Please note that this command does not support any options and its usage information is not available via a `--help` flag.**
+
+## Examples
+
+This section outlines example code for a few common operations as well as for some more complex ones.
+
+For especially complex operations, you may wish to instead directly use the Python API.
+
+### Creating a principal and a catalog
+
+```
+polaris principals create my_user
+
+polaris catalogs create \
+  --type internal \
+  --storage-type s3 \
+  --default-base-location s3://iceberg-bucket/polaris-base \
+  --role-arn arn:aws:iam::111122223333:role/ExampleCorpRole \
+  --allowed-location s3://iceberg-bucket/polaris-alt-location-1 \
+  --allowed-location s3://iceberg-bucket/polaris-alt-location-2 \
+  my_catalog
+```
+
+### Granting a principal the ability to manage the content of a catalog
+
+```
+polaris principal-roles create power_user
+polaris principal-roles grant --principal my_user power_user
+
+polaris catalog-roles create --catalog my_catalog my_catalog_role
+polaris catalog-roles grant \
+  --catalog my_catalog \
+  --principal-role power_user \
+  my_catalog_role
+
+polaris privileges \
+  catalog \
+  grant \
+  --catalog my_catalog \
+  --catalog-role my_catalog_role \
+  CATALOG_MANAGE_CONTENT
+```
+
+### Identifying the tables a given principal has been granted explicit access to read
+
+_Note that some other privileges, such as `CATALOG_MANAGE_CONTENT`, subsume `TABLE_READ_DATA` and would not be discovered here._
+
+```
+# Assumes ${catalog} is set to the name of the catalog being inspected.
+principal_roles=$(polaris principal-roles list --principal my_principal)
+for principal_role in ${principal_roles}; do
+  catalog_roles=$(polaris catalog-roles list --principal-role "${principal_role}")
+  for catalog_role in ${catalog_roles}; do
+    grants=$(polaris privileges list --catalog-role "${catalog_role}" --catalog "${catalog}")
+    for grant in $(echo "${grants}" | jq -c '.[] | select(.privilege == "TABLE_READ_DATA")'); do
+      echo "${grant}"
+    done
+  done
+done
+```
diff --git a/1.2.0/configuration.md b/1.2.0/configuration.md
new file mode 100644
index 0000000000..78fd9cedbc
--- /dev/null
+++ b/1.2.0/configuration.md
@@ -0,0 +1,201 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Configuring Polaris
+type: docs
+weight: 550
+---
+
+## Overview
+
+This page provides information on how to configure Apache Polaris (Incubating). Unless stated
+otherwise, this information is valid both for Polaris Docker images (and Kubernetes deployments) as
+well as for Polaris binary distributions.
+
+{{< alert note >}}
+For production tips and best practices, refer to [Configuring Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}).
+{{< /alert >}}
+
+First off, the Polaris server runs on Quarkus and uses its configuration mechanisms. Read the Quarkus
+[configuration guide](https://quarkus.io/guides/config) to get familiar with the basics.
+
+Quarkus aggregates configuration properties from multiple sources, applying them in a specific order
+of precedence. When a property is defined in multiple sources, the value from the source with the
+higher priority overrides those from lower-priority sources.
+
+The sources are listed below, from highest to lowest priority:
+
+1. System properties: properties set via the Java command line using `-Dproperty.name=value`.
+2. Environment variables (see below for important details).
+3. Settings in the `$PWD/config/application.properties` file.
+4. The `application.properties` files packaged in Polaris.
+5. Default values: hardcoded defaults within the application.
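+
+For example, a realm list set with a system property takes precedence over the same property in any
+configuration file. The server jar path below is an assumption based on the standard Quarkus
+fast-jar layout, mirroring the admin tool's path:
+
+```shell
+java -Dpolaris.realm-context.realms=realm1,realm2 \
+  -jar runtime/server/build/quarkus-app/quarkus-run.jar
+```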
+When a property is defined in multiple sources, the value from the source with the
+higher priority overrides those from lower-priority sources.
+
+The sources are listed below, from highest to lowest priority:
+
+1. System properties: properties set via the Java command line using `-Dproperty.name=value`.
+2. Environment variables (see below for important details).
+3. Settings in the `$PWD/config/application.properties` file.
+4. The `application.properties` files packaged in Polaris.
+5. Default values: hardcoded defaults within the application.
+
+When using environment variables, there are two naming conventions:
+
+1. If possible, just use the property name as the environment variable name. This works fine in most
+   cases, e.g. in Kubernetes deployments. For example, `polaris.realm-context.realms` can be
+   included as is in a container YAML definition:
+   ```yaml
+   env:
+   - name: "polaris.realm-context.realms"
+     value: "realm1,realm2"
+   ```
+
+2. If running from a script or shell prompt, however, stricter naming rules apply: variable names
+   can consist solely of uppercase letters, digits, and the `_` (underscore) sign. In such
+   situations, the environment variable name must be derived from the property name, by using
+   uppercase letters and replacing all dots, dashes and quotes by underscores. For example,
+   `polaris.realm-context.realms` becomes `POLARIS_REALM_CONTEXT_REALMS`. See
+   [here](https://smallrye.io/smallrye-config/Main/config/environment-variables/) for more details.
+
+{{< alert important >}}
+While convenient, uppercase-only environment variables can be problematic for complex property
+names. In these situations, it's preferable to use system properties or a configuration file.
+{{< /alert >}}
+
+As stated above, a configuration file can also be provided at runtime; it should be available
+(mounted) at `$PWD/config/application.properties` for the Polaris server to recognize it. In Polaris
+official Docker images, this location is `/deployment/config/application.properties`.
+
+For Kubernetes deployments, the configuration file is typically defined as a `ConfigMap`, then
+mounted in the container at `/deployment/config/application.properties`. It can be mounted in
+read-only mode, as Polaris only reads the configuration file once, at startup.
+
+## Polaris Configuration Options Reference
+
+| Configuration Property | Default Value | Description |
+|------------------------|---------------|-------------|
+| `polaris.persistence.type` | `relational-jdbc` | Define the persistence backend used by Polaris (`in-memory`, `relational-jdbc`, `eclipse-link` (deprecated)). See [Configuring Apache Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}). |
+| `polaris.persistence.relational.jdbc.max-retries` | `1` | Total number of retries JDBC persistence will attempt on connection resets or serialization failures before giving up. |
+| `polaris.persistence.relational.jdbc.max_duration_in_ms` | `5000 ms` | Max time interval (ms) since the start of a transaction when retries can be attempted. |
+| `polaris.persistence.relational.jdbc.initial_delay_in_ms` | `100 ms` | Initial delay before retrying. The delay is doubled after each retry. |
+| `polaris.persistence.eclipselink.configurationFile` | | Define the location of the `persistence.xml`. By default, the built-in `persistence.xml` is used. |
+| `polaris.persistence.eclipselink.persistenceUnit` | `polaris` | Define the name of the persistence unit to use, as defined in the `persistence.xml`. |
+| `polaris.realm-context.type` | `default` | Define the type of the Polaris realm to use. |
+| `polaris.realm-context.realms` | `POLARIS` | Define the list of realms to use. |
+| `polaris.realm-context.header-name` | `Polaris-Realm` | Define the name of the header carrying the realm context. |
+| `polaris.features."ENFORCE_PRINCIPAL_CREDENTIAL_ROTATION_REQUIRED_CHECKING"` | `false` | Flag to enforce checking whether principal credential rotation is required. |
+| `polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"` | `S3`, `GCS`, `AZURE` | Define the storage types supported by catalogs. Supported values are `S3`, `GCS`, `AZURE`, `FILE`. |
+| `polaris.features.realm-overrides."my-realm"."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"` | `true` | Override features per realm; here, the skip-credential-subscoping-indirection flag for realm `my-realm`. |
+| `polaris.authentication.authenticator.type` | `default` | Define the Polaris authenticator type. |
+| `polaris.authentication.token-service.type` | `default` | Define the Polaris token service type. |
+| `polaris.authentication.token-broker.type` | `rsa-key-pair` | Define the Polaris token broker type. Also configure the location of the key files. For RSA: if the locations of the key files are not configured, an ephemeral key pair will be created on each Polaris server instance startup, which breaks existing tokens after server restarts and is also incompatible with running multiple Polaris server instances. |
+| `polaris.authentication.token-broker.max-token-generation` | `PT1H` | Define the maximum lifetime of tokens generated by the token broker. |
+| `polaris.authentication.token-broker.rsa-key-pair.private-key-file` | | Define the location of the RSA-256 private key file; if present, the `public-key` file must be specified, too. |
+| `polaris.authentication.token-broker.rsa-key-pair.public-key-file` | | Define the location of the RSA-256 public key file; if present, the `private-key` file must be specified, too. |
+| `polaris.authentication.token-broker.symmetric-key.secret` | `secret` | Define the secret of the symmetric key. |
+| `polaris.authentication.token-broker.symmetric-key.file` | `/tmp/symmetric.key` | Define the location of the symmetric key file. |
+| `polaris.storage.aws.access-key` | `accessKey` | Define the AWS S3 access key. If unset, the default credential provider chain will be used. |
+| `polaris.storage.aws.secret-key` | `secretKey` | Define the AWS S3 secret key. If unset, the default credential provider chain will be used. |
+| `polaris.storage.gcp.token` | `token` | Define the Google Cloud Storage token. If unset, the default credential provider chain will be used. |
+| `polaris.storage.gcp.lifespan` | `PT1H` | Define the lifespan of Google Cloud Storage access tokens. If unset, the default credential provider chain will be used. |
+| `polaris.log.request-id-header-name` | `Polaris-Request-Id` | Define the header name used to match the request ID in logs. |
+| `polaris.log.mdc.aid` | `polaris` | Define the log context (e.g. MDC) AID. |
+| `polaris.log.mdc.sid` | `polaris-service` | Define the log context (e.g. MDC) SID. |
+| `polaris.rate-limiter.filter.type` | `no-op` | Define the Polaris rate limiter. Supported values are `no-op`, `token-bucket`. |
+| `polaris.rate-limiter.token-bucket.type` | `default` | Define the token bucket rate limiter. |
+| `polaris.rate-limiter.token-bucket.requests-per-second` | `9999` | Define the number of requests per second for the token bucket rate limiter. |
+| `polaris.rate-limiter.token-bucket.window` | `PT10S` | Define the time window for the token bucket rate limiter. |
+| `polaris.metrics.tags.<tag-name>=<tag-value>` | `application=Polaris` | Define arbitrary metric tags to include in every request. |
+| `polaris.metrics.realm-id-tag.api-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in API metrics. |
+| `polaris.metrics.realm-id-tag.http-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in HTTP request metrics. |
+| `polaris.metrics.realm-id-tag.http-metrics-max-cardinality` | `100` | The maximum cardinality for the `realm_id` tag in HTTP request metrics. |
+| `polaris.tasks.max-concurrent-tasks` | `100` | Define the max number of concurrent tasks. |
+| `polaris.tasks.max-queued-tasks` | `1000` | Define the max number of tasks in the queue. |
+| `polaris.config.rollback.compaction.on-conflicts.enabled` | `false` | When set to `true`, Polaris resolves conflicts on the server side by rolling back REPLACE operation snapshots whose snapshot summary sets the property `polaris.internal.rollback.compaction.on-conflict` to `rollback`. |
+| `polaris.event-listener.type` | `no-op` | Define the Polaris event listener type. Supported values are `no-op`, `aws-cloudwatch`. |
+| `polaris.event-listener.aws-cloudwatch.log-group` | `polaris-cloudwatch-default-group` | Define the AWS CloudWatch log group name for the event listener. |
+| `polaris.event-listener.aws-cloudwatch.log-stream` | `polaris-cloudwatch-default-stream` | Define the AWS CloudWatch log stream name for the event listener. Ensure that Polaris' IAM credentials allow the following actions: "PutLogEvents", "DescribeLogStreams", and "DescribeLogGroups" on the specified log stream/group. If the specified log stream/group does not exist, then "CreateLogStream" and "CreateLogGroup" will also be required. |
+| `polaris.event-listener.aws-cloudwatch.region` | `us-east-1` | Define the AWS region for the CloudWatch event listener. |
+| `polaris.event-listener.aws-cloudwatch.synchronous-mode` | `false` | Define whether log events are sent to CloudWatch synchronously. When set to `true`, events are sent synchronously, which may impact performance but ensures immediate delivery. When `false` (the default), events are sent asynchronously for better performance. |
+
+There are also non-Polaris configuration properties that can be useful:
+
+| Configuration Property | Default Value | Description |
+|------------------------|---------------|-------------|
+| `quarkus.log.level` | `INFO` | Define the root log level. |
+| `quarkus.log.category."org.apache.polaris".level` | | Define the log level for a specific category. |
+| `quarkus.default-locale` | System locale | Force the use of a specific locale, for instance `en_US`. |
+| `quarkus.http.port` | `8181` | Define the HTTP port number. |
+| `quarkus.http.auth.basic` | `false` | Enable HTTP basic authentication. |
+| `quarkus.http.limits.max-body-size` | `10240K` | Define the HTTP max body size limit. |
+| `quarkus.http.cors.origins` | | Define the HTTP CORS origins. |
+| `quarkus.http.cors.methods` | `PATCH, POST, DELETE, GET, PUT` | Define the HTTP CORS covered methods. |
+| `quarkus.http.cors.headers` | `*` | Define the HTTP CORS covered headers. |
+| `quarkus.http.cors.exposed-headers` | `*` | Define the HTTP CORS covered exposed headers. |
+| `quarkus.http.cors.access-control-max-age` | `PT10M` | Define the HTTP CORS access control max age. |
+| `quarkus.http.cors.access-control-allow-credentials` | `true` | Define the HTTP CORS access control allow credentials flag. |
+| `quarkus.management.enabled` | `true` | Enable the management server. |
+| `quarkus.management.port` | `8182` | Define the port number of the Polaris management server. |
+| `quarkus.management.root-path` | | Define the root path under which the `/metrics` and `/health` endpoints are exposed. |
+| `quarkus.otel.sdk.disabled` | `true` | Disable the OpenTelemetry SDK. Set to `false` to enable the OpenTelemetry layer. |
+
+{{< alert note >}}
+This section is only relevant for Polaris Docker images and Kubernetes deployments.
+{{< /alert >}}
+
+There are many other environment variables available in the official Polaris Docker
+image; they come from the base image used by Polaris, [ubi9/openjdk-21-runtime]. They should be used
+to fine-tune the Java runtime directly, e.g. to enable debugging or to set the heap size. These
+variables are not specific to Polaris, but are inherited from the base image. If in doubt, leave
+everything at its default!
+
+[ubi9/openjdk-21-runtime]: https://catalog.redhat.com/software/containers/ubi9/openjdk-21-runtime/6501ce769a0d86945c422d5f
+
+| Environment variable | Description |
+|----------------------|-------------|
+| `JAVA_OPTS` or `JAVA_OPTIONS` | **NOT RECOMMENDED**. JVM options passed to the `java` command (example: "-verbose:class"). Setting this variable will override all options set by any of the other variables in this table. To pass extra settings, use `JAVA_OPTS_APPEND` instead. |
+| `JAVA_OPTS_APPEND` | User-specified Java options to be appended to the generated options in `JAVA_OPTS` (example: "-Dsome.property=foo"). |
+| `JAVA_TOOL_OPTIONS` | This variable is defined and honored by all OpenJDK distros, see [here](https://bugs.openjdk.org/browse/JDK-4971166). Options defined here take precedence over all else; using this variable is generally not necessary, but can be useful e.g. to enforce JVM startup parameters, to set up remote debug, or to define JVM agents. |
+| `JAVA_MAX_MEM_RATIO` | Used to calculate a default maximal heap memory based on a container's memory restriction. If used in a container without any memory constraints, this option has no effect. If there is a memory constraint, `-XX:MaxRAMPercentage` is set to the ratio of the available container memory configured here. The default is `80`, which means 80% of the available memory is used as an upper boundary. You can skip this mechanism by setting this value to `0`, in which case no `-XX:MaxRAMPercentage` option is added. |
+| `JAVA_DEBUG` | If set, remote debugging will be switched on. Disabled by default (example: "true"). |
+| `JAVA_DEBUG_PORT` | Port used for remote debugging. Defaults to "5005" (tip: use "*:5005" to enable debugging on all network interfaces). |
+| `GC_MIN_HEAP_FREE_RATIO` | Minimum percentage of heap free after GC to avoid expansion. Default is 10. |
+| `GC_MAX_HEAP_FREE_RATIO` | Maximum percentage of heap free after GC to avoid shrinking. Default is 20. |
+| `GC_TIME_RATIO` | Specifies the ratio of the time spent outside of garbage collection. Default is 4. |
+| `GC_ADAPTIVE_SIZE_POLICY_WEIGHT` | The weighting given to the current GC time versus previous GC times. Default is 90. |
+| `GC_METASPACE_SIZE` | The initial metaspace size. There is no default (example: "20"). |
+| `GC_MAX_METASPACE_SIZE` | The maximum metaspace size. There is no default (example: "100"). |
+| `GC_CONTAINER_OPTIONS` | Specify the Java GC to use. The value of this variable should contain the necessary JRE command-line options to specify the required GC, which will override the default of `-XX:+UseParallelGC` (example: `-XX:+UseG1GC`). |
+
+Here are some examples:
+
+| Example | `docker run` option |
+|---------|---------------------|
+| Using another GC | `-e GC_CONTAINER_OPTIONS="-XX:+UseShenandoahGC"` lets Polaris use Shenandoah GC instead of the default parallel GC. |
+| Set the Java heap size to a _fixed_ amount | `-e JAVA_OPTS_APPEND="-Xms8g -Xmx8g"` lets Polaris use a Java heap of 8g. |
+| Set the maximum heap percentage | `-e JAVA_MAX_MEM_RATIO="70"` lets Polaris use 70% of the available memory. |
+
+## Troubleshooting Configuration Issues
+
+If you encounter issues with the configuration, you can ask Polaris to print out the configuration it
+is using. To do this, set the log level for the `io.smallrye.config` category to `DEBUG`, and also
+set the console appender level to `DEBUG`:
+
+```properties
+quarkus.log.console.level=DEBUG
+quarkus.log.category."io.smallrye.config".level=DEBUG
+```
+
+{{< alert important >}}
+This will print out all configuration values, including sensitive ones like
+passwords. Don't do this in production, and don't share this output with anyone you don't trust!
+{{< /alert >}}
diff --git a/1.2.0/configuring-polaris-for-production.md b/1.2.0/configuring-polaris-for-production.md
new file mode 100644
index 0000000000..928d8115f1
--- /dev/null
+++ b/1.2.0/configuring-polaris-for-production.md
@@ -0,0 +1,223 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Configuring Polaris for Production
+linkTitle: Production Configuration
+type: docs
+weight: 600
+---
+
+The default server configuration is intended for development and testing. When you deploy Polaris in production,
+review and apply the following checklist:
+- [ ] Configure OAuth2 keys
+- [ ] Enforce realm header validation (`require-header=true`)
+- [ ] Use a durable metastore (JDBC + PostgreSQL)
+- [ ] Bootstrap valid realms in the metastore
+- [ ] Disable local FILE storage
+
+### Configure OAuth2
+
+Polaris authentication requires specifying a token broker factory type. Two implementations are
+supported out of the box:
+
+- [rsa-key-pair] uses a pair of public and private keys;
+- [symmetric-key] uses a shared secret.
+
+[rsa-key-pair]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java
+[symmetric-key]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTSymmetricKeyFactory.java
+
+By default, Polaris uses `rsa-key-pair`, with randomly generated keys.
+
+{{< alert important >}}
+The default `rsa-key-pair` configuration is not suitable when deploying many replicas of Polaris,
+as each replica will have its own set of keys. This will cause token validation to fail when a
+request is routed to a different replica than the one that issued the token.
+{{< /alert >}}
+
+It is highly recommended to configure Polaris with previously-generated RSA keys. This can be done
+by setting the following properties:
+
+```properties
+polaris.authentication.token-broker.type=rsa-key-pair
+polaris.authentication.token-broker.rsa-key-pair.public-key-file=/tmp/public.key
+polaris.authentication.token-broker.rsa-key-pair.private-key-file=/tmp/private.key
+```
+
+To generate an RSA key pair in PKCS#8 format, you can use the following commands:
+
+```shell
+openssl genpkey -algorithm RSA -out private.key -pkeyopt rsa_keygen_bits:2048
+openssl rsa -in private.key -pubout -out public.key
+```
+
+Alternatively, you can use a symmetric key by setting the following properties:
+
+```properties
+polaris.authentication.token-broker.type=symmetric-key
+polaris.authentication.token-broker.symmetric-key.file=/tmp/symmetric.key
+```
+
+Note: it is also possible to set the symmetric key secret directly in the configuration file. If
+possible, pass the secret as an environment variable to avoid storing sensitive information in the
+configuration file:
+
+```properties
+polaris.authentication.token-broker.symmetric-key.secret=${POLARIS_SYMMETRIC_KEY_SECRET}
+```
+
+Finally, you can also configure the token broker to use a maximum lifespan by setting the following
+property:
+
+```properties
+polaris.authentication.token-broker.max-token-generation=PT1H
+```
+
+Typically, in Kubernetes, you would define the keys as a `Secret` and mount them as files in the
+container.
+
+### Realm Context Resolver
+
+By default, Polaris resolves realms based on incoming request headers. You can configure the realm
+context resolver by setting the following properties in `application.properties`:
+
+```properties
+polaris.realm-context.realms=POLARIS,MY-REALM
+polaris.realm-context.header-name=Polaris-Realm
+```
+
+Where:
+
+- `realms` is a comma-separated list of allowed realms. This setting _must_ be correctly configured.
+  At least one realm must be specified.
+- `header-name` is the name of the header used to resolve the realm; by default, it is
+  `Polaris-Realm`.
+
+If a request contains the specified header, Polaris will use the realm specified in the header. If
+the realm is not in the list of allowed realms, Polaris will return a `404 Not Found` response.
+
+If a request _does not_ contain the specified header, however, by default Polaris will use the first
+realm in the list as the default realm. In the above example, `POLARIS` is the default realm and
+would be used if the `Polaris-Realm` header is not present in the request.
+
+Relying on this fallback is not recommended for production use, as it may lead to security
+vulnerabilities. To avoid it, set the following property to `true`:
+
+```properties
+polaris.realm-context.require-header=true
+```
+
+This will cause Polaris to also return a `404 Not Found` response if the realm header is not present
+in the request.
+
+### Metastore Configuration
+
+A metastore should be configured with an implementation that durably persists Polaris entities. By
+default, Polaris uses an in-memory metastore.
+
+{{< alert important >}}
+The default in-memory metastore is not suitable for production use, as it will lose all data
+when the server is restarted; it is also unusable when multiple Polaris replicas are used.
+{{< /alert >}}
+
+To enable a durable metastore, configure your system to use the Relational JDBC-backed metastore.
+This implementation leverages Quarkus for datasource management and supports configuration through
+environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file).
+
+Configure the metastore by setting the following environment variables:
+
+```
+POLARIS_PERSISTENCE_TYPE=relational-jdbc
+
+QUARKUS_DATASOURCE_USERNAME=
+QUARKUS_DATASOURCE_PASSWORD=
+QUARKUS_DATASOURCE_JDBC_URL=
+```
+
+The relational JDBC metastore is a Quarkus-managed datasource and only supports PostgreSQL and H2 as of now.
+Please refer to the documentation here:
+[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)
+
+{{< alert important >}}
+Be sure to secure your metastore backend, since it will be storing sensitive data and catalog
+metadata.
+{{< /alert >}}
+
+Note: during bootstrap, Polaris will always create the schema `polaris_schema` in the configured database.
+
+### Bootstrapping
+
+Before using Polaris, you must **bootstrap** the metastore. This is a manual operation that must be
+performed **only once** for each realm in order to prepare the metastore to integrate with Polaris.
+
+By default, when bootstrapping a new realm, Polaris will create randomized `CLIENT_ID` and
+`CLIENT_SECRET` values for the `root` principal and store their hashes in the metastore backend.
+Because only hashes are stored, the generated credentials cannot later be retrieved in clear text
+from the database, which may be inconvenient.
+
+In order to provide your own credentials for the `root` principal (so you can request tokens via
+`api/catalog/v1/oauth/tokens`), use the [Polaris Admin Tool]({{% ref "admin-tool" %}}).
+
+You can verify the setup by requesting a token for the `root` principal:
+
+```bash
+curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
+  -d "grant_type=client_credentials" \
+  -d "client_id=my-client-id" \
+  -d "client_secret=my-client-secret" \
+  -d "scope=PRINCIPAL_ROLE:ALL"
+```
+
+This should return an access token:
+
+```json
+{
+  "access_token": "...",
+  "token_type": "bearer",
+  "issued_token_type": "urn:ietf:params:oauth:token-type:access_token",
+  "expires_in": 3600
+}
+```
+
+If you used a non-default realm name, add the appropriate request header to the `curl` command;
+otherwise, Polaris will resolve the realm to the first one in the configuration
+`polaris.realm-context.realms`. Here is an example setting the realm header:
+
+```bash
+curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
+  -H "Polaris-Realm: my-realm" \
+  -d "grant_type=client_credentials" \
+  -d "client_id=my-client-id" \
+  -d "client_secret=my-client-secret" \
+  -d "scope=PRINCIPAL_ROLE:ALL"
+```
+
+### Disable FILE Storage Type
+
+By default, Polaris allows using the local file system (`FILE`) for catalog storage. This is fine for testing,
+but **not recommended for production**. To disable it, set the supported storage types like this:
+```hocon
+polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES" = [ "S3", "AZURE" ]
+```
+Leave out `FILE` to prevent its use. Only include the storage types your setup needs.
+
+### Upgrade Considerations
+
+The [Polaris Evolution](../evolution) page discusses backward compatibility and
+upgrade concerns.
diff --git a/1.2.0/entities.md b/1.2.0/entities.md
new file mode 100644
index 0000000000..df53a0787f
--- /dev/null
+++ b/1.2.0/entities.md
@@ -0,0 +1,91 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Entities
+type: docs
+weight: 400
+---
+
+This page documents various entities that can be managed in Apache Polaris (Incubating).
+
+## Catalog
+
+A catalog is a top-level entity in Polaris that may contain other entities like [namespaces](#namespace) and [tables](#table). These map directly to [Apache Iceberg catalogs](https://iceberg.apache.org/terms/#catalog).
+
+For information on managing catalogs with the REST API or for more information on what data can be associated with a catalog, see [the CreateCatalogRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).
+
+### Storage Type
+
+All catalogs in Polaris are associated with a _storage type_. Valid storage types are `S3`, `AZURE`, and `GCS`. The `FILE` type is additionally available for testing.
+Each of these types relates to a different storage provider where data within the catalog may reside. Depending on the storage type, various other configurations may be set for a catalog, including credentials to be used when accessing data inside the catalog.
+
+For details on how to use storage types in the REST API, see [the StorageConfigInfo OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).
+
+For usage examples of storage types, see the [CLI documentation]({{% ref "command-line-interface" %}}).
+
+## Namespace
+
+A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_.
+
+In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on.
+
+For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the CreateNamespaceRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).
+
+## Table
+
+Polaris tables are entities that map to [Apache Iceberg tables](https://iceberg.apache.org/docs/nightly/configuration/), [Delta tables](https://docs.databricks.com/aws/en/delta/table-properties), or [Hudi tables](https://hudi.apache.org/docs/next/configurations#TABLE_CONFIG).
+
+For information on managing tables with the REST API or for more information on what data can be associated with a table, see [the CreateTableRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).
+
+## View
+
+Polaris views are entities that map to [Apache Iceberg views](https://iceberg.apache.org/view-spec/).
+
+For information on managing views with the REST API or for more information on what data can be associated with a view, see [the CreateViewRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).
+
+## Principal
+
+Polaris principals are unique identities that can be used to represent users or services. Each principal may have one or more [principal roles](#principal-role) assigned to it for the purpose of accessing catalogs and the entities within them.
+
+For information on managing principals with the REST API or for more information on what data can be associated with a principal, see [the CreatePrincipalRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).
+
+## Principal Role
+
+Polaris principal roles are labels that may be granted to [principals](#principal). Each principal may have one or more principal roles, and the same principal role may be granted to multiple principals. Principal roles may be assigned based on the persona or responsibilities of a given principal, or on how that principal will need to access different entities within Polaris.
+
+For information on managing principal roles with the REST API or for more information on what data can be associated with a principal role, see [the CreatePrincipalRoleRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).
+
+## Catalog Role
+
+Polaris catalog roles are labels that may be granted to [catalogs](#catalog). Each catalog may have one or more catalog roles, and the same catalog role may be granted to multiple catalogs.
+Catalog roles may be assigned based on the nature of the data that will reside in a catalog, or by the groups of users and services that might need to access that data.
+
+Each catalog role may have multiple [privileges](#privilege) granted to it, and each catalog role can be granted to one or more [principal roles](#principal-role). This is the mechanism by which principals are granted access to entities inside a catalog such as namespaces and tables.
+
+## Policy
+
+A Polaris policy is a set of rules governing actions on specified resources under predefined conditions. Polaris supports policies for Iceberg table compaction, snapshot expiry, row-level access control, and custom policy definitions.
+
+A policy can be applied at the catalog, namespace, or table level. Policy inheritance is achieved by attaching a policy to a higher-level scope, such as a namespace or catalog; tables registered under those entities then do not need to declare the same policy individually. If a table or namespace requires a different policy, a user can assign one directly, overriding a policy of the same type declared at a higher level.
+
+## Privilege
+
+Polaris privileges are granted to [catalog roles](#catalog-role) in order to grant principals with a given principal role some degree of access to catalogs with a given catalog role. When a privilege is granted to a catalog role, any principal roles granted that catalog role receive the privilege. In turn, any principals who are granted that principal role receive it.
+
+A privilege can be scoped to any entity inside a catalog, including the catalog itself.
+
+For a list of supported privileges for each privilege class, see [the OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml) (TablePrivilege, ViewPrivilege, NamespacePrivilege, CatalogPrivilege).
diff --git a/1.2.0/evolution.md b/1.2.0/evolution.md
new file mode 100644
index 0000000000..b3a57c7525
--- /dev/null
+++ b/1.2.0/evolution.md
@@ -0,0 +1,115 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Polaris Evolution
+type: docs
+weight: 1000
+---
+
+This page discusses what can be expected from Apache Polaris as the project evolves.
+
+## Using Polaris as a Catalog
+
+Polaris is primarily intended to be used as a catalog of tables and views. As such,
+it implements the Iceberg REST Catalog API and its own REST APIs.
+
+Revisions of the Iceberg REST Catalog API are controlled by the [Apache Iceberg](https://iceberg.apache.org/)
+community. Polaris attempts to implement this specification accurately. Nonetheless,
+optional REST Catalog features may or may not be supported immediately.
+In general, there is no guarantee that Polaris releases always implement the latest version of
+the Iceberg REST Catalog API.
+
+Any API under Polaris control that is not in an "experimental" or "beta" state
+(e.g. the Management API) is maintained as a versioned REST API. New releases of Polaris
+may include changes to the current version of the API. When that happens, those changes
+are intended to be compatible with prior versions of Polaris clients. Certain endpoints
+and parameters may be deprecated.
+
+In case a major change is required to an API that cannot be implemented in a
+backward-compatible way, new endpoints (URI paths) may be introduced. New URI "roots" may
+be introduced too (e.g. `api/catalog/v2`).
+
+Note that those "v1", "v2", etc. URI path segments are not meant to be 1:1 with Polaris
+releases or Polaris project version numbers (e.g. a "v2" path segment does not mean that
+it is added in Polaris 2.0).
+
+Polaris servers will support deprecated API endpoints / parameters / versions / etc.
+for some transition period to allow clients to migrate.
+
+### Managing Polaris Database
+
+Polaris stores its data in a database, which is sometimes referred to as "Metastore" or
+"Persistence" in other docs.
+
+Each Polaris release may support multiple Persistence [implementations](../metastores),
+for example, "EclipseLink" (deprecated) and "JDBC" (current).
+
+Each type of Persistence evolves individually. Within each Persistence type, Polaris
+attempts to support rolling upgrades (both version X and X + 1 servers running at the
+same time).
+
+However, migrating between different Persistence types is not supported in a rolling
+upgrade manner (for example, migrating from "EclipseLink" to "JDBC"). Polaris provides
+[tools](https://github.com/apache/polaris-tools/) for migrating between different
+catalogs, and those tools may be used to migrate between different Persistence types
+as well. Service interruption (downtime) should be expected in those cases.
+
+## Using Polaris as a Build-Time Dependency
+
+Polaris produces several jars. These jars, or custom builds of Polaris code, may be used in
+downstream projects according to the terms of the license included in Polaris distributions.
+
+The minimal version of the JRE required by Polaris code (compilation target) may be updated in
+any release. Different Polaris jars may have different minimal JRE version requirements.
+
+Changes in Java classes should be expected at any time, regardless of the module name or
+whether the class / method is `public` or not.
+
+This approach is not meant to discourage the use of Polaris code in downstream projects, but
+to allow more flexibility in evolving the codebase to support new catalog-level features
+and improve code efficiency. Maintainers of downstream projects are encouraged to join Polaris
+mailing lists to monitor project changes, suggest improvements, and engage with the Polaris
+community in case of specific compatibility concerns.
+
+## Semantic Versioning
+
+Polaris strives to follow [Semantic Versioning](https://semver.org/) conventions with
+respect to REST APIs (beta and experimental APIs excepted), [Polaris Policies](../policy/),
+and user-facing [configuration](../configuration/).
+
+The following are some examples of the Polaris approach to SemVer in REST APIs and configuration.
+These examples are for illustration purposes and should not be considered exhaustive.
+
+* Polaris implementing an optional Iceberg REST Catalog feature that was unimplemented
+in the previous release is not considered a major change.
+
+* Supporting a new revision of the Iceberg REST Catalog spec in a backward-compatible way
+is not considered a major change. Specifically, supporting new REST API prefixes (e.g. `v2`)
+is not a major change because it does not affect older clients.
+
+* Changing the implementation of an Iceberg REST Catalog feature / endpoint in a non-backward-compatible
+way (e.g. removing or renaming a request parameter) is a major change.
+
+* Dropping support for a configuration property with the `polaris.` name prefix is a major change.
+
+* Dropping support for any previously defined [Policy](../policy/) type or property is a major change.
+
+* Upgrading the Quarkus runtime to its next major version is a major change (because
+Quarkus-managed configuration may change).
diff --git a/1.2.0/federation/_index.md b/1.2.0/federation/_index.md
new file mode 100644
index 0000000000..e4fbe261a0
--- /dev/null
+++ b/1.2.0/federation/_index.md
@@ -0,0 +1,26 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Federation
+type: docs
+weight: 703
+---
+
+Guides for federating Polaris with existing metadata services. Expand this section to select a
+specific integration.
diff --git a/1.2.0/federation/hive-metastore-federation.md b/1.2.0/federation/hive-metastore-federation.md
new file mode 100644
index 0000000000..0d39a5e4a0
--- /dev/null
+++ b/1.2.0/federation/hive-metastore-federation.md
@@ -0,0 +1,125 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Hive Metastore Federation
+type: docs
+weight: 705
+---
+
+Polaris can federate catalog operations to an existing Hive Metastore (HMS). This lets an external
+HMS remain the source of truth for table metadata while Polaris brokers access, policies, and
+multi-engine connectivity.
+
+## Build-time enablement
+
+The Hive factory is packaged as an optional extension and is not baked into default server builds.
+Include it when assembling the runtime or container images by setting the `NonRESTCatalogs` Gradle
+property to include `HIVE` (and any other non-REST backends you need):
+
+```bash
+./gradlew :polaris-server:assemble :polaris-server:quarkusAppPartsBuild --rerun \
+  -DNonRESTCatalogs=HIVE -Dquarkus.container-image.build=true
+```
+
+`runtime/server/build.gradle.kts` wires the extension in only when this flag is present, so binaries
+built without it will reject Hive federation requests.
+
+## Runtime requirements
+
+- **Metastore connectivity:** Expose the HMS Thrift endpoint (`thrift://host:port`) to the Polaris
+  deployment.
+- **Configuration discovery:** Iceberg's `HiveCatalog` loads Hadoop/Hive client settings from the
+  classpath. Provide `hive-site.xml` (and `core-site.xml` if needed) via
+  `HADOOP_CONF_DIR`/`HIVE_CONF_DIR` or an image layer.
+- **Authentication:** Hive federation only supports `IMPLICIT` authentication, meaning Polaris uses
+  the operating-system or Kerberos identity of the running process (no stored secrets). Ensure the
+  service principal is logged in or holds a valid keytab/TGT before starting Polaris.
+- **Object storage role:** Configure `polaris.service-identity.<realm>.aws-iam.*` (or the default
+  realm) so the server can assume the AWS role referenced by the catalog. The IAM role must allow
+  STS access from the Polaris service identity and grant permissions to the table locations.
+
+### Kerberos setup example
+
+If your Hive Metastore enforces Kerberos, stage the necessary configuration alongside Polaris:
+
+```bash
+export KRB5_CONFIG=/etc/polaris/krb5.conf
+export HADOOP_CONF_DIR=/etc/polaris/hadoop-conf  # contains hive-site.xml with HMS principal
+export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/polaris/jaas.conf"
+kinit -kt /etc/polaris/keytabs/polaris.keytab polaris/service@EXAMPLE.COM
+```
+
+- `hive-site.xml` must define `hive.metastore.sasl.enabled=true`, the metastore principal, and the
+  client principal pattern (for example `hive.metastore.client.kerberos.principal=polaris/_HOST@REALM`).
+- The JAAS entry (referenced by `java.security.auth.login.config`) should use `useKeyTab=true` and
+  point to the same keytab shown above so the Polaris JVM can refresh credentials automatically.
+- Keep the keytab readable solely by the Polaris service user; the implicit authenticator consumes
+  the TGT at startup and for periodic renewal.
+
+## Creating a federated catalog
+
+Use the Management API (or the Python CLI) to create an external catalog whose connection type is
+`HIVE`. The following request registers a catalog that proxies to an HMS running on
+`thrift://hms.example.internal:9083` (replace `<polaris-host>` with your Polaris server's host name):
+
+```bash
+curl -X POST https://<polaris-host>/management/v1/catalogs \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "type": "EXTERNAL",
+    "name": "analytics_hms",
+    "storageConfigInfo": {
+      "storageType": "S3",
+      "roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access",
+      "region": "us-east-1"
+    },
+    "properties": { "default-base-location": "s3://analytics-bucket/warehouse/" },
+    "connectionConfigInfo": {
+      "connectionType": "HIVE",
+      "uri": "thrift://hms.example.internal:9083",
+      "warehouse": "s3://analytics-bucket/warehouse/",
+      "authenticationParameters": { "authenticationType": "IMPLICIT" }
+    }
+  }'
+```
+
+Grant catalog roles to principal roles exactly as you would for internal catalogs so engines can
+obtain tokens that authorize against the federated metadata.
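+
+For example, a minimal grant sequence with the Python CLI might look like this (assuming a
+principal role named `analysts` already exists; the command shapes mirror the internal-catalog
+examples elsewhere in these docs):
+
+```bash
+# Create a catalog role on the federated catalog and wire it to a principal role.
+polaris catalog-roles create --catalog analytics_hms hms_reader
+polaris catalog-roles grant \
+  --catalog analytics_hms \
+  --principal-role analysts \
+  hms_reader
+
+# Allow holders of the principal role to read table data through the federated catalog.
+polaris privileges \
+  catalog \
+  --catalog analytics_hms \
+  --catalog-role hms_reader \
+  grant \
+  TABLE_READ_DATA
+```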
+
+`default-base-location` is required; it tells Polaris and Iceberg where to place new metadata files.
+`allowedLocations` is optional—supply it only when you want to restrict writers to a specific set of
+prefixes. If your IAM trust policy requires an `externalId` or explicit `userArn`, include those
+optional fields in `storageConfigInfo`. Polaris persists them and supplies them when assuming the
+role cited by `roleArn` during metadata commits.
+
+## Limitations and operational notes
+
+- **Single identity:** Because only `IMPLICIT` authentication is permitted, Polaris cannot mix
+  multiple Hive identities in a single deployment (`HiveFederatedCatalogFactory` rejects other auth
+  types). Plan a deployment topology that aligns the Polaris process identity with the target HMS.
+- **Generic tables:** The Hive extension exposes Iceberg tables registered in HMS. Generic table
+  federation is not implemented (`HiveFederatedCatalogFactory#createGenericCatalog` throws
+  `UnsupportedOperationException`).
+- **Configuration caching:** Atlas-style catalog failover and multi-HMS routing are not yet handled;
+  Polaris initializes one `HiveCatalog` per connection and relies on the underlying Iceberg client
+  for retries.
+
+With these constraints satisfied, Polaris can sit in front of an HMS so that Iceberg tables managed
+there gain OAuth-protected, multi-engine access through the Polaris REST APIs.
diff --git a/1.2.0/federation/iceberg-rest-federation.md b/1.2.0/federation/iceberg-rest-federation.md
new file mode 100644
index 0000000000..8318f45095
--- /dev/null
+++ b/1.2.0/federation/iceberg-rest-federation.md
@@ -0,0 +1,71 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Iceberg REST Federation
+type: docs
+weight: 704
+---
+
+Polaris can federate to an external Iceberg REST catalog (e.g., another Polaris deployment, AWS Glue, or a custom Iceberg
+REST implementation), enabling a Polaris service to access table and view entities managed by remote Iceberg REST catalogs.
+
+## Runtime requirements
+
+- **REST endpoint:** The remote service must expose the Iceberg REST specification. Configure
+  firewalls so Polaris can reach the base URI you provide in the connection config.
+- **Authentication:** Polaris forwards requests using the credentials defined in
+  `ConnectionConfigInfo.AuthenticationParameters`. OAuth2 client credentials, bearer tokens, and AWS
+  SigV4 are supported; choose the scheme the remote service expects.
+
+## Creating a federated REST catalog
+
+The snippet below registers an external catalog that forwards to a remote Polaris server using OAuth2
+client credentials. `iceberg-remote-catalog-name` is optional; supply it when the remote server multiplexes
+multiple logical catalogs under one URI. Replace the `<client-id>` and `<client-secret>` placeholders
+with credentials issued by the remote server.
+
+```bash
+polaris catalogs create \
+  --type EXTERNAL \
+  --storage-type s3 \
+  --role-arn "arn:aws:iam::123456789012:role/polaris-warehouse-access" \
+  --default-base-location "s3://analytics-bucket/warehouse/" \
+  --catalog-connection-type iceberg-rest \
+  --iceberg-remote-catalog-name analytics \
+  --catalog-uri "https://remote-polaris.example.com/catalog/v1" \
+  --catalog-authentication-type OAUTH \
+  --catalog-token-uri "https://remote-polaris.example.com/catalog/v1/oauth/tokens" \
+  --catalog-client-id "<client-id>" \
+  --catalog-client-secret "<client-secret>" \
+  --catalog-client-scopes "PRINCIPAL_ROLE:ALL" \
+  analytics_rest
+```
+
+Refer to the [CLI documentation](../command-line-interface.md#catalogs) for details on alternative authentication types such as BEARER or SIGV4.
+
+Grant catalog roles to principal roles the same way you do for internal catalogs so compute engines
+receive tokens with access to the federated namespace.
+
+## Operational notes
+
+- **Connectivity checks:** Polaris does not lazily probe the remote service; catalog creation fails if
+  the REST endpoint is unreachable or authentication is rejected.
+- **Feature parity:** Federation exposes whatever table/namespace operations the remote service
+  implements. Unsupported features return the remote error directly to callers.
+- **Generic tables:** The REST federation path currently surfaces Iceberg tables only; generic table
+  federation is not implemented.
diff --git a/1.2.0/generic-table.md b/1.2.0/generic-table.md
new file mode 100644
index 0000000000..63ef38a1da
--- /dev/null
+++ b/1.2.0/generic-table.md
@@ -0,0 +1,169 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Generic Table (Beta)
+type: docs
+weight: 435
+---
+
+Generic Table support in Apache Polaris is designed to provide support for non-Iceberg tables across different table formats, such as Delta and CSV. It currently provides the following capabilities:
+- Create a generic table under a namespace
+- Load a generic table
+- Drop a generic table
+- List all generic tables under a namespace
+
+**NOTE** Generic Table support is currently in beta. Please use it with caution and report any issues you encounter.
+
+## What is a Generic Table?
+
+A generic table in Polaris is an entity that defines the following fields:
+
+- **name** (required): A unique identifier for the table within a namespace
+- **format** (required): The format for the generic table, e.g. "delta", "csv"
+- **base-location** (optional): Table base location in URI format. For example: `s3://<bucket>/path/to/table`
+  - The table base location is a location that includes all files for the table
+  - A table with multiple disjoint locations (i.e.
+containing files that are outside the configured base location) is not compliant with the current generic table support in Polaris.
+  - If no location is provided, clients or users are responsible for managing the location.
+- **properties** (optional): Properties for the generic table passed on creation.
+  - Currently, there is no reserved property key defined.
+  - The property definition and interpretation is delegated to client or engine implementations.
+- **doc** (optional): Comment or description for the table
+
+## Generic Table API vs. Iceberg Table API
+
+The Generic Table API provides a different set of endpoints to operate on generic table entities, while the Iceberg API operates on Iceberg table entities.
+
+| Operations   | **Iceberg Table API** | **Generic Table API** |
+|--------------|-----------------------|-----------------------|
+| Create Table | Create an Iceberg table | Create a generic table |
+| Load Table   | Load an Iceberg table. If the table to load is a generic table, you need to call the Generic Table loadTable API; otherwise, a TableNotFoundException will be thrown | Load a generic table. Similarly, trying to load an Iceberg table through the Generic Table API will throw a TableNotFoundException. |
+| Drop Table   | Drop an Iceberg table. As with load table, if the table to drop is a generic table, a TableNotFoundException will be thrown. | Drop a generic table. Dropping an Iceberg table through the Generic Table endpoint will throw a TableNotFoundException. |
+| List Table   | List all Iceberg tables | List all generic tables |
+
+Note that generic tables share the same namespaces as Iceberg tables, so a table name has to be unique within a namespace. Furthermore, since
+there is currently no support for updating a generic table, any update to an existing table requires a drop and re-create.
+
+## Working with Generic Table
+
+There are two ways to work with Polaris Generic Tables today:
+1) Directly communicate with Polaris through REST API calls using tools such as `curl`. Details are described in the following sections.
+2) Use the Spark client provided if you are working with Spark. Please refer to [Polaris Spark Client]({{% ref "polaris-spark-client" %}}) for detailed instructions.
+
+### Create a Generic Table
+
+To create a generic table, you need to provide the corresponding fields as described in [What is a Generic Table](#what-is-a-generic-table).
+
+The REST API for creating a generic table is `POST /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables`, and the
+request body looks like the following:
+
+```json
+{
+  "name": "<table-name>",
+  "format": "<format>",
+  "base-location": "<base-location>",
+  "doc": "<comment or description>",
+  "properties": {
+    "<key>": "<value>"
+  }
+}
+```
+
+Here is an example that creates a generic table named `delta_table` with format `delta` under the namespace `delta_ns`
+in the catalog `delta_catalog` using curl:
+
+```shell
+curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \
+  -H "Content-Type: application/json" \
+  -d '{
+        "name": "delta_table",
+        "format": "delta",
+        "base-location": "s3://<bucket>/path/to/table",
+        "doc": "delta table example",
+        "properties": {
+          "key1": "value1"
+        }
+      }'
+```
+
+### Load a Generic Table
+The REST endpoint for loading a generic table is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.
+
+Here is an example that loads the table `delta_table` using curl:
+```shell
+curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
+```
+And the response looks like the following:
+```json
+{
+  "table": {
+    "name": "delta_table",
+    "format": "delta",
+    "base-location": "s3://<bucket>/path/to/table",
+    "doc": "delta table example",
+    "properties": {
+      "key1": "value1"
+    }
+  }
+}
+```
+
+### List Generic Tables
+The REST endpoint for listing the generic tables under a given
+namespace is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/`.
+
+The following curl command lists all generic tables under the namespace `delta_ns`:
+```shell
+curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/
+```
+Example response:
+```json
+{
+  "identifiers": [
+    {
+      "namespace": ["delta_ns"],
+      "name": "delta_table"
+    }
+  ],
+  "next-page-token": null
+}
+```
+
+### Drop a Generic Table
+The REST endpoint for dropping a generic table is `DELETE /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.
+
+The following curl call drops the table `delta_table`:
+```shell
+curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
+```
+
+### API Reference
+
+For the complete and up-to-date API specification, see the [Catalog API Spec](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/polaris/refs/heads/main/spec/generated/bundled-polaris-catalog-service.yaml).
+
+## Limitations
+
+Current limitations of Generic Table support:
+1) Limited spec information. Currently, there is no spec for information such as schema, partitions, etc.
+2) No commit coordination or update capability is provided at the catalog service level.
+
+Therefore, the catalog itself is unaware of anything about the underlying table except some of the loosely defined metadata.
+It is the responsibility of the engine (and plugins used by the engine) to determine exactly how loading or committing data
+should look based on the metadata. For example, with the Delta support, the Delta log serialization, deserialization,
+and updates all happen on the client side.
diff --git a/1.2.0/getting-started/_index.md b/1.2.0/getting-started/_index.md
new file mode 100644
index 0000000000..1707ceacd2
--- /dev/null
+++ b/1.2.0/getting-started/_index.md
@@ -0,0 +1,39 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Getting Started with Apache Polaris
+linkTitle: Getting Started
+type: docs
+weight: 101
+---
+
+The fastest way to get started is with our Docker Compose examples. Each example provides a complete working environment with detailed instructions.
+
+## Next Steps
+
+1. Check and install dependencies
+2. Choose how you want to deploy Polaris
+3. Create a catalog
+4. See the Using Polaris page
+
+## Getting Help
+
+- Documentation: https://polaris.apache.org
+- GitHub Issues: https://github.com/apache/polaris/issues
+- Slack: [Join Apache Polaris Community](https://join.slack.com/t/apache-polaris/shared_invite/zt-2y3l3r0fr-VtoW42ltir~nSzCYOrQgfw)
diff --git a/1.2.0/getting-started/creating-a-catalog/_index.md b/1.2.0/getting-started/creating-a-catalog/_index.md
new file mode 100644
index 0000000000..eeaf431733
--- /dev/null
+++ b/1.2.0/getting-started/creating-a-catalog/_index.md
@@ -0,0 +1,54 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Creating a Catalog
+linkTitle: Creating a Catalog
+type: docs
+weight: 300
+---
+
+The following Object Storage providers can be configured as storage backends for your Polaris catalog:
+
+- [S3 compatible object stores]({{< ref "s3.md" >}})
+- [Google Cloud Storage]({{< ref "catalog-gcs.md" >}})
+- [Azure Blob Storage]({{< ref "catalog-azure.md" >}})
+- Local file system (the default; for testing only)
+
+
+## Create a catalog using the Polaris CLI
+
+See the full list of options for the `polaris catalogs create` command [here]({{% ref "../../command-line-interface#create" %}}).
+
+### Example
+
+```shell
+CLIENT_ID=root
+CLIENT_SECRET=s3cr3t
+DEFAULT_BASE_LOCATION=s3://example-bucket/my_data
+ROLE_ARN=arn:aws:iam::111122223333:role/ExampleCorpRole
+
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  catalogs \
+  create \
+  --storage-type s3 \
+  --default-base-location ${DEFAULT_BASE_LOCATION} \
+  --role-arn ${ROLE_ARN} \
+  my_catalog
+```
diff --git a/1.2.0/getting-started/creating-a-catalog/catalog-azure.md b/1.2.0/getting-started/creating-a-catalog/catalog-azure.md
new file mode 100644
index 0000000000..8666f28876
--- /dev/null
+++ b/1.2.0/getting-started/creating-a-catalog/catalog-azure.md
@@ -0,0 +1,55 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Creating a catalog on Azure
+linkTitle: Azure
+type: docs
+weight: 300
+---
+
+For the `polaris catalogs create` [command]({{% ref "../../command-line-interface#create" %}}) there are a few `azure`-only options:
+
+```text
+--storage-type azure
+--tenant-id (Required for Azure) A tenant ID to use when connecting to Azure Storage
+--multi-tenant-app-name (Only for Azure) The app name to use when connecting to Azure Storage
+--consent-url (Only for Azure) A consent URL granting permissions for the Azure Storage location
+```
+
+### Example
+
+```shell
+CLIENT_ID=root
+CLIENT_SECRET=s3cr3t
+DEFAULT_BASE_LOCATION=abfss://tenant123@blob.core.windows.net
+TENANT_ID=tenant123.onmicrosoft.com
+MULTI_TENANT_APP_NAME=myapp
+CONSENT_URL=https://myapp.com/consent
+
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  catalogs \
+  create \
+  --storage-type azure \
+  --tenant-id ${TENANT_ID} \
+  --multi-tenant-app-name ${MULTI_TENANT_APP_NAME} \
+  --consent-url ${CONSENT_URL} \
+  --default-base-location ${DEFAULT_BASE_LOCATION} \
+  my_azure_catalog
+```
\ No newline at end of file
diff --git a/1.2.0/getting-started/creating-a-catalog/catalog-gcs.md b/1.2.0/getting-started/creating-a-catalog/catalog-gcs.md
new file mode 100644
index 0000000000..db6214e38c
--- /dev/null
+++ b/1.2.0/getting-started/creating-a-catalog/catalog-gcs.md
@@ -0,0 +1,49 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Creating a catalog on Google Cloud Storage (GCS)
+linkTitle: GCS
+type: docs
+weight: 200
+---
+
+For the `polaris catalogs create` [command]({{% ref "../../command-line-interface#create" %}}) there are a few `gcs`-only options:
+
+```text
+--storage-type gcs
+--service-account (Only for GCS) The service account to use when connecting to GCS
+```
+
+### Example
+
+```shell
+CLIENT_ID=root
+CLIENT_SECRET=s3cr3t
+DEFAULT_BASE_LOCATION=gs://my-ml-bucket/predictions/
+SERVICE_ACCOUNT=serviceAccount:my-service-account@my-project.iam.gserviceaccount.com
+
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  catalogs \
+  create \
+  --storage-type gcs \
+  --service-account ${SERVICE_ACCOUNT} \
+  --default-base-location ${DEFAULT_BASE_LOCATION} \
+  my_gcs_catalog
+```
\ No newline at end of file
diff --git a/1.2.0/getting-started/creating-a-catalog/s3/_index.md b/1.2.0/getting-started/creating-a-catalog/s3/_index.md
new file mode 100644
index 0000000000..538bca17a9
--- /dev/null
+++ b/1.2.0/getting-started/creating-a-catalog/s3/_index.md
@@ -0,0 +1,38 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Creating a catalog on S3 compatible cloud providers
+linkTitle: S3
+type: docs
+weight: 100
+---
+
+The following S3 compatible cloud providers can be configured as storage backends for your Polaris catalog:
+
+- [AWS S3]({{< ref "catalog-aws.md" >}})
+- [MinIO]({{< ref "catalog-minio.md" >}})
+
+For the `polaris catalogs create` [command]({{% ref "../../../command-line-interface#create" %}}) there are a few `s3`-only options:
+
+```text
+--storage-type s3
+--role-arn (Only for AWS S3) A role ARN to use when connecting to S3
+--region (Only for S3) The region to use when connecting to S3
+--external-id (Only for S3) The external ID to use when connecting to S3
+```
diff --git a/1.2.0/getting-started/creating-a-catalog/s3/catalog-aws.md b/1.2.0/getting-started/creating-a-catalog/s3/catalog-aws.md
new file mode 100644
index 0000000000..b86ac874f8
--- /dev/null
+++ b/1.2.0/getting-started/creating-a-catalog/s3/catalog-aws.md
@@ -0,0 +1,52 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Creating a catalog on AWS S3
+linkTitle: AWS
+type: docs
+weight: 100
+---
+
+When creating a catalog based on AWS S3 storage, only the `role-arn` parameter is required. However, you usually
+also provide the `region` and
+[external-id](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_third-party.html).
+
+Note: the name `quickstart_catalog` from the example below is referenced in other Getting Started examples,
+but of course, it can be any valid catalog name.
+
+```shell
+CLIENT_ID=root
+CLIENT_SECRET=s3cr3t
+DEFAULT_BASE_LOCATION=s3://example-bucket/my_data
+ROLE_ARN=arn:aws:iam::111122223333:role/ExampleCorpRole
+REGION=us-west-2
+EXTERNAL_ID=12345678901234567890
+
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  catalogs \
+  create \
+  --storage-type s3 \
+  --default-base-location ${DEFAULT_BASE_LOCATION} \
+  --role-arn ${ROLE_ARN} \
+  --region ${REGION} \
+  --external-id ${EXTERNAL_ID} \
+  quickstart_catalog
+```
\ No newline at end of file
diff --git a/1.2.0/getting-started/creating-a-catalog/s3/catalog-minio.md b/1.2.0/getting-started/creating-a-catalog/s3/catalog-minio.md
new file mode 100644
index 0000000000..cdeeb12775
--- /dev/null
+++ b/1.2.0/getting-started/creating-a-catalog/s3/catalog-minio.md
@@ -0,0 +1,63 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Creating a catalog on MinIO
+linkTitle: MinIO
+type: docs
+weight: 200
+---
+
+When creating a catalog based on MinIO storage, it is important to configure the `endpoint` property to point
+to your own MinIO cluster. If the `endpoint` property is not set, Polaris will attempt to contact AWS
+storage services (which is certain to fail in this case).
+
+Note: the region setting is not required by MinIO, but it is set in this example for the sake of
+simplicity as it is usually required by the AWS SDK (used internally by Polaris). One can also
+set the `AWS_REGION` environment variable in the Polaris server process and avoid setting region
+as a catalog property.
+
+Note: the name `quickstart_catalog` from the example below is referenced in other Getting Started examples,
+but of course, it can be any valid catalog name.
+
+```shell
+CLIENT_ID=root
+CLIENT_SECRET=s3cr3t
+DEFAULT_BASE_LOCATION=s3://example-bucket/my_data
+REGION=us-west-2
+
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  catalogs \
+  create \
+  --storage-type s3 \
+  --endpoint http://127.0.0.1:9100 \
+  --default-base-location ${DEFAULT_BASE_LOCATION} \
+  --region ${REGION} \
+  quickstart_catalog
+```
+
+In more complex deployments it may be necessary to configure different endpoints for S3 requests
+and for STS (AssumeRole) requests. This can be achieved via the `--sts-endpoint` CLI option.
+
+Additionally, the `--endpoint-internal` CLI option can be used to set the S3 endpoint for use by
+the Polaris server itself, if it needs to be different from the endpoint used by clients and engines.
+
+A usable MinIO example for `docker-compose` is available in the Polaris source code under the
+[getting-started/minio](https://github.com/apache/polaris/tree/main/getting-started/minio) module.
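+
+For illustration, here is a sketch of a single `catalogs create` invocation that combines the `--endpoint`, `--sts-endpoint`, and `--endpoint-internal` options described above. The endpoint URLs below are placeholders for this sketch; substitute the addresses of your own MinIO deployment:
+
+```shell
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  catalogs \
+  create \
+  --storage-type s3 \
+  --endpoint http://minio.example.internal:9100 \
+  --sts-endpoint http://minio-sts.example.internal:9100 \
+  --endpoint-internal http://minio-backend.example.internal:9100 \
+  --default-base-location ${DEFAULT_BASE_LOCATION} \
+  --region ${REGION} \
+  quickstart_catalog
+```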
diff --git a/1.2.0/getting-started/deploying-polaris/_index.md b/1.2.0/getting-started/deploying-polaris/_index.md
new file mode 100644
index 0000000000..e975f69274
--- /dev/null
+++ b/1.2.0/getting-started/deploying-polaris/_index.md
@@ -0,0 +1,29 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Deploying Polaris
+linkTitle: Deploying Polaris
+type: docs
+weight: 200
+---
+
+Here you can find guides for deploying Polaris locally, as well as on all supported cloud providers: Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).
+
+Locally, Polaris can be deployed either with Docker or from a local build.
+On the cloud, the following tutorials deploy Polaris in a Docker environment.
diff --git a/1.2.0/getting-started/deploying-polaris/cloud-deploy/_index.md b/1.2.0/getting-started/deploying-polaris/cloud-deploy/_index.md
new file mode 100644
index 0000000000..56626bc2c3
--- /dev/null
+++ b/1.2.0/getting-started/deploying-polaris/cloud-deploy/_index.md
@@ -0,0 +1,27 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Deploying Polaris on Cloud Providers
+linkTitle: Cloud Providers
+type: docs
+weight: 300
+---
+
+Polaris can be deployed on various cloud providers, including Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).
+In the following guides, we will walk you through the process of deploying Polaris on each of these cloud providers.
diff --git a/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md b/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md
new file mode 100644
index 0000000000..d62e13e093
--- /dev/null
+++ b/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md
@@ -0,0 +1,60 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Deploying Polaris on Amazon Web Services (AWS)
+linkTitle: AWS
+type: docs
+weight: 310
+---
+
+Build and launch Polaris using the AWS Startup Script at the location provided in the command below. This script will start an [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) instance, which will be used as the backend Postgres instance holding all Polaris data.
+Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino.
+
+The requirements to run the script below are:
+* There must be at least two subnets created in the VPC and region in which your EC2 instance resides. The subnets MUST span at least 2 availability zones (AZs) within the same region.
+* Your EC2 instance must be enabled with [IMDSv1 or IMDSv2 with 2+ hop limit](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-IMDS-new-instances.html#configure-IMDS-new-instances-instance-settings).
+* The AWS identity that you will use to run this script must have the following AWS permissions:
+  * "ec2:DescribeInstances"
+  * "rds:CreateDBInstance"
+  * "rds:DescribeDBInstances"
+  * "rds:CreateDBSubnetGroup"
+  * "sts:AssumeRole" on the same role as the Instance Profile role of the EC2 instance on which you are running this script. Additionally, you should ensure that the Instance Profile contains a trust policy that allows the role to trust itself to be assumed.
+
+```shell
+chmod +x getting-started/assets/cloud_providers/deploy-aws.sh
+export ASSETS_PATH=$(pwd)/getting-started/assets/
+export CLIENT_ID=root
+export CLIENT_SECRET=s3cr3t
+./getting-started/assets/cloud_providers/deploy-aws.sh
+```
+
+## Next Steps
+Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris,
+check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and
+[Using Polaris]({{% relref "../../using-polaris" %}}) pages.
+
+## Cleanup Instructions
+To shut down the Polaris server, run the following commands:
+
+```shell
+export ASSETS_PATH=$(pwd)/getting-started/assets/
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
+```
+
+To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../../configuring-polaris-for-production" %}}) page.
diff --git a/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md b/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md
new file mode 100644
index 0000000000..4d25f86af0
--- /dev/null
+++ b/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md
@@ -0,0 +1,55 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Deploying Polaris on Azure +linkTitle: Azure +type: docs +weight: 320 +--- + +Build and launch Polaris using the Azure Startup Script at the location provided in the command below. This script will start an [Azure Database for PostgreSQL - Flexible Server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview) instance, which will be used as the backend Postgres instance holding all Polaris data. +Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. + +The requirements to run the script below are: +* Install the AZ CLI, if it is not already installed on the Azure VM. Instructions to download the AZ CLI can be found [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli). +* You must be logged into the AZ CLI. Please run `az account show` to ensure that you are logged in prior to running this script. +* Assign a System-Assigned Managed Identity to the Azure VM. + +```shell +chmod +x getting-started/assets/cloud_providers/deploy-azure.sh +export ASSETS_PATH=$(pwd)/getting-started/assets/ +export CLIENT_ID=root +export CLIENT_SECRET=s3cr3t +./getting-started/assets/cloud_providers/deploy-azure.sh +``` + +## Next Steps +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, +check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and +[Using Polaris]({{% relref "../../using-polaris" %}}) pages. + +## Cleanup Instructions +To shut down the Polaris server, run the following commands: + +```shell +export ASSETS_PATH=$(pwd)/getting-started/assets/ +docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down +``` + +To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../../configuring-polaris-for-production" %}}) page. diff --git a/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md b/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md new file mode 100644 index 0000000000..384433d83c --- /dev/null +++ b/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md @@ -0,0 +1,55 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Deploying Polaris on Google Cloud Platform (GCP) +linkTitle: GCP +type: docs +weight: 330 +--- + +Build and launch Polaris using the GCP Startup Script at the location provided in the command below. This script will start a [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) instance, which will be used as the backend Postgres instance holding all Polaris data. +Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. + +The requirements to run the script below are: +* Install the `gcloud` CLI, if it is not already installed on the GCP VM. Instructions to download the `gcloud` CLI can be found [here](https://cloud.google.com/sdk/docs/install). +* Ensure the `Cloud SQL Admin API` has been enabled in your project and that your VM's Principal has access to the correct role: `roles/cloudsql.admin`. +* Ensure the VM's Principal has access to at least Read-only scope on Compute Engine: `compute.readonly`. + +```shell +chmod +x getting-started/assets/cloud_providers/deploy-gcp.sh +export ASSETS_PATH=$(pwd)/getting-started/assets/ +export CLIENT_ID=root +export CLIENT_SECRET=s3cr3t +./getting-started/assets/cloud_providers/deploy-gcp.sh +``` + +## Next Steps +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, +check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and +[Using Polaris]({{% relref "../../using-polaris" %}}) pages. + +## Cleanup Instructions +To shut down the Polaris server, run the following commands: + +```shell +export ASSETS_PATH=$(pwd)/getting-started/assets/ +docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down +``` + +To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../../configuring-polaris-for-production" %}}) page. diff --git a/1.2.0/getting-started/deploying-polaris/local-deploy.md b/1.2.0/getting-started/deploying-polaris/local-deploy.md new file mode 100644 index 0000000000..c2b7b41743 --- /dev/null +++ b/1.2.0/getting-started/deploying-polaris/local-deploy.md @@ -0,0 +1,119 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Deploying Polaris locally +linkTitle: Local deployment +type: docs +weight: 200 +--- + +Polaris can be deployed via a docker image or as a standalone process. Before starting, be sure that you've satisfied the relevant prerequisites detailed in the previous page. + +## Common Setup +Before running Polaris, ensure you have completed the following setup steps: + +1. 
**Build Polaris**
+```shell
+cd ~/polaris
+./gradlew \
+  :polaris-server:assemble \
+  :polaris-server:quarkusAppPartsBuild --rerun \
+  :polaris-admin:assemble \
+  :polaris-admin:quarkusAppPartsBuild --rerun \
+  -Dquarkus.container-image.build=true
+```
+- **For standalone**: Omit the `-Dquarkus.container-image.build` option if you do not need to build a Docker image.
+
+## Running Polaris with Docker
+
+To start using Polaris in Docker, launch Polaris together with a Postgres instance, Apache Spark, and Trino:
+
+```shell
+export ASSETS_PATH=$(pwd)/getting-started/assets/
+export QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://postgres:5432/POLARIS
+export QUARKUS_DATASOURCE_USERNAME=postgres
+export QUARKUS_DATASOURCE_PASSWORD=postgres
+export CLIENT_ID=root
+export CLIENT_SECRET=s3cr3t
+docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml \
+  -f getting-started/jdbc/docker-compose-bootstrap-db.yml \
+  -f getting-started/jdbc/docker-compose.yml up -d
+```
+
+You should see output for some time as Polaris, Spark, and Trino build and start up. Eventually, the log output will quiet down to a few messages relating to Spark, resembling the following:
+
+```
+spark-sql-1 | Spark Web UI available at http://8bc4de8ed854:4040
+spark-sql-1 | Spark master: local[*], Application Id: local-1743745174604
+spark-sql-1 | 25/04/04 05:39:38 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session.
+spark-sql-1 | 25/04/04 05:39:39 WARN RESTSessionCatalog: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://polaris:8181/api/catalogv1/oauth/tokens. This automatic fallback will be removed in a future Iceberg release.It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537
+```
+
+The Docker image pre-configures a sample catalog called `quickstart_catalog` that uses a local file system.
+
+## Running Polaris as a Standalone Process
+
+You can also start Polaris through Gradle (packaged within the Polaris repository):
+
+1. **Start the Server**
+
+Run the following command to start Polaris:
+
+```shell
+./gradlew run
+```
+
+You should see output for some time as Polaris builds and starts up. Eventually, the log output will settle and you should see messages that resemble the following:
+
+```
+INFO [io.quarkus] [,] [,,,] (main) Apache Polaris Server (incubating) on JVM (powered by Quarkus ) started in 1.911s. Listening on: http://0.0.0.0:8181. Management interface listening on http://0.0.0.0:8182.
+INFO [io.quarkus] [,] [,,,] (main) Profile prod activated.
+INFO [io.quarkus] [,] [,,,] (main) Installed features: [...]
+```
+
+At this point, Polaris is running.
+
+When using a Gradle-launched Polaris instance in this tutorial, entities are stored only in memory. This means that any entities that you define will be destroyed when Polaris is shut down.
+For more information on how to configure Polaris for production usage, see the [docs]({{% relref "../../configuring-polaris-for-production" %}}).
+
+When Polaris is run using the `./gradlew run` command, the root principal credentials are `root` and `s3cr3t` for the `CLIENT_ID` and `CLIENT_SECRET`, respectively.
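+
+To sanity-check that a locally running server is up (whether started via Docker or Gradle), you can request an access token with the root credentials. This is a minimal check, assuming the default port and the `root`/`s3cr3t` credentials described above:
+
+```shell
+curl -s http://localhost:8181/api/catalog/v1/oauth/tokens \
+  --user root:s3cr3t \
+  -d 'grant_type=client_credentials' \
+  -d 'scope=PRINCIPAL_ROLE:ALL'
+```
+
+A JSON response containing an `access_token` field confirms the server is ready.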
+
+### Installing Apache Spark and Trino Locally for Testing
+
+#### Apache Spark
+
+If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. As in the [prerequisites]({{% ref "../install-dependencies#git" %}}), make sure [git](https://git-scm.com/) is installed first.
+
+Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html).
+
+```shell
+git clone --branch branch-3.5 https://github.com/apache/spark.git ~/spark
+```
+
+#### Trino
+If you want to connect to Polaris with [Trino](https://trino.io/), it is recommended to set up a test instance of Trino using Docker. As in the [prerequisites]({{% ref "../install-dependencies#docker" %}}), make sure [Docker](https://www.docker.com/) is installed first.
+
+```shell
+docker run --name trino -d -p 8080:8080 trinodb/trino
+```
+
+## Next Steps
+Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris,
+check out the [Creating a Catalog]({{% ref "../creating-a-catalog" %}}) and
+[Using Polaris]({{% ref "../using-polaris" %}}) pages.
diff --git a/1.2.0/getting-started/install-dependencies.md b/1.2.0/getting-started/install-dependencies.md
new file mode 100644
index 0000000000..66640104d4
--- /dev/null
+++ b/1.2.0/getting-started/install-dependencies.md
@@ -0,0 +1,120 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Installing Dependencies
+type: docs
+weight: 100
+---
+
+This guide serves as an introduction to several key entities that can be managed with Apache Polaris (Incubating), describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark™.
+
+# Prerequisites
+
+This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here.
+
+## Git
+
+To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/) on macOS:
+
+```shell
+brew install git
+```
+
+Follow the [Git Documentation](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) for instructions on installing Git on other platforms.
+
+Then, use git to clone the Polaris repo:
+
+```shell
+git clone https://github.com/apache/polaris.git ~/polaris
+```
+
+## Docker
+
+It is recommended to deploy Polaris inside [Docker](https://www.docker.com/) for the Quickstart workflow.
Instructions for deploying the Quickstart workflow on the supported cloud providers (AWS, Azure, GCP) are provided only for Docker. However, the non-Docker instructions for local deployments can also be followed on cloud providers.
+
+Instructions to install Docker can be found on the [Docker website](https://docs.docker.com/engine/install/). Ensure that Docker and the Docker Compose plugin are both installed.
+
+### Docker on macOS
+Docker can be installed using [homebrew](https://brew.sh/):
+
+```shell
+brew install --cask docker
+```
+
+There can be [Docker permission issues](https://github.com/apache/polaris/pull/971) related to the seccomp configuration. To resolve these issues, set the `seccomp` profile to "unconfined" when running a container. For example:
+
+```shell
+docker run --security-opt seccomp=unconfined apache/polaris:latest
+```
+
+Note: Setting the seccomp profile to "unconfined" disables the default system call filtering, which may pose security risks. Use this configuration with caution, especially in production environments.
+
+### Docker on Amazon Linux
+Docker can be installed using a modification to the CentOS instructions. For example:
+
+```shell
+sudo dnf update -y
+# Remove old version
+sudo dnf remove -y docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-engine
+# Install dnf plugin
+sudo dnf -y install dnf-plugins-core
+# Add CentOS repository
+sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
+# Adjust release server version in the path as it will not match with Amazon Linux 2023
+sudo sed -i 's/$releasever/9/g' /etc/yum.repos.d/docker-ce.repo
+# Install as usual
+sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
+```
+
+### Confirm Docker Installation
+
+Once installed, make sure that both Docker and the Docker Compose plugin are installed:
+
+```shell
+docker version
+docker compose version
+```
+
+Also make sure Docker is running and is able to run a sample Docker container:
+
+```shell
+docker run hello-world
+```
+
+## Java
+
+If you plan to build Polaris from source, either locally or using this tutorial's instructions on a cloud provider, you will need to satisfy a few prerequisites first.
+
+Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv:
+
+```shell
+cd ~/polaris
+brew install openjdk@21 jenv
+jenv add $(brew --prefix openjdk@21)
+jenv local 21
+```
+
+Ensure that `java --version` and `javac` both produce non-empty output.
+
+## jq
+
+Most Polaris Quickstart scripts require [jq](https://jqlang.org/download/). You can install jq using [homebrew](https://brew.sh/):
+```shell
+brew install jq
+```
diff --git a/1.2.0/getting-started/using-polaris/_index.md b/1.2.0/getting-started/using-polaris/_index.md
new file mode 100644
index 0000000000..a2e9f521a5
--- /dev/null
+++ b/1.2.0/getting-started/using-polaris/_index.md
@@ -0,0 +1,348 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Using Polaris
+type: docs
+weight: 401
+---
+
+## Setup
+
+Ensure your `CLIENT_ID` and `CLIENT_SECRET` variables are already defined, as they were required for starting the Polaris server earlier.
+
+```shell
+export CLIENT_ID=YOUR_CLIENT_ID
+export CLIENT_SECRET=YOUR_CLIENT_SECRET
+```
+
+Refer to the [Creating a Catalog]({{% ref "creating-a-catalog" %}}) page for instructions on defining a
+catalog for your specific storage type. The following examples assume the catalog's name is `quickstart_catalog`.
+
+In Polaris, the [catalog]({{% relref "../../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../../entities#table" %}}) and [views]({{% relref "../../entities#view" %}}) are organized under.
+
+The `DEFAULT_BASE_LOCATION` value you provided at catalog creation time is the default location where objects in
+this catalog are stored.
+
+Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% relref "../../command-line-interface" %}}).
+
+
+### Creating a Principal and Assigning it Privileges
+
+With a catalog created, we can create a [principal]({{% relref "../../entities#principal" %}}) that has access to manage that catalog. For details on how to configure the Polaris CLI, see the [Creating a Catalog]({{% ref "creating-a-catalog" %}}) page or refer to the [docs]({{% relref "../../command-line-interface" %}}).
+
+```shell
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  principals \
+  create \
+  quickstart_user
+
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  principal-roles \
+  create \
+  quickstart_user_role
+
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  catalog-roles \
+  create \
+  --catalog quickstart_catalog \
+  quickstart_catalog_role
+```
+
+Be sure to provide the necessary credentials, hostname, and port as before.
+
+When the `principals create` command completes successfully, it will return the credentials for this new principal. Export them for future use. For example:
+
+```shell
+./polaris ... principals create example
+{"clientId": "XXXX", "clientSecret": "YYYY"}
+export USER_CLIENT_ID=XXXX
+export USER_CLIENT_SECRET=YYYY
+```
+
+Now, we grant the [principal role]({{% relref "../../entities#principal-role" %}}) we created to the principal, and grant the [catalog role]({{% relref "../../entities#catalog-role" %}}) we created to that principal role. For more information on these entities, please refer to the linked documentation.
+ +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principal-roles \ + grant \ + --principal quickstart_user \ + quickstart_user_role + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalog-roles \ + grant \ + --catalog quickstart_catalog \ + --principal-role quickstart_user_role \ + quickstart_catalog_role +``` + +Now, we’ve linked our principal to the catalog via roles like so: + +![Principal to Catalog](/img/quickstart/privilege-illustration-1.png "Principal to Catalog") + +In order to give this principal the ability to interact with the catalog, we must assign some [privileges]({{% relref "../../entities#privilege" %}}). For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so: + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + privileges \ + catalog \ + grant \ + --catalog quickstart_catalog \ + --catalog-role quickstart_catalog_role \ + CATALOG_MANAGE_CONTENT +``` + +This grants the [catalog privileges]({{% relref "../../entities#privilege" %}}) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so: + +![Principal to Catalog with Catalog Role](/img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role") + +`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace. + +## Using Iceberg & Polaris + +At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the respective examples below. + +### Connecting with Spark + +#### Using a Local Build of Spark + +To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API. + +This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). 
From a local Spark clone on the `branch-3.5` branch we can run the following:
+
+_Note: the credentials provided here are those for our principal, not the root credentials._
+
+```shell
+bin/spark-sql \
+--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0 \
+--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
+--conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
+--conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \
+--conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \
+--conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
+--conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
+--conf spark.sql.catalog.quickstart_catalog.credential=${USER_CLIENT_ID}:${USER_CLIENT_SECRET} \
+--conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \
+--conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true \
+--conf spark.sql.catalog.quickstart_catalog.client.region=us-west-2
+```
+
+Similar to the CLI commands above, this configures Spark to use the Polaris server running at `localhost:8181`. If your Polaris server is running elsewhere, be sure to update the configuration appropriately.
+
+Finally, note that we include the `iceberg-aws-bundle` package here. If your table is stored on a different filesystem, be sure to include the appropriate dependency.
+
+#### Using Spark SQL from a Docker container
+
+Refresh the Docker container with the user's credentials:
+```shell
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop spark-sql
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f spark-sql
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps spark-sql
+```
+
+Attach to the running spark-sql container:
+
+```shell
+docker attach $(docker ps -q --filter name=spark-sql)
+```
+
+#### Sample Commands
+
+Once the Spark session starts, we can create a namespace and table within the catalog:
+
+```sql
+USE quickstart_catalog;
+CREATE NAMESPACE IF NOT EXISTS quickstart_namespace;
+CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema;
+USE NAMESPACE quickstart_namespace.schema;
+CREATE TABLE IF NOT EXISTS quickstart_table (id BIGINT, data STRING) USING ICEBERG;
+```
+
+We can now use this table like any other:
+
+```
+INSERT INTO quickstart_table VALUES (1, 'some data');
+SELECT * FROM quickstart_table;
+. . .
++---+---------+
+|id |data     |
++---+---------+
+|1  |some data|
++---+---------+
+```
+
+If at any time access is revoked...
+
+```shell
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  privileges \
+  catalog \
+  revoke \
+  --catalog quickstart_catalog \
+  --catalog-role quickstart_catalog_role \
+  CATALOG_MANAGE_CONTENT
+```
+
+Spark will lose access to the table:
+
+```
+INSERT INTO quickstart_table VALUES (1, 'some data');
+
+org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
+```
+
+### Connecting with Trino
+
+Refresh the Docker container with the user's credentials:
+
+```shell
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop trino
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f trino
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps trino
+```
+
+Attach to the running Trino container:
+
+```shell
+docker exec -it $(docker ps -q --filter name=trino) trino
+```
+
+You may not see Trino's prompt immediately; press ENTER to see it. A few commands that you can try:
+
+```sql
+SHOW CATALOGS;
+SHOW SCHEMAS FROM iceberg;
+CREATE SCHEMA iceberg.quickstart_schema;
+CREATE TABLE iceberg.quickstart_schema.quickstart_table AS SELECT 1 x;
+SELECT * FROM iceberg.quickstart_schema.quickstart_table;
+```
+
+If at any time access is revoked...
+
+```shell
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  privileges \
+  catalog \
+  revoke \
+  --catalog quickstart_catalog \
+  --catalog-role quickstart_catalog_role \
+  CATALOG_MANAGE_CONTENT
+```
+
+Trino will lose access to the table:
+
+```sql
+SELECT * FROM iceberg.quickstart_schema.quickstart_table;
+
+org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
+```
+
+### Connecting with PyIceberg
+
+#### Using Credentials
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+    type='rest',
+    uri='http://localhost:8181/api/catalog',
+    warehouse='quickstart_catalog',
+    scope="PRINCIPAL_ROLE:ALL",
+    credential=f"{CLIENT_ID}:{CLIENT_SECRET}",
+)
+```
+
+If the `load_catalog` function is used with credentials, then PyIceberg will automatically request an authorization token from the `v1/oauth/tokens` endpoint, and will later use this token to prove its identity to the Polaris Catalog.
+
+#### Using a Token
+
+```python
+from pyiceberg.catalog import load_catalog
+import requests
+
+# Step 1: Get OAuth token
+response = requests.post(
+    "http://localhost:8181/api/catalog/v1/oauth/tokens",
+    auth=(CLIENT_ID, CLIENT_SECRET),
+    data={
+        "grant_type": "client_credentials",
+        "scope": "PRINCIPAL_ROLE:ALL"
+    })
+token = response.json()["access_token"]
+
+# Step 2: Load the catalog using the token
+catalog = load_catalog(
+    type='rest',
+    uri='http://localhost:8181/api/catalog',
+    warehouse='quickstart_catalog',
+    token=token,
+)
+```
+
+As shown above, it is also possible to use the `load_catalog` function by providing an authorization token directly. This method is useful when using an external identity provider (e.g. Google Identity).
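+
+Once the catalog handle is loaded (by either method), regular PyIceberg operations work as usual. The short sketch below assumes the namespace and table created in the Spark example earlier, and the scan step additionally requires `pyarrow` and `pandas` to be installed:
+
+```python
+# List the namespaces visible to this principal
+print(catalog.list_namespaces())
+
+# Load the table created earlier via Spark
+table = catalog.load_table("quickstart_namespace.schema.quickstart_table")
+print(table.schema())
+
+# Read the table contents into a pandas DataFrame
+df = table.scan().to_pandas()
+print(df.head())
+```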
+ +### Connecting Using REST APIs + +To access Polaris from the host machine, first request an access token: + +```shell +export POLARIS_TOKEN=$(curl -s http://polaris:8181/api/catalog/v1/oauth/tokens \ + --resolve polaris:8181:127.0.0.1 \ + --user ${CLIENT_ID}:${CLIENT_SECRET} \ + -d 'grant_type=client_credentials' \ + -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token) +``` + +Then, use the access token in the Authorization header when accessing Polaris: + +```shell +curl -v http://127.0.0.1:8181/api/management/v1/principal-roles -H "Authorization: Bearer $POLARIS_TOKEN" +curl -v http://127.0.0.1:8181/api/management/v1/catalogs/quickstart_catalog -H "Authorization: Bearer $POLARIS_TOKEN" +``` + +## Next Steps +* Visit [Using Keycloak as the external identity provider]({{% relref "keycloak-idp" %}}). +* Visit [Using Polaris with telemetry tools]({{% relref "telemetry-tools" %}}). +* Visit [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}). +* A Getting Started experience for using Spark with Jupyter Notebooks is documented [here](https://github.com/apache/polaris/blob/main/getting-started/spark/README.md). +* To shut down a locally-deployed Polaris server and clean up all related Docker containers, run the command listed below. Cloud Deployments have their respective termination commands on their Deployment page, while Polaris running on Gradle will terminate when the Gradle process terminates. +```shell +docker compose -p polaris \ + -f getting-started/assets/postgres/docker-compose-postgres.yml \ + -f getting-started/jdbc/docker-compose-bootstrap-db.yml \ + -f getting-started/jdbc/docker-compose.yml \ + down +``` diff --git a/1.2.0/getting-started/using-polaris/keycloak-idp.md b/1.2.0/getting-started/using-polaris/keycloak-idp.md new file mode 100644 index 0000000000..a0d27b7386 --- /dev/null +++ b/1.2.0/getting-started/using-polaris/keycloak-idp.md @@ -0,0 +1,212 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Getting Started with Apache Polaris, External Authentication and Keycloak +linkTitle: Using Keycloak IDP +type: docs +weight: 400 +--- + +## Overview + +This example uses Keycloak as an **external** identity provider for Polaris. The "iceberg" realm is automatically +created and configured from the `iceberg-realm.json` file. + +This Keycloak realm contains 1 client definition: `client1:s3cr3t`. It is configured to return tokens with the following +fixed claims: + +- `principal_id`: the principal ID of the user. It is always set to zero (0) in this example. +- `principal_name`: the principal name of the user. It is always set to "root" in this example. +- `principal_roles`: the principal roles of the user. It is always set to `["server_admin", "catalog_admin"]` in this + example. 
+
+This is obviously not a realistic configuration. In a real-world scenario, you would configure Keycloak to return the
+actual principal ID, name and roles of the user. Note that principals and principal roles must have been created in
+Polaris beforehand, and the principal ID, name and roles must match the ones returned by Keycloak.
+
+Polaris is configured with 3 realms:
+
+- `realm-internal`: This is the default realm, and is configured to use internal authentication only. It accepts
+  only tokens issued by Polaris itself.
+- `realm-external`: This realm is configured to use an external identity provider (IDP) for authentication only. It
+  accepts tokens issued by Keycloak only.
+- `realm-mixed`: This realm is configured to use both internal and external authentication. It accepts tokens
+  issued by both Polaris and Keycloak.
+
+For more information about how to configure Polaris with external authentication, see the
+[IDP integration documentation]({{% relref "../../managing-security/external-idp" %}}).
+
+## Starting the Example
+
+1. Build the Polaris server image if it's not already present locally:
+
+   ```shell
+   ./gradlew \
+     :polaris-server:assemble \
+     :polaris-server:quarkusAppPartsBuild --rerun \
+     -Dquarkus.container-image.build=true
+   ```
+
+2. Start the docker compose group by running the following command from the root of the repository:
+
+   ```shell
+   docker compose -f getting-started/keycloak/docker-compose.yml up
+   ```
+
+## Requesting a Token
+
+Note: the commands below require `jq` to be installed on your machine.
+
+### From Polaris
+
+You can request a token from Polaris for realms `realm-internal` and `realm-mixed`:
+
+1. Open a terminal and run the following command to request an access token for the `realm-internal` realm:
+
+   ```shell
+   polaris_token_realm_internal=$(curl -s http://localhost:8181/api/catalog/v1/oauth/tokens \
+     --user root:s3cr3t \
+     -H 'Polaris-Realm: realm-internal' \
+     -d 'grant_type=client_credentials' \
+     -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
+   ```
+
+   This token is valid only for the `realm-internal` realm.
+
+2. Open a terminal and run the following command to request an access token for the `realm-mixed` realm:
+
+   ```shell
+   polaris_token_realm_mixed=$(curl -s http://localhost:8181/api/catalog/v1/oauth/tokens \
+     --user root:s3cr3t \
+     -H 'Polaris-Realm: realm-mixed' \
+     -d 'grant_type=client_credentials' \
+     -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
+   ```
+
+   This token is valid only for the `realm-mixed` realm.
+
+Polaris tokens are valid for 1 hour.
+
+Note: if you request a Polaris token for the `realm-external` realm, it will not work because Polaris won't issue tokens
+for this realm:
+
+```shell
+curl -v http://localhost:8181/api/catalog/v1/oauth/tokens \
+  --user root:s3cr3t \
+  -H 'Polaris-Realm: realm-external' \
+  -d 'grant_type=client_credentials' \
+  -d 'scope=PRINCIPAL_ROLE:ALL'
+```
+
+This will return a `501 Not Implemented` error because the internal token endpoint has been deactivated for this realm.
+
+### From Keycloak
+
+You can request a token from Keycloak for the `realm-external` and `realm-mixed` realms:
+
+1. Open a terminal and run the following command to request an access token from Keycloak:
+
+1. Open a terminal and run the following command to request an access token from Keycloak:
+
+   ```shell
+   keycloak_token=$(curl -s http://keycloak:8080/realms/iceberg/protocol/openid-connect/token \
+     --resolve keycloak:8080:127.0.0.1 \
+     --user client1:s3cr3t \
+     -d 'grant_type=client_credentials' | jq -r .access_token)
+   ```
+
+Note the `--resolve` option: it sends the request with the `Host` header set to `keycloak`. This is necessary
+because Keycloak issues tokens whose `iss` claim matches the request's `Host` header. Without it, the `iss` claim would
+be `127.0.0.1` and the token would be rejected by Polaris, which expects the issuer to be `keycloak`, Keycloak's
+hostname within the Docker network.
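+
+You can verify this by decoding the token's payload and checking the `iss` claim, along with the fixed
+`principal_*` claims described above. This is a quick, signature-unverified decode; it assumes GNU `base64` and `jq`
+are available:
+
+```shell
+# Extract the payload (second dot-separated segment) and convert base64url to base64
+payload=$(echo "$keycloak_token" | cut -d '.' -f2 | tr '_-' '/+')
+# Pad to a multiple of 4 characters before decoding
+case $(( ${#payload} % 4 )) in 2) payload="${payload}==" ;; 3) payload="${payload}=" ;; esac
+echo "$payload" | base64 -d | jq '{iss, principal_id, principal_name, principal_roles}'
+```
+
+The `iss` value should start with `http://keycloak:8080/`, not `http://127.0.0.1:8080/`.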
+
+Tokens issued by Keycloak can be used to access Polaris with the `realm-external` or `realm-mixed` realms. Access tokens
+are valid for 1 hour.
+
+You can also access the Keycloak admin console. Open a browser and go to [http://localhost:8080](http://localhost:8080),
+then log in with the username `admin` and password `admin` (you can change this in the docker-compose file).
+
+## Accessing Polaris with the Tokens
+
+You can access Polaris using the tokens you obtained above. The following examples show how to use the tokens with
+`curl`:
+
+### Using the Polaris Token
+
+1. Open a terminal and run the following command to list the catalogs in the `realm-internal` realm:
+
+   ```shell
+   curl -v http://localhost:8181/api/management/v1/catalogs \
+     -H "Authorization: Bearer $polaris_token_realm_internal" \
+     -H 'Polaris-Realm: realm-internal' \
+     -H 'Accept: application/json'
+   ```
+
+2. Open a terminal and run the following command to list the catalogs in the `realm-mixed` realm:
+
+   ```shell
+   curl -v http://localhost:8181/api/management/v1/catalogs \
+     -H "Authorization: Bearer $polaris_token_realm_mixed" \
+     -H 'Polaris-Realm: realm-mixed' \
+     -H 'Accept: application/json'
+   ```
+
+Note: you cannot mix tokens from different realms. For example, you cannot use a token from the `realm-internal` realm to access
+the `realm-mixed` realm:
+
+```shell
+curl -v http://localhost:8181/api/management/v1/catalogs \
+  -H "Authorization: Bearer $polaris_token_realm_internal" \
+  -H 'Polaris-Realm: realm-mixed' \
+  -H 'Accept: application/json'
+```
+
+This will return a `401 Unauthorized` error because the token is not valid for the `realm-mixed` realm.
+
+### Using the Keycloak Token
+
+The same Keycloak token can be used to access both the `realm-external` and `realm-mixed` realms, as it is valid for
+both (both realms share the same OIDC tenant configuration).
+
+1. Open a terminal and run the following command to list the catalogs in the `realm-external` realm:
+
+   ```shell
+   curl -v http://localhost:8181/api/management/v1/catalogs \
+     -H "Authorization: Bearer $keycloak_token" \
+     -H 'Polaris-Realm: realm-external' \
+     -H 'Accept: application/json'
+   ```
+
+2. Open a terminal and run the following command to list the catalogs in the `realm-mixed` realm:
+
+   ```shell
+   curl -v http://localhost:8181/api/management/v1/catalogs \
+     -H "Authorization: Bearer $keycloak_token" \
+     -H 'Polaris-Realm: realm-mixed' \
+     -H 'Accept: application/json'
+   ```
+
+Note: you cannot use a Keycloak token to access the `realm-internal` realm:
+
+```shell
+curl -v http://localhost:8181/api/management/v1/catalogs \
+  -H "Authorization: Bearer $keycloak_token" \
+  -H 'Polaris-Realm: realm-internal' \
+  -H 'Accept: application/json'
+```
+
+This will return a `401 Unauthorized` error because the token is not valid for the `realm-internal` realm.
\ No newline at end of file
diff --git a/1.2.0/getting-started/using-polaris/telemetry-tools.md b/1.2.0/getting-started/using-polaris/telemetry-tools.md
new file mode 100644
index 0000000000..b6a9e8f8eb
--- /dev/null
+++ b/1.2.0/getting-started/using-polaris/telemetry-tools.md
@@ -0,0 +1,70 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Getting Started with Apache Polaris, Prometheus and Jaeger
+linkTitle: Using Polaris with telemetry tools
+type: docs
+weight: 401
+---
+
+This example requires `jq` to be installed on your machine.
+
+1. Build the Polaris image if it's not already present locally:
+
+   ```shell
+   ./gradlew \
+     :polaris-server:assemble \
+     :polaris-server:quarkusAppPartsBuild --rerun \
+     -Dquarkus.container-image.build=true
+   ```
+
+2. Start the docker compose group by running the following command from the root of the repository:
+
+   ```shell
+   export ASSETS_PATH=$(pwd)/getting-started/assets/
+   export CLIENT_ID=root
+   export CLIENT_SECRET=s3cr3t
+   docker compose -f getting-started/telemetry/docker-compose.yml up
+   ```
+
+3. To access Polaris from the host machine, first request an access token:
+
+   ```shell
+   export POLARIS_TOKEN=$(curl -s http://localhost:8181/api/catalog/v1/oauth/tokens \
+     --user root:s3cr3t \
+     -d 'grant_type=client_credentials' \
+     -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
+   ```
+
+4. Then, use the access token in the Authorization header when accessing Polaris. You can also set
+   the `Polaris-Request-Id` header; you should see it in all logs and traces:
+
+   ```shell
+   curl -v 'http://localhost:8181/api/management/v1/principal-roles' \
+     -H "Authorization: Bearer $POLARIS_TOKEN" \
+     -H "Polaris-Request-Id: 1234"
+   curl -v 'http://localhost:8181/api/catalog/v1/config?warehouse=quickstart_catalog' \
+     -H "Authorization: Bearer $POLARIS_TOKEN" \
+     -H "Polaris-Request-Id: 5678"
+   ```
+
+5. Access the following services:
+
+   - Prometheus UI: browse to http://localhost:9093 to view metrics.
+   - Jaeger UI: browse to http://localhost:16686 to view traces.
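+
+As an additional check, you can pull the raw metrics that Prometheus scrapes directly from Polaris. This is a sketch
+that assumes the compose file maps the Quarkus management interface (port 8182 by default) to the host:
+
+```shell
+# Fetch the Prometheus-format metrics from Polaris's management endpoint
+curl -s http://localhost:8182/q/metrics | head -n 20
+```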
diff --git a/1.2.0/helm.md b/1.2.0/helm.md
new file mode 100644
index 0000000000..ef82e8e675
--- /dev/null
+++ b/1.2.0/helm.md
@@ -0,0 +1,371 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Polaris Helm Chart
+type: docs
+weight: 675
+---
+
+![Version: 1.2.0-incubating-SNAPSHOT](https://img.shields.io/badge/Version-1.2.0--incubating--SNAPSHOT-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.2.0-incubating-SNAPSHOT](https://img.shields.io/badge/AppVersion-1.2.0--incubating--SNAPSHOT-informational?style=flat-square)
+
+A Helm chart for Apache Polaris (incubating).
+
+**Homepage:** <https://polaris.apache.org/>
+
+## Source Code
+
+* <https://github.com/apache/polaris>
+
+## Installation
+
+### Running locally with a Minikube cluster
+
+The instructions below assume Minikube and Helm are installed.
+
+Start the Minikube cluster, then build the Polaris images and load them into the cluster:
+
+```bash
+minikube start
+eval $(minikube docker-env)
+
+./gradlew \
+  :polaris-server:assemble \
+  :polaris-server:quarkusAppPartsBuild --rerun \
+  :polaris-admin:assemble \
+  :polaris-admin:quarkusAppPartsBuild --rerun \
+  -Dquarkus.container-image.build=true
+```
+
+### Installing the chart locally
+
+The instructions below assume a local Kubernetes cluster is running and Helm is installed.
+
+#### Common setup
+
+Create the target namespace:
+```bash
+kubectl create namespace polaris
+```
+
+Create all the required resources in the `polaris` namespace. This usually includes a Postgres
+database, Kubernetes secrets, and service accounts. The Polaris chart does not create
+these resources automatically, as they are not required for all Polaris deployments. The chart will
+fail if these resources are not created beforehand. You can find some examples in the
+`helm/polaris/ci/fixtures` directory, but beware that these are primarily intended for tests.
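+
+For example, a database-connection secret matching the chart's default `persistence.relationalJdbc.secret` keys
+could be created as follows (a sketch only; the secret name, JDBC URL, and credentials are illustrative):
+
+```bash
+# Keys (jdbcUrl, username, password) match the chart's persistence.relationalJdbc.secret defaults
+kubectl create secret generic polaris-persistence --namespace polaris \
+  --from-literal=jdbcUrl=jdbc:postgresql://postgres:5432/POLARIS \
+  --from-literal=username=polaris \
+  --from-literal=password=polaris
+```
+
+To use such a secret, set `persistence.type` to `relational-jdbc` and point `persistence.relationalJdbc.secret.name`
+at it in your values file.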
+
+Below are two sample deployment models for installing the chart: one with a non-persistent backend and another with a persistent backend.
+
+{{< alert warning >}}
+The examples below use values files located in the `helm/polaris/ci` directory.
+**These files are primarily intended for testing, and may not be suitable for production use.**
+For production deployments, create your own values files based on the provided examples.
+{{< /alert >}}
+
+#### Non-persistent backend
+
+Install the chart with a non-persistent backend. From the Polaris repo root:
+```bash
+helm upgrade --install --namespace polaris \
+  polaris helm/polaris
+```
+
+#### Persistent backend
+
+{{< alert warning >}}
+The Postgres deployment set up in the fixtures directory is intended for testing purposes only and is not suitable for production use.
+For production deployments, use a managed Postgres service or a properly configured and secured Postgres instance.
+{{< /alert >}}
+
+Install the chart with a persistent backend. From the Polaris repo root:
+```bash
+helm upgrade --install --namespace polaris \
+  --values helm/polaris/ci/persistence-values.yaml \
+  polaris helm/polaris
+kubectl wait --namespace polaris --for=condition=ready pod --selector=app.kubernetes.io/name=polaris --timeout=120s
+```
+
+To access Polaris and Postgres locally, set up port forwarding for both services (this is needed for the bootstrap process):
+```bash
+kubectl port-forward -n polaris $(kubectl get pod -n polaris -l app.kubernetes.io/name=polaris -o jsonpath='{.items[0].metadata.name}') 8181:8181
+
+kubectl port-forward -n polaris $(kubectl get pod -n polaris -l app.kubernetes.io/name=postgres -o jsonpath='{.items[0].metadata.name}') 5432:5432
+```
+
+Run the catalog bootstrap using the Polaris admin tool. This step initializes the POLARIS realm with its root credentials:
+```bash
+container_envs=$(kubectl exec -it -n polaris $(kubectl get pod -n polaris -l app.kubernetes.io/name=polaris -o jsonpath='{.items[0].metadata.name}') -- env)
+export QUARKUS_DATASOURCE_USERNAME=$(echo "$container_envs" | grep quarkus.datasource.username | awk -F '=' '{print $2}' | tr -d '\n\r')
+export QUARKUS_DATASOURCE_PASSWORD=$(echo "$container_envs" | grep quarkus.datasource.password | awk -F '=' '{print $2}' | tr -d '\n\r')
+export QUARKUS_DATASOURCE_JDBC_URL=$(echo "$container_envs" | grep quarkus.datasource.jdbc.url | sed 's/postgres/localhost/2' | awk -F '=' '{print $2}' | tr -d '\n\r')
+
+java -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap -c POLARIS,root,pass -r POLARIS
+```
+
+### Uninstalling
+
+```bash
+helm uninstall --namespace polaris polaris
+
+kubectl delete --namespace polaris -f helm/polaris/ci/fixtures/
+
+kubectl delete namespace polaris
+```
+
+## Development & Testing
+
+This section is intended for developers who want to run the Polaris Helm chart tests.
+
+### Prerequisites
+
+The following tools are required to run the tests:
+
+* [Helm Unit Test](https://github.com/helm-unittest/helm-unittest)
+* [Chart Testing](https://github.com/helm/chart-testing)
+
+Quick installation instructions for these tools:
+```bash
+helm plugin install https://github.com/helm-unittest/helm-unittest.git
+brew install chart-testing
+```
+
+The integration tests also require some fixtures to be deployed. The `ci/fixtures` directory
+contains the required resources. To deploy them, run the following command:
+```bash
+kubectl apply --namespace polaris -f helm/polaris/ci/fixtures/
+kubectl wait --namespace polaris --for=condition=ready pod --selector=app.kubernetes.io/name=postgres --timeout=120s
+```
+
+The `helm/polaris/ci` directory contains a number of values files that will be used to install the chart with
+different configurations.
+
+### Running the unit tests
+
+Helm unit tests do not require a Kubernetes cluster. To run the unit tests, execute `helm unittest` from
+the Polaris repo root:
+```bash
+helm unittest helm/polaris
+```
+
+You can also lint the chart using the Chart Testing tool, with the following command:
+
+```bash
+ct lint --charts helm/polaris
+```
+
+### Running the integration tests
+
+Integration tests require a Kubernetes cluster. See the installation instructions above for setting up
+a local cluster.
+ +Integration tests are run with the Chart Testing tool: +```bash +ct install --namespace polaris --charts ./helm/polaris +``` + +## Values + +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| advancedConfig | object | `{}` | Advanced configuration. You can pass here any valid Polaris or Quarkus configuration property. Any property that is defined here takes precedence over all the other configuration values generated by this chart. Properties can be passed "flattened" or as nested YAML objects (see examples below). Note: values should be strings; avoid using numbers, booleans, or other types. | +| affinity | object | `{}` | Affinity and anti-affinity for polaris pods. See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity. | +| authentication | object | `{"authenticator":{"type":"default"},"realmOverrides":{},"tokenBroker":{"maxTokenGeneration":"PT1H","secret":{"name":null,"privateKey":"private.pem","publicKey":"public.pem","rsaKeyPair":{"privateKey":"private.pem","publicKey":"public.pem"},"secretKey":"symmetric.pem","symmetricKey":{"secretKey":"symmetric.key"}},"type":"rsa-key-pair"},"tokenService":{"type":"default"},"type":"internal"}` | Polaris authentication configuration. | +| authentication.authenticator | object | `{"type":"default"}` | The `Authenticator` implementation to use. Only one built-in type is supported: default. | +| authentication.realmOverrides | object | `{}` | Authentication configuration overrides per realm. | +| authentication.tokenBroker | object | `{"maxTokenGeneration":"PT1H","secret":{"name":null,"privateKey":"private.pem","publicKey":"public.pem","rsaKeyPair":{"privateKey":"private.pem","publicKey":"public.pem"},"secretKey":"symmetric.pem","symmetricKey":{"secretKey":"symmetric.key"}},"type":"rsa-key-pair"}` | The `TokenBroker` implementation to use. Two built-in types are supported: rsa-key-pair and symmetric-key. Only relevant when using internal (or mixed) authentication. When using external authentication, the token broker is not used. | +| authentication.tokenBroker.maxTokenGeneration | string | `"PT1H"` | Maximum token generation duration (e.g., PT1H for 1 hour). | +| authentication.tokenBroker.secret | object | `{"name":null,"privateKey":"private.pem","publicKey":"public.pem","rsaKeyPair":{"privateKey":"private.pem","publicKey":"public.pem"},"secretKey":"symmetric.pem","symmetricKey":{"secretKey":"symmetric.key"}}` | The secret name to pull the public and private keys, or the symmetric key secret from. | +| authentication.tokenBroker.secret.name | string | `nil` | The name of the secret to pull the keys from. If not provided, a key pair will be generated. This is not recommended for production. | +| authentication.tokenBroker.secret.privateKey | string | `"private.pem"` | DEPRECATED: Use `authentication.tokenBroker.secret.rsaKeyPair.privateKey` instead. Key name inside the secret for the private key | +| authentication.tokenBroker.secret.publicKey | string | `"public.pem"` | DEPRECATED: Use `authentication.tokenBroker.secret.rsaKeyPair.publicKey` instead. Key name inside the secret for the public key | +| authentication.tokenBroker.secret.rsaKeyPair | object | `{"privateKey":"private.pem","publicKey":"public.pem"}` | Optional: configuration specific to RSA key pair secret. 
| +| authentication.tokenBroker.secret.rsaKeyPair.privateKey | string | `"private.pem"` | Key name inside the secret for the private key | +| authentication.tokenBroker.secret.rsaKeyPair.publicKey | string | `"public.pem"` | Key name inside the secret for the public key | +| authentication.tokenBroker.secret.secretKey | string | `"symmetric.pem"` | DEPRECATED: Use `authentication.tokenBroker.secret.symmetricKey.secretKey` instead. Key name inside the secret for the symmetric key | +| authentication.tokenBroker.secret.symmetricKey | object | `{"secretKey":"symmetric.key"}` | Optional: configuration specific to symmetric key secret. | +| authentication.tokenBroker.secret.symmetricKey.secretKey | string | `"symmetric.key"` | Key name inside the secret for the symmetric key | +| authentication.tokenService | object | `{"type":"default"}` | The token service (`IcebergRestOAuth2ApiService`) implementation to use. Two built-in types are supported: default and disabled. Only relevant when using internal (or mixed) authentication. When using external authentication, the token service is always disabled. | +| authentication.type | string | `"internal"` | The type of authentication to use. Three built-in types are supported: internal, external, and mixed. | +| autoscaling.enabled | bool | `false` | Specifies whether automatic horizontal scaling should be enabled. Do not enable this when using in-memory version store type. | +| autoscaling.maxReplicas | int | `3` | The maximum number of replicas to maintain. | +| autoscaling.minReplicas | int | `1` | The minimum number of replicas to maintain. | +| autoscaling.targetCPUUtilizationPercentage | int | `80` | Optional; set to zero or empty to disable. | +| autoscaling.targetMemoryUtilizationPercentage | string | `nil` | Optional; set to zero or empty to disable. | +| configMapLabels | object | `{}` | Additional Labels to apply to polaris configmap. | +| containerSecurityContext | object | `{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"runAsNonRoot":true,"runAsUser":10000,"seccompProfile":{"type":"RuntimeDefault"}}` | Security context for the polaris container. See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/. | +| containerSecurityContext.runAsUser | int | `10000` | UID 10000 is compatible with Polaris OSS default images; change this if you are using a different image. | +| cors | object | `{"accessControlAllowCredentials":null,"accessControlMaxAge":null,"allowedHeaders":[],"allowedMethods":[],"allowedOrigins":[],"exposedHeaders":[]}` | Polaris CORS configuration. | +| cors.accessControlAllowCredentials | string | `nil` | The `Access-Control-Allow-Credentials` response header. The value of this header will default to `true` if `allowedOrigins` property is set and there is a match with the precise `Origin` header. | +| cors.accessControlMaxAge | string | `nil` | The `Access-Control-Max-Age` response header value indicating how long the results of a pre-flight request can be cached. Must be a valid duration. | +| cors.allowedHeaders | list | `[]` | HTTP headers allowed for CORS, ex: X-Custom, Content-Disposition. If this is not set or empty, all requested headers are considered allowed. | +| cors.allowedMethods | list | `[]` | HTTP methods allowed for CORS, ex: GET, PUT, POST. If this is not set or empty, all requested methods are considered allowed. | +| cors.allowedOrigins | list | `[]` | Origins allowed for CORS, e.g. http://polaris.apache.org, http://localhost:8181. 
In case an entry of the list is surrounded by forward slashes, it is interpreted as a regular expression. | +| cors.exposedHeaders | list | `[]` | HTTP headers exposed to the client, ex: X-Custom, Content-Disposition. The default is an empty list. | +| extraEnv | list | `[]` | Advanced configuration via Environment Variables. Extra environment variables to add to the Polaris server container. You can pass here any valid EnvVar object: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#envvar-v1-core This can be useful to get configuration values from Kubernetes secrets or config maps. | +| extraInitContainers | list | `[]` | Add additional init containers to the polaris pod(s) See https://kubernetes.io/docs/concepts/workloads/pods/init-containers/. | +| extraServices | list | `[]` | Additional service definitions. All service definitions always select all Polaris pods. Use this if you need to expose specific ports with different configurations, e.g. expose polaris-http with an alternate LoadBalancer service instead of ClusterIP. | +| extraVolumeMounts | list | `[]` | Extra volume mounts to add to the polaris container. See https://kubernetes.io/docs/concepts/storage/volumes/. | +| extraVolumes | list | `[]` | Extra volumes to add to the polaris pod. See https://kubernetes.io/docs/concepts/storage/volumes/. | +| features | object | `{"realmOverrides":{}}` | Polaris features configuration. | +| features.realmOverrides | object | `{}` | Features to enable or disable per realm. This field is a map of maps. The realm name is the key, and the value is a map of feature names to values. If a feature is not present in the map, the default value from the 'defaults' field is used. | +| fileIo | object | `{"type":"default"}` | Polaris FileIO configuration. | +| fileIo.type | string | `"default"` | The type of file IO to use. Two built-in types are supported: default and wasb. The wasb one translates WASB paths to ABFS ones. | +| image.configDir | string | `"/deployments/config"` | The path to the directory where the application.properties file, and other configuration files, if any, should be mounted. Note: if you are using EclipseLink, then this value must be at least two folders down to the root folder, e.g. `/deployments/config` is OK, whereas `/deployments` is not. | +| image.pullPolicy | string | `"IfNotPresent"` | The image pull policy. | +| image.repository | string | `"apache/polaris"` | The image repository to pull from. | +| image.tag | string | `"latest"` | The image tag. | +| imagePullSecrets | list | `[]` | References to secrets in the same namespace to use for pulling any of the images used by this chart. Each entry is a LocalObjectReference to an existing secret in the namespace. The secret must contain a .dockerconfigjson key with a base64-encoded Docker configuration file. See https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ for more information. | +| ingress.annotations | object | `{}` | Annotations to add to the ingress. | +| ingress.className | string | `""` | Specifies the ingressClassName; leave empty if you don't want to customize it | +| ingress.enabled | bool | `false` | Specifies whether an ingress should be created. | +| ingress.hosts | list | `[{"host":"chart-example.local","paths":[]}]` | A list of host paths used to configure the ingress. | +| ingress.tls | list | `[]` | A list of TLS certificates; each entry has a list of hosts in the certificate, along with the secret name used to terminate TLS traffic on port 443. 
| +| livenessProbe | object | `{"failureThreshold":3,"initialDelaySeconds":5,"periodSeconds":10,"successThreshold":1,"terminationGracePeriodSeconds":30,"timeoutSeconds":10}` | Configures the liveness probe for polaris pods. | +| livenessProbe.failureThreshold | int | `3` | Minimum consecutive failures for the probe to be considered failed after having succeeded. Minimum value is 1. | +| livenessProbe.initialDelaySeconds | int | `5` | Number of seconds after the container has started before liveness probes are initiated. Minimum value is 0. | +| livenessProbe.periodSeconds | int | `10` | How often (in seconds) to perform the probe. Minimum value is 1. | +| livenessProbe.successThreshold | int | `1` | Minimum consecutive successes for the probe to be considered successful after having failed. Minimum value is 1. | +| livenessProbe.terminationGracePeriodSeconds | int | `30` | Optional duration in seconds the pod needs to terminate gracefully upon probe failure. Minimum value is 1. | +| livenessProbe.timeoutSeconds | int | `10` | Number of seconds after which the probe times out. Minimum value is 1. | +| logging | object | `{"categories":{"org.apache.iceberg.rest":"INFO","org.apache.polaris":"INFO"},"console":{"enabled":true,"format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"threshold":"ALL"},"file":{"enabled":false,"fileName":"polaris.log","format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"logsDir":"/deployments/logs","rotation":{"fileSuffix":null,"maxBackupIndex":5,"maxFileSize":"100Mi"},"storage":{"className":"standard","selectorLabels":{},"size":"512Gi"},"threshold":"ALL"},"level":"INFO","mdc":{},"requestIdHeaderName":"Polaris-Request-Id"}` | Logging configuration. | +| logging.categories | object | `{"org.apache.iceberg.rest":"INFO","org.apache.polaris":"INFO"}` | Configuration for specific log categories. | +| logging.console | object | `{"enabled":true,"format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"threshold":"ALL"}` | Configuration for the console appender. | +| logging.console.enabled | bool | `true` | Whether to enable the console appender. | +| logging.console.format | string | `"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n"` | The log format to use. Ignored if JSON format is enabled. See https://quarkus.io/guides/logging#logging-format for details. | +| logging.console.json | bool | `false` | Whether to log in JSON format. | +| logging.console.threshold | string | `"ALL"` | The log level of the console appender. | +| logging.file | object | `{"enabled":false,"fileName":"polaris.log","format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"logsDir":"/deployments/logs","rotation":{"fileSuffix":null,"maxBackupIndex":5,"maxFileSize":"100Mi"},"storage":{"className":"standard","selectorLabels":{},"size":"512Gi"},"threshold":"ALL"}` | Configuration for the file appender. | +| logging.file.enabled | bool | `false` | Whether to enable the file appender. | +| logging.file.fileName | string | `"polaris.log"` | The log file name. 
| +| logging.file.format | string | `"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n"` | The log format to use. Ignored if JSON format is enabled. See https://quarkus.io/guides/logging#logging-format for details. | +| logging.file.json | bool | `false` | Whether to log in JSON format. | +| logging.file.logsDir | string | `"/deployments/logs"` | The local directory where log files are stored. The persistent volume claim will be mounted here. | +| logging.file.rotation | object | `{"fileSuffix":null,"maxBackupIndex":5,"maxFileSize":"100Mi"}` | Log rotation configuration. | +| logging.file.rotation.fileSuffix | string | `nil` | An optional suffix to append to the rotated log files. If present, the rotated log files will be grouped in time buckets, and each bucket will contain at most maxBackupIndex files. The suffix must be in a date-time format that is understood by DateTimeFormatter. If the suffix ends with .gz or .zip, the rotated files will also be compressed using the corresponding algorithm. | +| logging.file.rotation.maxBackupIndex | int | `5` | The maximum number of backup files to keep. | +| logging.file.rotation.maxFileSize | string | `"100Mi"` | The maximum size of the log file before it is rotated. Should be expressed as a Kubernetes quantity. | +| logging.file.storage | object | `{"className":"standard","selectorLabels":{},"size":"512Gi"}` | The log storage configuration. A persistent volume claim will be created using these settings. | +| logging.file.storage.className | string | `"standard"` | The storage class name of the persistent volume claim to create. | +| logging.file.storage.selectorLabels | object | `{}` | Labels to add to the persistent volume claim spec selector; a persistent volume with matching labels must exist. Leave empty if using dynamic provisioning. | +| logging.file.storage.size | string | `"512Gi"` | The size of the persistent volume claim to create. | +| logging.file.threshold | string | `"ALL"` | The log level of the file appender. | +| logging.level | string | `"INFO"` | The log level of the root category, which is used as the default log level for all categories. | +| logging.mdc | object | `{}` | Configuration for MDC (Mapped Diagnostic Context). Values specified here will be added to the log context of all incoming requests and can be used in log patterns. | +| logging.requestIdHeaderName | string | `"Polaris-Request-Id"` | The header name to use for the request ID. | +| managementService | object | `{"annotations":{},"clusterIP":"None","externalTrafficPolicy":null,"internalTrafficPolicy":null,"ports":[{"name":"polaris-mgmt","nodePort":null,"port":8182,"protocol":null,"targetPort":null}],"sessionAffinity":null,"trafficDistribution":null,"type":"ClusterIP"}` | Management service settings. These settings are used to configure liveness and readiness probes, and to configure the dedicated headless service that will expose health checks and metrics, e.g. for metrics scraping and service monitoring. | +| managementService.annotations | object | `{}` | Annotations to add to the service. | +| managementService.clusterIP | string | `"None"` | By default, the management service is headless, i.e. it does not have a cluster IP. This is generally the right option for exposing health checks and metrics, e.g. for metrics scraping and service monitoring. 
| +| managementService.ports | list | `[{"name":"polaris-mgmt","nodePort":null,"port":8182,"protocol":null,"targetPort":null}]` | The ports the management service will listen on. At least one port is required; the first port implicitly becomes the HTTP port that the application will use for serving management requests. By default, it's 8182. Note: port names must be unique and no more than 15 characters long. | +| managementService.ports[0] | object | `{"name":"polaris-mgmt","nodePort":null,"port":8182,"protocol":null,"targetPort":null}` | The name of the management port. Required. | +| managementService.ports[0].nodePort | string | `nil` | The port on each node on which this service is exposed when type is NodePort or LoadBalancer. Usually assigned by the system. If not specified, a port will be allocated if this Service requires one. If this field is specified when creating a Service which does not need it, creation will fail. | +| managementService.ports[0].port | int | `8182` | The port the management service listens on. By default, the management interface is exposed on HTTP port 8182. | +| managementService.ports[0].protocol | string | `nil` | The IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default is TCP. | +| managementService.ports[0].targetPort | string | `nil` | Number or name of the port to access on the pods targeted by the service. If this is a string, it will be looked up as a named port in the target Pod's container ports. If this is not specified, the value of the 'port' field is used. | +| managementService.type | string | `"ClusterIP"` | The type of service to create. Valid values are: ExternalName, ClusterIP, NodePort, and LoadBalancer. The default value is ClusterIP. | +| metrics.enabled | bool | `true` | Specifies whether metrics for the polaris server should be enabled. | +| metrics.tags | object | `{}` | Additional tags (dimensional labels) to add to the metrics. | +| nodeSelector | object | `{}` | Node labels which must match for the polaris pod to be scheduled on that node. See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector. | +| oidc | object | `{"authServeUrl":null,"client":{"id":"polaris","secret":{"key":"clientSecret","name":null}},"principalMapper":{"idClaimPath":null,"nameClaimPath":null,"type":"default"},"principalRolesMapper":{"filter":null,"mappings":[],"rolesClaimPath":null,"type":"default"}}` | Polaris OIDC configuration. Only relevant when at least one realm is configured for external (or mixed) authentication. The currently supported configuration is for a single, default OIDC tenant. For more complex scenarios, including OIDC multi-tenancy, you will need to provide the relevant configuration using the `advancedConfig` section. | +| oidc.authServeUrl | string | `nil` | The authentication server URL. Must be provided if at least one realm is configured for external authentication. | +| oidc.client | object | `{"id":"polaris","secret":{"key":"clientSecret","name":null}}` | The client to use when authenticating with the authentication server. | +| oidc.client.id | string | `"polaris"` | The client ID to use when contacting the authentication server's introspection endpoint in order to validate tokens. | +| oidc.client.secret | object | `{"key":"clientSecret","name":null}` | The secret to pull the client secret from. If no client secret is required, leave the secret name unset. | +| oidc.client.secret.key | string | `"clientSecret"` | The key name inside the secret to pull the client secret from. 
| +| oidc.client.secret.name | string | `nil` | The name of the secret to pull the client secret from. If not provided, the client is assumed to not require a client secret when contacting the introspection endpoint. | +| oidc.principalMapper | object | `{"idClaimPath":null,"nameClaimPath":null,"type":"default"}` | Principal mapping configuration. | +| oidc.principalMapper.idClaimPath | string | `nil` | The path to the claim that contains the principal ID. Nested paths can be expressed using "/" as a separator, e.g. "polaris/principal_id" would look for the "principal_id" field inside the "polaris" object in the token claims. Optional. Either this option or `nameClaimPath` (or both) must be provided. | +| oidc.principalMapper.nameClaimPath | string | `nil` | The claim that contains the principal name. Nested paths can be expressed using "/" as a separator, e.g. "polaris/principal_name" would look for the "principal_name" field inside the "polaris" object in the token claims. Optional. Either this option or `idClaimPath` (or both) must be provided. | +| oidc.principalMapper.type | string | `"default"` | The `PrincipalMapper` implementation to use. Only one built-in type is supported: default. | +| oidc.principalRolesMapper | object | `{"filter":null,"mappings":[],"rolesClaimPath":null,"type":"default"}` | Principal roles mapping configuration. | +| oidc.principalRolesMapper.filter | string | `nil` | A regular expression that matches the role names in the identity. Only roles that match this regex will be included in the Polaris-specific roles. | +| oidc.principalRolesMapper.mappings | list | `[]` | A list of regex mappings that will be applied to each role name in the identity. This can be used to transform the role names in the identity into role names as expected by Polaris. The default Authenticator expects the security identity to expose role names in the format `POLARIS_ROLE:`. | +| oidc.principalRolesMapper.rolesClaimPath | string | `nil` | The path to the claim that contains the principal roles. Nested paths can be expressed using "/" as a separator, e.g. "polaris/principal_roles" would look for the "principal_roles" field inside the "polaris" object in the token claims. If not set, Quarkus looks for roles in standard locations. See https://quarkus.io/guides/security-oidc-bearer-token-authentication#token-claims-and-security-identity-roles. | +| oidc.principalRolesMapper.type | string | `"default"` | The `PrincipalRolesMapper` implementation to use. Only one built-in type is supported: default. | +| persistence | object | `{"relationalJdbc":{"secret":{"jdbcUrl":"jdbcUrl","name":null,"password":"password","username":"username"}},"type":"in-memory"}` | Polaris persistence configuration. | +| persistence.relationalJdbc | object | `{"secret":{"jdbcUrl":"jdbcUrl","name":null,"password":"password","username":"username"}}` | The configuration for the relational-jdbc persistence manager. | +| persistence.relationalJdbc.secret | object | `{"jdbcUrl":"jdbcUrl","name":null,"password":"password","username":"username"}` | The secret name to pull the database connection properties from. 
| +| persistence.relationalJdbc.secret.jdbcUrl | string | `"jdbcUrl"` | The secret key holding the database JDBC connection URL | +| persistence.relationalJdbc.secret.name | string | `nil` | The secret name to pull database connection properties from | +| persistence.relationalJdbc.secret.password | string | `"password"` | The secret key holding the database password for authentication | +| persistence.relationalJdbc.secret.username | string | `"username"` | The secret key holding the database username for authentication | +| persistence.type | string | `"in-memory"` | The type of persistence to use. Two built-in types are supported: in-memory and relational-jdbc. The eclipse-link type is also supported but is deprecated. | +| podAnnotations | object | `{}` | Annotations to apply to polaris pods. | +| podLabels | object | `{}` | Additional Labels to apply to polaris pods. | +| podSecurityContext | object | `{"fsGroup":10001,"seccompProfile":{"type":"RuntimeDefault"}}` | Security context for the polaris pod. See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/. | +| podSecurityContext.fsGroup | int | `10001` | GID 10001 is compatible with Polaris OSS default images; change this if you are using a different image. | +| rateLimiter | object | `{"tokenBucket":{"requestsPerSecond":9999,"type":"default","window":"PT10S"},"type":"no-op"}` | Polaris rate limiter configuration. | +| rateLimiter.tokenBucket | object | `{"requestsPerSecond":9999,"type":"default","window":"PT10S"}` | The configuration for the default rate limiter, which uses the token bucket algorithm with one bucket per realm. | +| rateLimiter.tokenBucket.requestsPerSecond | int | `9999` | The maximum number of requests per second allowed for each realm. | +| rateLimiter.tokenBucket.type | string | `"default"` | The type of the token bucket rate limiter. Only the default type is supported out of the box. | +| rateLimiter.tokenBucket.window | string | `"PT10S"` | The time window. | +| rateLimiter.type | string | `"no-op"` | The type of rate limiter filter to use. Two built-in types are supported: default and no-op. | +| readinessProbe | object | `{"failureThreshold":3,"initialDelaySeconds":5,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":10}` | Configures the readiness probe for polaris pods. | +| readinessProbe.failureThreshold | int | `3` | Minimum consecutive failures for the probe to be considered failed after having succeeded. Minimum value is 1. | +| readinessProbe.initialDelaySeconds | int | `5` | Number of seconds after the container has started before readiness probes are initiated. Minimum value is 0. | +| readinessProbe.periodSeconds | int | `10` | How often (in seconds) to perform the probe. Minimum value is 1. | +| readinessProbe.successThreshold | int | `1` | Minimum consecutive successes for the probe to be considered successful after having failed. Minimum value is 1. | +| readinessProbe.timeoutSeconds | int | `10` | Number of seconds after which the probe times out. Minimum value is 1. | +| realmContext | object | `{"realms":["POLARIS"],"type":"default"}` | Realm context resolver configuration. | +| realmContext.realms | list | `["POLARIS"]` | List of valid realms, for use with the default realm context resolver. The first realm in the list is the default realm. Realms not in this list will be rejected. | +| realmContext.type | string | `"default"` | The type of realm context resolver to use. 
Two built-in types are supported: default and test; test is not recommended for production as it does not perform any realm validation. | +| replicaCount | int | `1` | The number of replicas to deploy (horizontal scaling). Beware that replicas are stateless; don't set this number > 1 when using in-memory meta store manager. | +| resources | object | `{}` | Configures the resources requests and limits for polaris pods. We usually recommend not to specify default resources and to leave this as a conscious choice for the user. This also increases chances charts run on environments with little resources, such as Minikube. If you do want to specify resources, uncomment the following lines, adjust them as necessary, and remove the curly braces after 'resources:'. | +| revisionHistoryLimit | string | `nil` | The number of old ReplicaSets to retain to allow rollback (if not set, the default Kubernetes value is set to 10). | +| service | object | `{"annotations":{},"clusterIP":null,"externalTrafficPolicy":null,"internalTrafficPolicy":null,"ports":[{"name":"polaris-http","nodePort":null,"port":8181,"protocol":null,"targetPort":null}],"sessionAffinity":null,"trafficDistribution":null,"type":"ClusterIP"}` | Polaris main service settings. | +| service.annotations | object | `{}` | Annotations to add to the service. | +| service.clusterIP | string | `nil` | You can specify your own cluster IP address If you define a Service that has the .spec.clusterIP set to "None" then Kubernetes does not assign an IP address. Instead, DNS records for the service will return the IP addresses of each pod targeted by the server. This is called a headless service. See https://kubernetes.io/docs/concepts/services-networking/service/#headless-services | +| service.externalTrafficPolicy | string | `nil` | Controls how traffic from external sources is routed. Valid values are Cluster and Local. The default value is Cluster. Set the field to Cluster to route traffic to all ready endpoints. Set the field to Local to only route to ready node-local endpoints. If the traffic policy is Local and there are no node-local endpoints, traffic is dropped by kube-proxy. | +| service.internalTrafficPolicy | string | `nil` | Controls how traffic from internal sources is routed. Valid values are Cluster and Local. The default value is Cluster. Set the field to Cluster to route traffic to all ready endpoints. Set the field to Local to only route to ready node-local endpoints. If the traffic policy is Local and there are no node-local endpoints, traffic is dropped by kube-proxy. | +| service.ports | list | `[{"name":"polaris-http","nodePort":null,"port":8181,"protocol":null,"targetPort":null}]` | The ports the service will listen on. At least one port is required; the first port implicitly becomes the HTTP port that the application will use for serving API requests. By default, it's 8181. Note: port names must be unique and no more than 15 characters long. | +| service.ports[0] | object | `{"name":"polaris-http","nodePort":null,"port":8181,"protocol":null,"targetPort":null}` | The name of the port. Required. | +| service.ports[0].nodePort | string | `nil` | The port on each node on which this service is exposed when type is NodePort or LoadBalancer. Usually assigned by the system. If not specified, a port will be allocated if this Service requires one. If this field is specified when creating a Service which does not need it, creation will fail. | +| service.ports[0].port | int | `8181` | The port the service listens on. 
By default, the HTTP port is 8181. | +| service.ports[0].protocol | string | `nil` | The IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default is TCP. | +| service.ports[0].targetPort | string | `nil` | Number or name of the port to access on the pods targeted by the service. If this is a string, it will be looked up as a named port in the target Pod's container ports. If this is not specified, the value of the 'port' field is used. | +| service.sessionAffinity | string | `nil` | The session affinity for the service. Valid values are: None, ClientIP. The default value is None. ClientIP enables sticky sessions based on the client's IP address. This is generally beneficial to Polaris deployments, but some testing may be required in order to make sure that the load is distributed evenly among the pods. Also, this setting affects only internal clients, not external ones. If Ingress is enabled, it is recommended to set sessionAffinity to None. | +| service.trafficDistribution | string | `nil` | The traffic distribution field provides another way to influence traffic routing within a Kubernetes Service. While traffic policies focus on strict semantic guarantees, traffic distribution allows you to express preferences such as routing to topologically closer endpoints. The only valid value is: PreferClose. The default value is implementation-specific. | +| service.type | string | `"ClusterIP"` | The type of service to create. Valid values are: ExternalName, ClusterIP, NodePort, and LoadBalancer. The default value is ClusterIP. | +| serviceAccount.annotations | object | `{}` | Annotations to add to the service account. | +| serviceAccount.create | bool | `true` | Specifies whether a service account should be created. | +| serviceAccount.name | string | `""` | The name of the service account to use. If not set and create is true, a name is generated using the fullname template. | +| serviceMonitor.enabled | bool | `true` | Specifies whether a ServiceMonitor for Prometheus operator should be created. | +| serviceMonitor.interval | string | `""` | The scrape interval; leave empty to let Prometheus decide. Must be a valid duration, e.g. 1d, 1h30m, 5m, 10s. | +| serviceMonitor.labels | object | `{}` | Labels for the created ServiceMonitor so that Prometheus operator can properly pick it up. | +| serviceMonitor.metricRelabelings | list | `[]` | Relabeling rules to apply to metrics. Ref https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config. | +| storage | object | `{"secret":{"awsAccessKeyId":null,"awsSecretAccessKey":null,"gcpToken":null,"gcpTokenLifespan":null,"name":null}}` | Storage credentials for the server. If the following properties are unset, default credentials will be used, in which case the pod must have the necessary permissions to access the storage. | +| storage.secret | object | `{"awsAccessKeyId":null,"awsSecretAccessKey":null,"gcpToken":null,"gcpTokenLifespan":null,"name":null}` | The secret to pull storage credentials from. | +| storage.secret.awsAccessKeyId | string | `nil` | The key in the secret to pull the AWS access key ID from. Only required when using AWS. | +| storage.secret.awsSecretAccessKey | string | `nil` | The key in the secret to pull the AWS secret access key from. Only required when using AWS. | +| storage.secret.gcpToken | string | `nil` | The key in the secret to pull the GCP token from. Only required when using GCP. 
| +| storage.secret.gcpTokenLifespan | string | `nil` | The key in the secret to pull the GCP token expiration time from. Only required when using GCP. Must be a valid ISO 8601 duration. The default is PT1H (1 hour). | +| storage.secret.name | string | `nil` | The name of the secret to pull storage credentials from. | +| tasks | object | `{"maxConcurrentTasks":null,"maxQueuedTasks":null}` | Polaris asynchronous task executor configuration. | +| tasks.maxConcurrentTasks | string | `nil` | The maximum number of concurrent tasks that can be executed at the same time. The default is the number of available cores. | +| tasks.maxQueuedTasks | string | `nil` | The maximum number of tasks that can be queued up for execution. The default is Integer.MAX_VALUE. | +| tolerations | list | `[]` | A list of tolerations to apply to polaris pods. See https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/. | +| tracing.attributes | object | `{}` | Resource attributes to identify the polaris service among other tracing sources. See https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/#service. If left empty, traces will be attached to a service named "Apache Polaris"; to change this, provide a service.name attribute here. | +| tracing.enabled | bool | `false` | Specifies whether tracing for the polaris server should be enabled. | +| tracing.endpoint | string | `"http://otlp-collector:4317"` | The collector endpoint URL to connect to (required). The endpoint URL must have either the http:// or the https:// scheme. The collector must talk the OpenTelemetry protocol (OTLP) and the port must be its gRPC port (by default 4317). See https://quarkus.io/guides/opentelemetry for more information. | +| tracing.sample | string | `"1.0d"` | Which requests should be sampled. Valid values are: "all", "none", or a ratio between 0.0 and "1.0d" (inclusive). E.g. "0.5d" means that 50% of the requests will be sampled. Note: avoid entering numbers here, always prefer a string representation of the ratio. | diff --git a/1.2.0/managing-security/_index.md b/1.2.0/managing-security/_index.md new file mode 100644 index 0000000000..3a10c8900b --- /dev/null +++ b/1.2.0/managing-security/_index.md @@ -0,0 +1,28 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+#
+title: Managing Security
+linkTitle: Managing Security
+type: docs
+weight: 550
+---
+
+## [Access Control]({{< relref "access-control" >}})
+
+## [Authentication and Identity Providers]({{< relref "external-idp" >}})
\ No newline at end of file
diff --git a/1.2.0/managing-security/access-control.md b/1.2.0/managing-security/access-control.md
new file mode 100644
index 0000000000..b8a1b697ca
--- /dev/null
+++ b/1.2.0/managing-security/access-control.md
@@ -0,0 +1,201 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Role-Based Access Control
+linkTitle: Access Control
+type: docs
+weight: 200
+---
+
+This section provides information about how access control works for Apache Polaris (Incubating).
+
+Polaris uses a role-based access control (RBAC) model in which the Polaris administrator assigns access privileges to catalog roles
+and then grants principals access to resources by assigning catalog roles to principal roles.
+
+These are the key concepts for understanding access control in Polaris:
+
+- **Securable object**
+- **Principal role**
+- **Catalog role**
+- **Privilege**
+
+## Securable object
+
+A securable object is an object to which access can be granted. Polaris
+has the following securable objects:
+
+- Catalog
+- Namespace
+- Iceberg table
+- View
+- Policy
+
+## Principal role
+
+A principal role is a resource in Polaris that you can use to logically group Polaris principals together and grant privileges on
+securable objects.
+
+Polaris supports a many-to-many relationship between principals and principal roles. For example, to grant the same privileges to
+multiple principals, you can assign a single principal role to those principals. Likewise, a principal can be granted
+multiple principal roles.
+
+You don't grant privileges directly to a principal role. Instead, you configure object permissions at the catalog role level, and then grant
+catalog roles to a principal role.
+
+The following table shows examples of principal roles that you might configure in Polaris:
+
+| Principal role name | Description |
+| -----------------------| ----------- |
+| Data_engineer | A role that is granted to multiple principals for running data engineering jobs. |
+| Data_scientist | A role that is granted to multiple principals for running data science or AI jobs. |
+
+## Catalog role
+
+A catalog role belongs to a particular catalog resource in Polaris and specifies a set of permissions for actions on the catalog or objects
+in the catalog, such as catalog namespaces or tables. You can create one or more catalog roles for a catalog.
+
+You grant privileges to a catalog role and then grant the catalog role to a principal role to bestow those privileges on one or more principals.
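+
+As a rough sketch of what this chain looks like against the Polaris management API (the catalog, role, and privilege
+names below are illustrative, and `$POLARIS_TOKEN` is assumed to be an access token with sufficient privileges; see the
+management API specification for the authoritative request shapes):
+
+```shell
+# Grant a privilege on the catalog to a catalog role
+curl -X PUT http://localhost:8181/api/management/v1/catalogs/quickstart_catalog/catalog-roles/catalog_reader/grants \
+  -H "Authorization: Bearer $POLARIS_TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"grant": {"type": "catalog", "privilege": "TABLE_READ_DATA"}}'
+
+# Grant the catalog role to a principal role
+curl -X PUT http://localhost:8181/api/management/v1/principal-roles/data_scientist/catalog-roles/quickstart_catalog \
+  -H "Authorization: Bearer $POLARIS_TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"catalogRole": {"name": "catalog_reader"}}'
+```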
+
+Polaris also supports a many-to-many relationship between catalog roles and principal roles. You can grant the same catalog role to one or more
+principal roles. Likewise, a principal role can be granted one or more catalog roles.
+
+The following table displays examples of catalog roles that you might
+configure in Polaris:
+
+| Example catalog role | Description |
+| -----------------------| ----------- |
+| Catalog administrators | A role that has been granted multiple privileges to emulate full access to the catalog. Principal roles that have been granted this role are permitted to create, alter, read, write, and drop tables in the catalog. |
+| Catalog readers | A role that has been granted read-only privileges to tables in the catalog. Principal roles that have been granted this role are allowed to read from tables in the catalog. |
+| Catalog contributor | A role that has been granted read and write access privileges to all tables that belong to the catalog. Principal roles that have been granted this role are allowed to perform read and write operations on tables in the catalog. |
+
+## RBAC model
+
+The following diagram illustrates the RBAC model used by Polaris. For each catalog, the Polaris administrator assigns access
+privileges to catalog roles and then grants principals access to resources by assigning catalog roles to principal roles. Polaris
+supports a many-to-many relationship between principals and principal roles.
+
+![Diagram that shows the RBAC model for Apache Polaris.](/img/rbac-model.svg "Apache Polaris RBAC model")
+
+## Access control privileges
+
+This section describes the privileges that are available in the Polaris access control model. Privileges are granted to catalog roles, catalog
+roles are granted to principal roles, and principal roles are granted to principals to specify the operations that principals can
+perform on objects in Polaris.
+
+To grant the full set of privileges (drop, list, read, write, etc.) on an object, you can use the *full privilege* option.
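+
+To verify the result of such grants, you can list the grants currently held by a catalog role (a sketch, using the
+same illustrative names as in the earlier example):
+
+```shell
+curl -s http://localhost:8181/api/management/v1/catalogs/quickstart_catalog/catalog-roles/catalog_reader/grants \
+  -H "Authorization: Bearer $POLARIS_TOKEN" | jq .
+```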
This privilege encompasses the following privileges: CATALOG_MANAGE_METADATA, TABLE_FULL_METADATA, NAMESPACE_FULL_METADATA, VIEW_FULL_METADATA, TABLE_WRITE_DATA, TABLE_READ_DATA, CATALOG_READ_PROPERTIES, and CATALOG_WRITE_PROPERTIES. |
| CATALOG_MANAGE_METADATA | Enables full management of the catalog, catalog roles, namespaces, and tables. |
| CATALOG_READ_PROPERTIES | Enables listing catalogs and reading properties of the catalog. |
| CATALOG_WRITE_PROPERTIES | Enables configuring catalog properties. |
| CATALOG_ATTACH_POLICY | Enables attaching a policy to a catalog. |
| CATALOG_DETACH_POLICY | Enables detaching a policy from a catalog. |

### Policy privileges

| Privilege | Description |
| -----------------------| ----------- |
| POLICY_CREATE | Enables creating a policy under a specified namespace. |
| POLICY_READ | Enables reading policy content and metadata. |
| POLICY_WRITE | Enables updating policy details such as its content or description. |
| POLICY_LIST | Enables listing any policy in the catalog. |
| POLICY_DROP | Enables dropping a policy if it is not attached to any resource entity. |
| POLICY_FULL_METADATA | Grants all policy privileges. |
| POLICY_ATTACH | Enables attaching a policy to entities. |
| POLICY_DETACH | Enables detaching a policy from entities. |

## RBAC example

The following diagram illustrates how RBAC works in Polaris and
includes the following users:

- **Alice:** A service admin who signs up for Polaris. Alice can
  create principals. She can also create catalogs and
  namespaces and configure access control for Polaris resources.

- **Bob:** A data engineer who uses Apache Spark™ to
  interact with Polaris.

  - Alice has created a principal for Bob. It has been
    granted the Data_engineer principal role, which in turn has been
    granted the following catalog roles: Catalog contributor and
    Data administrator (for both the Silver and Gold zone catalogs
    in the following diagram).

  - The Catalog contributor role grants permission to create
    namespaces and tables in the Bronze zone catalog.

  - The Data administrator roles grant full administrative rights to
    the Silver zone catalog and Gold zone catalog.

- **Mark:** A data scientist who trains models with data managed
  by Polaris.

  - Alice has created a principal for Mark. It has been
    granted the Data_scientist principal role, which in turn has
    been granted the catalog role named Catalog reader.

  - The Catalog reader role grants read-only access for a catalog
    named Gold zone catalog.

![Diagram that shows an example of how RBAC works in Apache Polaris.](/img/rbac-example.svg "Apache Polaris RBAC example")

diff --git a/1.2.0/managing-security/external-idp/_index.md b/1.2.0/managing-security/external-idp/_index.md
new file mode 100644
index 0000000000..0b236cf31a
--- /dev/null
+++ b/1.2.0/managing-security/external-idp/_index.md
@@ -0,0 +1,255 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Identity Providers
linkTitle: Identity Providers
type: docs
weight: 300
---

Apache Polaris supports authentication via external identity providers (IdPs) using OpenID Connect (OIDC) in addition to the internal authentication system. This feature enables flexible identity federation with enterprise IdPs and allows gradual migration or hybrid authentication strategies across realms in Polaris.

## Authentication Types

Polaris supports three authentication modes:

1. `internal` (default)
   - Only Polaris internal authentication is used.
2. `external`
   - Authenticates using external OIDC providers (via Quarkus OIDC).
   - Disables the internal token endpoint (returns HTTP 501).
3. `mixed`
   - Tries internal authentication first; if this fails, it falls back to OIDC.

Authentication can be configured globally or per realm by setting the following properties:

```properties
# Global default
polaris.authentication.type=internal
# Per-realm override
polaris.authentication.realm1.type=external
polaris.authentication.realm2.type=mixed
```

## Key Components

### Authenticator

The `Authenticator` is the component responsible for resolving the principal and the principal roles, and for creating a `PolarisPrincipal` from the credentials provided by the authentication process. It is a central component and is invoked for all types of authentication.

The `type` property selects the `Authenticator` implementation and can be overridden per realm:

```properties
polaris.authentication.authenticator.type=default
polaris.authentication.realm1.authenticator.type=custom
```

## Internal Authentication Configuration

### Token Broker

The `TokenBroker` signs and verifies tokens to ensure that they can be validated and remain unaltered.

```properties
polaris.authentication.token-broker.type=rsa-key-pair
polaris.authentication.token-broker.max-token-generation=PT1H
```

Two types are available:

- `rsa-key-pair` (recommended for production): Uses an RSA key pair for token signing and validation.
- `symmetric-key`: Uses a shared secret for both operations; suitable for single-node deployments or testing.

The property `polaris.authentication.token-broker.max-token-generation` specifies the maximum validity duration of tokens issued by the internal `TokenBroker`.

- Format: ISO-8601 duration (e.g., `PT1H` for 1 hour, `PT30M` for 30 minutes).
- Default: `PT1H`.

### Token Service

The Token Service, configured via `TokenServiceConfiguration` (Quarkus), is responsible for issuing and validating tokens (e.g., bearer tokens) for authenticated principals when internal authentication is used. It works in coordination with the `Authenticator` and `TokenBroker`. The default implementation is `default`, and this must be configured when using internal authentication.

```properties
polaris.authentication.token-service.type=default
```

### Role Mapping

When using internal authentication, token requests should include a `scope` parameter that specifies the roles to be activated for the principal. The `scope` parameter is a space-separated list of role names.

The default `ActiveRolesProvider` expects role names to be in the following format: `PRINCIPAL_ROLE:<role name>`.
+ +For example, if the principal has the roles `service_admin` and `catalog_admin` and wants both activated, the `scope` parameter should look like this: + +```properties +scope=PRINCIPAL_ROLE:service_admin PRINCIPAL_ROLE:catalog_admin +``` + +Here is an example of a full request to the Polaris token endpoint using internal authentication: + +```http request +POST /api/catalog/v1/oauth/tokens HTTP/1.1 +Host: polaris.example.com:8181 +Content-Type: application/x-www-form-urlencoded + +grant_type=client_credentials&client_id=root&client_secret=s3cr3t&scope=PRINCIPAL_ROLE%3Aservice_admin%20PRINCIPAL_ROLE%3Acatalog_admin +``` + +## External Authentication Configuration + +External authentication is configured via Quarkus OIDC and Polaris-specific OIDC extensions. The following settings are used to integrate with an identity provider and extract identity and role information from tokens. + +### OIDC Tenant Configuration + +At least one OIDC tenant must be explicitly enabled. In Polaris, realms and OIDC tenants are distinct concepts. An OIDC tenant represents a specific identity provider configuration (e.g., `quarkus.oidc.idp1`). A [realm]({{% ref "../../realm" %}}) is a logical partition within Polaris. + +- Multiple realms can share a single OIDC tenant. +- Each realm can be associated with only one OIDC tenant. + +Therefore, multi-realm deployments can share a common identity provider while still enforcing realm-level scoping. To configure the default tenant: + +```properties +quarkus.oidc.tenant-enabled=true +quarkus.oidc.auth-server-url=https://auth.example.com/realms/polaris +quarkus.oidc.client-id=polaris +``` + +Alternatively, it is possible to use multiple named tenants. Each OIDC-named tenant is then configured with standard Quarkus settings: + +```properties +quarkus.oidc.oidc-tenant1.auth-server-url=http://localhost:8080/realms/polaris +quarkus.oidc.oidc-tenant1.client-id=client1 +quarkus.oidc.oidc-tenant1.application-type=service +``` + +When using multiple OIDC tenants, it's your responsibility to configure tenant resolution appropriately. See the [Quarkus OpenID Connect Multitenancy Guide](https://quarkus.io/guides/security-openid-connect-multitenancy#tenant-resolution). + +### Principal Mapping + +While OIDC tenant resolution is entirely delegated to Quarkus, Polaris requires additional configuration to extract the Polaris principal and its roles from the credentials generated and validated by Quarkus. This part of the authentication process is configured with Polaris-specific properties that map JWT claims to Polaris principal fields: + +```properties +polaris.oidc.principal-mapper.type=default +polaris.oidc.principal-mapper.id-claim-path=polaris/principal_id +polaris.oidc.principal-mapper.name-claim-path=polaris/principal_name +``` + +These properties are overridable per OIDC tenant: + +```properties +polaris.oidc.oidc-tenant1.principal-mapper.id-claim-path=polaris/principal_id +polaris.oidc.oidc-tenant1.principal-mapper.name-claim-path=polaris/principal_name +``` + +{{< alert important >}} +The default implementation of PrincipalMapper can only work with JWT tokens. If your IDP issues opaque tokens instead, you will need to provide a custom implementation. +{{< /alert >}} + +### Role Mapping + +Similarly, Polaris requires additional configuration to map roles provided by Quarkus to roles defined in Polaris. 
The process happens in two phases: first, Quarkus maps the JWT claims to security roles, using the `quarkus.oidc.roles.*` properties; then, Polaris-specific properties are used to map the Quarkus-provided security roles to Polaris roles:

```properties
quarkus.oidc.roles.role-claim-path=polaris/roles
polaris.oidc.principal-roles-mapper.type=default
polaris.oidc.principal-roles-mapper.filter=^(?!profile$|email$).*
polaris.oidc.principal-roles-mapper.mappings[0].regex=^.*$
polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$0
```

These mappings can be overridden per OIDC tenant and used across different realms that rely on external identity providers. For example:

```properties
polaris.oidc.oidc-tenant1.principal-roles-mapper.type=custom
polaris.oidc.oidc-tenant1.principal-roles-mapper.filter=PRINCIPAL_ROLE:.*
polaris.oidc.oidc-tenant1.principal-roles-mapper.mappings[0].regex=PRINCIPAL_ROLE:(.*)
polaris.oidc.oidc-tenant1.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$1
```

The default `Authenticator` expects the security identity to expose role names in the following format: `PRINCIPAL_ROLE:<role name>`. You can use the `filter` and `mappings` properties to adjust the role names as they appear in the JWT claims.

For example, assume that the security identity produced by Quarkus exposes the following roles: `role_service_admin` and `role_catalog_admin`. Polaris expects `PRINCIPAL_ROLE:service_admin` and `PRINCIPAL_ROLE:catalog_admin`, respectively. The following configuration can be used to achieve the desired mapping:

```properties
# Keep only role names that start with "role_"
polaris.oidc.principal-roles-mapper.filter=role_.*
# Extract the text after "role_"
polaris.oidc.principal-roles-mapper.mappings[0].regex=role_(.*)
# Prefix the extracted text with "PRINCIPAL_ROLE:"
polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$1
```

See more examples below.

### Example JWT Mappings

#### Example 1: Custom Claim Paths

- JWT

  ```json
  {
    "polaris":
    {
      "roles": ["PRINCIPAL_ROLE:ALL"],
      "principal_name": "root",
      "principal_id": 1
    }
  }
  ```

- Configuration

  ```properties
  quarkus.oidc.roles.role-claim-path=polaris/roles
  polaris.oidc.principal-mapper.id-claim-path=polaris/principal_id
  polaris.oidc.principal-mapper.name-claim-path=polaris/principal_name
  ```

#### Example 2: Generic OIDC Claims

- JWT

  ```json
  {
    "sub": "1",
    "scope": "service_admin catalog_admin profile email",
    "preferred_username": "root"
  }
  ```

- Configuration

  ```properties
  quarkus.oidc.roles.role-claim-path=scope
  polaris.oidc.principal-mapper.id-claim-path=sub
  polaris.oidc.principal-mapper.name-claim-path=preferred_username
  polaris.oidc.principal-roles-mapper.filter=^(?!profile$|email$).*
  polaris.oidc.principal-roles-mapper.mappings[0].regex=^.*$
  polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$0
  ```

- Result

  Polaris roles: `PRINCIPAL_ROLE:service_admin` and `PRINCIPAL_ROLE:catalog_admin`

### Additional Links

* For a complete Keycloak integration example, see the [Keycloak External IDP Configuration Guide]({{< relref "keycloak-idp.md" >}}).
* See the [Developer Notes]({{< relref "idp-dev-notes.md" >}}) for internal implementation details, aimed at developers who want to understand or extend Polaris authentication.
\ No newline at end of file
diff --git a/1.2.0/managing-security/external-idp/idp-dev-notes.md b/1.2.0/managing-security/external-idp/idp-dev-notes.md
new file mode 100644
index 0000000000..16bc759b8d
--- /dev/null
+++ b/1.2.0/managing-security/external-idp/idp-dev-notes.md
@@ -0,0 +1,122 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Authentication Development Details
linkTitle: Development Details
type: docs
weight: 301
---

## Developer Architecture Notes

### Authentication Architecture

Polaris separates authentication into two logical phases using [Quarkus Security](https://quarkus.io/guides/security-overview):

1. Credential extraction – parsing headers and tokens
2. Credential authentication – validating identity and assigning roles

### Key Interfaces

- [`Authenticator`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/Authenticator.java): A core interface used to authenticate credentials and resolve the principal and principal roles. Roles may be derived from OIDC claims or internal mappings.
- [`InternalPolarisToken`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/InternalPolarisToken.java): Used in internal auth and inherits from `PrincipalCredential`.
- The [`DefaultAuthenticator`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/DefaultAuthenticator.java) is used to implement realm-specific logic based on these abstractions.

### Token Broker Configuration

When internal authentication is enabled, Polaris uses [`TokenBroker`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/TokenBroker.java) to handle the decoding and validation of authentication tokens. These brokers are request-scoped and can be configured per realm. Each realm may use its own strategy, such as RSA key pairs or shared secrets, depending on security requirements.
See the [Token Broker description]({{< relref "../external-idp#token-broker" >}}) for configuration details.

## Developer Authentication Workflows

### Internal Authentication

1. [`InternalAuthenticationMechanism`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/internal/InternalAuthenticationMechanism.java) parses the auth header.
2. Uses [`TokenBroker`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/TokenBroker.java) to decode the token.
3. 
Builds [`InternalAuthenticationRequest`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/internal/InternalAuthenticationRequest.java) and generates `SecurityIdentity` (Quarkus). +4. `Authenticator.authenticate()` validates the credential, resolves the principal and principal roles, then creates the `PolarisPrincipal`. + +### External Authentication + +1. `OidcAuthenticationMechanism` (Quarkus) processes the auth header. +2. [`OidcTenantResolvingAugmentor`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/external/tenant/OidcTenantResolvingAugmentor.java) selects the OIDC tenant. +3. [`OidcPolarisCredentialAugmentor`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/external/OidcPolarisCredentialAugmentor.java) extracts JWT claims. +4. `Authenticator.authenticate()` validates the claims, resolves the principal and principal roles, then creates the `PolarisPrincipal`. + +### Mixed Authentication + +1. [`InternalAuthenticationMechanism`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/internal/InternalAuthenticationMechanism.java) tries decoding. +2. If successful, proceed with internal authentication. +3. Otherwise, fall back to external (OIDC) authentication. + +## OIDC Configuration Reference + +### Principal Mapping + +- Interface: [`PrincipalMapper`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/external/mapping/PrincipalMapper.java) + + The `PrincipalMapper` is responsible for extracting the Polaris principal ID and display name from OIDC tokens. + +- Implementation selector: + + This property selects the implementation of the `PrincipalMapper` interface. The default implementation extracts fields from specific claim paths. + + ```properties + polaris.oidc.principal-mapper.type=default + ``` + +- Configuration properties for the default implementation: + + ```properties + polaris.oidc.principal-mapper.id-claim-path=polaris/principal_id + polaris.oidc.principal-mapper.name-claim-path=polaris/principal_name + ``` + +- It can be overridden per OIDC tenant. + +### Roles Mapping + +- Interface: [`PrincipalRolesMapper`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/external/mapping/PrincipalRolesMapper.java) + + Polaris uses this component to transform role claims from OIDC tokens into Polaris roles. + +- Quarkus OIDC configuration: + + This setting instructs Quarkus on where to locate roles within the OIDC token. + + ```properties + quarkus.oidc.roles.role-claim-path=polaris/roles + ``` + +- Implementation selector: + + This property selects the implementation of `PrincipalRolesMapper`. The `default` implementation applies regular expression (regex) transformations to OIDC roles. 
  ```properties
  polaris.oidc.principal-roles-mapper.type=default
  ```

- Configuration properties for the default implementation:

  ```properties
  polaris.oidc.principal-roles-mapper.filter=^(?!profile$|email$).*
  polaris.oidc.principal-roles-mapper.mappings[0].regex=^.*$
  polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$0
  ```

diff --git a/1.2.0/metastores.md b/1.2.0/metastores.md
new file mode 100644
index 0000000000..c22bbdd907
--- /dev/null
+++ b/1.2.0/metastores.md
@@ -0,0 +1,188 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Metastores
type: docs
weight: 700
---

This page explains how to configure and use Polaris metastores with either the recommended Relational JDBC or the
deprecated EclipseLink persistence backend.

## Relational JDBC

This implementation leverages Quarkus for datasource management and supports configuration through
environment variables or JVM `-D` flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file).

There are two options for configuring the persistence backend:

### 1. Relational JDBC metastore with username and password

Using environment variables:

```
POLARIS_PERSISTENCE_TYPE=relational-jdbc

QUARKUS_DATASOURCE_USERNAME=<your-username>
QUARKUS_DATASOURCE_PASSWORD=<your-password>
QUARKUS_DATASOURCE_JDBC_URL=<jdbc-url-of-database>
```

Using a properties file:

```
polaris.persistence.type=relational-jdbc
quarkus.datasource.username=<your-username>
quarkus.datasource.password=<your-password>
quarkus.datasource.jdbc.url=<jdbc-url-of-database>
```

### 2. AWS Aurora PostgreSQL metastore using IAM AWS authentication

```
polaris.persistence.type=relational-jdbc
quarkus.datasource.jdbc.url=jdbc:postgresql://polaris-cluster.cluster-xyz.us-east-1.rds.amazonaws.com:6160/polaris
quarkus.datasource.jdbc.additional-jdbc-properties.wrapperPlugins=iam
quarkus.datasource.username=dbusername
quarkus.datasource.db-kind=postgresql
quarkus.datasource.jdbc.additional-jdbc-properties.ssl=true
quarkus.datasource.jdbc.additional-jdbc-properties.sslmode=require
quarkus.datasource.credentials-provider=aws

quarkus.rds.credentials-provider.aws.use-quarkus-client=true
quarkus.rds.credentials-provider.aws.username=dbusername
quarkus.rds.credentials-provider.aws.hostname=polaris-cluster.cluster-xyz.us-east-1.rds.amazonaws.com
quarkus.rds.credentials-provider.aws.port=6160
```

This is the basic configuration. For more details, please refer to the [Quarkus plugin documentation](https://docs.quarkiverse.io/quarkus-amazon-services/dev/amazon-rds.html#_configuration_reference).

The Relational JDBC metastore currently relies on a Quarkus-managed datasource and supports only PostgreSQL and H2 databases.
This limitation is similar to that of EclipseLink, primarily due to underlying schema differences. At this time, official documentation is provided exclusively for usage with PostgreSQL.
Please refer to the documentation here:
[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)

Additionally, retries can be configured via the `polaris.persistence.relational.jdbc.*` properties; please refer to [configuration]({{% ref "configuration" %}}).

## EclipseLink (Deprecated)

{{< alert important >}}
EclipseLink persistence will be completely removed from Polaris in 1.3.0 or in 2.0.0 (whichever happens earlier).
{{< /alert >}}

Polaris includes the EclipseLink plugin by default, with the PostgreSQL driver.

Configure the `polaris.persistence` section in your Polaris configuration file
(`application.properties`) as follows:

```
polaris.persistence.type=eclipse-link
polaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml
polaris.persistence.eclipselink.persistence-unit=polaris
```

Alternatively, configuration can also be done with environment variables or system properties. Refer
to the [Quarkus Configuration Reference] for more information.

The `configuration-file` option must point to an [EclipseLink configuration file]. This file, named
`persistence.xml`, is used to set up the database connection properties, which can differ depending
on the type of database and its configuration.

{{< alert note >}}
The `persistence.xml` file must be located at least two directories below the root folder; for example, `/deployments/config/persistence.xml` is OK, whereas `/deployments/persistence.xml` will cause an infinite loop.
{{< /alert >}}

[Quarkus Configuration Reference]: https://quarkus.io/guides/config-reference
[EclipseLink configuration file]: https://eclipse.dev/eclipselink/documentation/4.0/solutions/solutions.html#TESTINGJPA002

Polaris creates and connects to a separate database for each realm. Specifically, the `{realm}` placeholder in `jakarta.persistence.jdbc.url` is substituted with the actual realm name, allowing the Polaris server to connect to different databases based on the realm.

{{< alert note >}}
Some database systems such as Postgres don't create databases automatically. Database admins need to create them manually before running the Polaris server.
{{< /alert >}}

A single `persistence.xml` can describe multiple [persistence units](https://eclipse.dev/eclipselink/documentation/4.0/concepts/concepts.html#APPDEV001). For example, with both a `polaris-dev` and `polaris` persistence unit defined, you could use a single `persistence.xml` to easily switch between development and production databases. Use the `persistence-unit` option in the Polaris server configuration to switch between persistence units.

### Using H2

{{< alert important >}}
H2 is an in-memory database and is not suitable for production!
{{< /alert >}}

The default [persistence.xml] in Polaris is already configured for H2, but you can easily customize
your H2 configuration using the persistence unit template below:

[persistence.xml]: https://github.com/apache/polaris/blob/main/persistence/eclipselink/src/main/resources/META-INF/persistence.xml

```xml
<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
  <shared-cache-mode>NONE</shared-cache-mode>
  <properties>
    <property name="jakarta.persistence.jdbc.url"
      value="jdbc:h2:mem:polaris-{realm};DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE"/>
    <property name="jakarta.persistence.jdbc.user" value="sa"/>
    <property name="jakarta.persistence.jdbc.password" value=""/>
    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
  </properties>
</persistence-unit>
```

To build Polaris with the necessary H2 dependency and start the Polaris service, run the following:

```shell
./gradlew \
  :polaris-server:assemble \
  :polaris-server:quarkusAppPartsBuild --rerun \
  -PeclipseLinkDeps=com.h2database:h2:2.3.232
java -Dpolaris.persistence.type=eclipse-link \
  -Dpolaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml \
  -Dpolaris.persistence.eclipselink.persistence-unit=polaris \
  -jar runtime/server/build/quarkus-app/quarkus-run.jar
```

### Using Postgres

PostgreSQL is included by default in the Polaris server distribution.

The following shows a sample configuration for integrating Polaris with Postgres.

```xml
<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
  <shared-cache-mode>NONE</shared-cache-mode>
  <properties>
    <property name="jakarta.persistence.jdbc.url"
      value="jdbc:postgresql://localhost:5432/{realm}"/>
    <property name="jakarta.persistence.jdbc.user" value="postgres"/>
    <property name="jakarta.persistence.jdbc.password" value="postgres"/>
    <property name="eclipselink.logging.level.sql" value="FINE"/>
    <property name="eclipselink.logging.parameters" value="true"/>
    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
  </properties>
</persistence-unit>
```

diff --git a/1.2.0/polaris-api-specs/_index.md b/1.2.0/polaris-api-specs/_index.md
new file mode 100644
index 0000000000..3f4a98498d
--- /dev/null
+++ b/1.2.0/polaris-api-specs/_index.md
@@ -0,0 +1,27 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
+# +title: 'Polaris API Reference' +type: docs +weight: 1100 +--- + +The Apache Polaris API offers a comprehensive set of endpoints that enable you to manage principals, principal-roles, catalogs, and catalog-roles programmatically. + +It follows REST standards, using clear, resource-based URLs, standard HTTP methods, response codes, and secure authentication. With the Polaris API, you can create, manage, and query Iceberg catalogs efficiently. \ No newline at end of file diff --git a/1.2.0/polaris-api-specs/polaris-catalog-api.md b/1.2.0/polaris-api-specs/polaris-catalog-api.md new file mode 100644 index 0000000000..4774c16cae --- /dev/null +++ b/1.2.0/polaris-api-specs/polaris-catalog-api.md @@ -0,0 +1,27 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: 'Apache Polaris Catalog Service OpenAPI Specification' +linkTitle: 'Catalog API ↗' +weight: 200 +params: + show_page_toc: false +--- + +{{< redoc-polaris "generated/bundled-polaris-catalog-service.yaml" >}} diff --git a/1.2.0/polaris-api-specs/polaris-management-api.md b/1.2.0/polaris-api-specs/polaris-management-api.md new file mode 100644 index 0000000000..eea43448be --- /dev/null +++ b/1.2.0/polaris-api-specs/polaris-management-api.md @@ -0,0 +1,27 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: 'Apache Polaris Management Service OpenAPI Specification' +linkTitle: 'Management API ↗' +weight: 100 +params: + show_page_toc: false +--- + +{{< redoc-polaris "polaris-management-service.yml" >}} diff --git a/1.2.0/polaris-spark-client.md b/1.2.0/polaris-spark-client.md new file mode 100644 index 0000000000..c990e565a5 --- /dev/null +++ b/1.2.0/polaris-spark-client.md @@ -0,0 +1,129 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Polaris Spark Client
type: docs
weight: 650
---

Apache Polaris now provides catalog support for Generic Tables (non-Iceberg tables); check out
the [Polaris Catalog OpenAPI Spec]({{% ref "polaris-api-specs/polaris-catalog-api.md" %}}) for the Generic Table API specs.

Along with the Generic Table catalog support, Polaris is also releasing a Spark client, which helps
provide an end-to-end solution for Apache Spark to manage Delta tables using Polaris.

Note that the Polaris Spark client is able to handle both Iceberg and Delta tables, not just Delta.

This page documents how to connect Spark with a deployed Polaris service using the Polaris Spark client.

## Quick Start with Local Polaris service

If you want to quickly try out the functionality with a local Polaris service, simply check out the Polaris repo
and follow the instructions in the Spark plugin getting-started
[README](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/getting-started/README.md).

Check out the Polaris repo:
```shell
git clone https://github.com/apache/polaris.git ~/polaris
```

## Start Spark against a deployed Polaris service

Before starting, ensure that the deployed Polaris service supports Generic Tables, and that Spark 3.5 (version 3.5.3 or later) is installed.
Spark 3.5.6 is recommended, and you can follow the instructions below to get a Spark 3.5.6 distribution.

```shell
cd ~
wget https://www.apache.org/dyn/closer.lua/spark/spark-3.5.6/spark-3.5.6-bin-hadoop3.tgz?action=download
mkdir spark-3.5
tar xzvf spark-3.5.6-bin-hadoop3.tgz -C spark-3.5 --strip-components=1
cd spark-3.5
```

### Connecting with Spark using the Polaris Spark client

The following CLI command can be used to start Spark with a connection to the deployed Polaris service using
a released Polaris Spark client.

```shell
bin/spark-shell \
--packages <polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.10.0,io.delta:delta-spark_2.12:3.3.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.<spark-catalog-name>.warehouse=<polaris-catalog-name> \
--conf spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.<spark-catalog-name>=org.apache.polaris.spark.SparkCatalog \
--conf spark.sql.catalog.<spark-catalog-name>.uri=<polaris-service-uri> \
--conf spark.sql.catalog.<spark-catalog-name>.credential='<client-id>:<client-secret>' \
--conf spark.sql.catalog.<spark-catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled=true
```

Assuming the released Polaris Spark client you want to use is `org.apache.polaris:polaris-spark-3.5_2.12:1.0.0`,
replace the `<polaris-spark-client-package>` field with that coordinate.

The `<spark-catalog-name>` is the catalog name you will use with Spark, and `<polaris-catalog-name>` is the catalog name used
by the Polaris service; for simplicity, you can use the same name.
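For illustration, here is a sketch of the command with every placeholder filled in, assuming a locally deployed service at `http://localhost:8181/api/catalog`, a catalog named `polaris` on both the Spark and Polaris sides, the 1.0.0 client release, and the `root:s3cr3t` example credentials; all of these values are hypothetical and should be replaced with your own:

```shell
bin/spark-shell \
--packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0,io.delta:delta-spark_2.12:3.3.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.polaris.warehouse=polaris \
--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.polaris.credential='root:s3cr3t' \
--conf spark.sql.catalog.polaris.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.polaris.token-refresh-enabled=true
```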
Replace `<polaris-service-uri>` with the URI of the deployed Polaris service. For example, with a locally deployed
Polaris service, the URI would be `http://localhost:8181/api/catalog`.

For the `<client-id>` and `<client-secret>` values, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}})
for more details.

You can also establish the connection by programmatically initializing a SparkSession; the following is an example with PySpark:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.jars.packages", "<polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.10.0,io.delta:delta-spark_2.12:3.3.1") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.<spark-catalog-name>", "org.apache.polaris.spark.SparkCatalog") \
    .config("spark.sql.catalog.<spark-catalog-name>.uri", <polaris-service-uri>) \
    .config("spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled", "true") \
    .config("spark.sql.catalog.<spark-catalog-name>.credential", "<client-id>:<client-secret>") \
    .config("spark.sql.catalog.<spark-catalog-name>.warehouse", <polaris-catalog-name>) \
    .config("spark.sql.catalog.polaris.scope", 'PRINCIPAL_ROLE:ALL') \
    .config("spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation", 'vended-credentials') \
    .getOrCreate()
```

As with the CLI command, make sure the corresponding fields are replaced correctly.

### Create tables with Spark

After Spark is started, you can use it to create and access Iceberg and Delta tables, for example:

```python
spark.sql("USE polaris")
spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS")
spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC")
spark.sql("USE NAMESPACE DELTA_NS.PUBLIC")
spark.sql("""CREATE TABLE IF NOT EXISTS PEOPLE (
    id int, name string)
USING delta LOCATION 'file:///tmp/var/delta_tables/people';
""")
```

## Connecting with Spark using a local Polaris Spark client jar

If you would like to use a version of the Spark client that is not yet released, you can
build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin
[README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions.

## Limitations

The Polaris Spark client has the following functionality limitations:
1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe`
   is also not supported, since it relies on the CTAS support.
2) Creating a Delta table without an explicit location is not supported.
3) Renaming a Delta table is not supported.
4) `ALTER TABLE ... SET LOCATION` is not supported for Delta tables.
5) Other non-Iceberg table formats, such as CSV, are not supported.

diff --git a/1.2.0/policy.md b/1.2.0/policy.md
new file mode 100644
index 0000000000..5ad26edd4c
--- /dev/null
+++ b/1.2.0/policy.md
@@ -0,0 +1,199 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Policy +type: docs +weight: 425 +--- + +The Polaris Policy framework empowers organizations to centrally define, manage, and enforce fine-grained governance, lifecycle, and operational rules across all data resources in the catalog. + +With the policy API, you can: +- Create and manage policies +- Attach policies to specific resources (catalogs, namespaces, tables, or views) +- Check applicable policies for any given resource + +## What is a Policy? + +A policy in Apache Polaris is a structured entity that defines rules governing actions on specified resources under +predefined conditions. Each policy contains: + +- **Name**: A unique identifier within a namespace +- **Type**: Determines the semantics and expected format of the policy content +- **Description**: Explains the purpose of the policy +- **Content**: Contains the actual rules defining the policy behavior +- **Version**: An automatically tracked revision number +- **Inheritable**: Whether the policy can be inherited by child resources, decided by its type + +### Policy Types + +Polaris supports several predefined system policy types (prefixed with `system.`): + +| Policy Type | Purpose | JSON-Schema | Applies To | +|-------------|-------------------------------------------------------|-------------|------------| +| **`system.data-compaction`** | Defines rules for data file compaction operations | [`data-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/data-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | +| **`system.metadata-compaction`** | Defines rules for metadata file compaction operations | [`metadata-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/metadata-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | +| **`system.orphan-file-removal`** | Defines rules for removing orphaned files | [`orphan-file-removal/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/orphan-file-removal/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | +| **`system.snapshot-expiry`** | Defines rules for snapshot expiration | [`snapshot-expiry/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/snapshot-expiry/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | + +Support for additional predefined system policy types and custom policy type definitions is in progress. +For more details, please refer to the [roadmap](https://github.com/apache/polaris/discussions/1028). + +### Policy Inheritance + +The entity hierarchy in Polaris is structured as follows: + +``` + Catalog + | + Namespace + | + +-----------+----------+ + | | | +Iceberg Iceberg Generic + Table View Table +``` + +Policies can be attached at any level, and inheritance flows from catalog down to namespace, then to tables and views. 
+ +Policies can be inheritable or non-inheritable: + +- **Inheritable policies**: Apply to the target resource and all its applicable child resources +- **Non-inheritable policies**: Apply only to the specific target resource + +The inheritance follows an override mechanism: +1. Table-level policies override namespace and catalog policies +2. Namespace-level policies override parent namespace and catalog policies + +{{< alert important >}} +Because an override completely replaces the same policy type at higher levels, +**only one instance of a given policy type can be attached to (and therefore affect) a resource**. +{{< /alert >}} + +## Working with Policies + +### Creating a Policy + +To create a policy, you need to provide a name, type, and optionally a description and content: + +```json +POST /polaris/v1/{prefix}/namespaces/{namespace}/policies +{ + "name": "compaction-policy", + "type": "system.data-compaction", + "description": "Policy for optimizing table storage", + "content": "{\"version\": \"2025-02-03\", \"enable\": true, \"config\": {\"target_file_size_bytes\": 134217728}}" +} +``` + +The policy content is validated against a schema specific to its type. Here are a few policy content examples: +- Data Compaction Policy +```json +{ + "version": "2025-02-03", + "enable": true, + "config": { + "target_file_size_bytes": 134217728, + "compaction_strategy": "bin-pack", + "max-concurrent-file-group-rewrites": 5 + } +} +``` +- Orphan File Removal Policy +```json +{ + "version": "2025-02-03", + "enable": true, + "max_orphan_file_age_in_days": 30, + "locations": ["s3://my-bucket/my-table-location"], + "config": { + "prefix_mismatch_mode": "ignore" + } +} +``` + +### Attaching Policies to Resources + +Policies can be attached to different resource levels: + +1. **Catalog level**: Applies to the entire catalog +2. **Namespace level**: Applies to a specific namespace +3. **Table-like level**: Applies to individual tables or views + +Example of attaching a policy to a table: + +```json +PUT /polaris/v1/{prefix}/namespaces/{namespace}/policies/{policy-name}/mappings +{ + "target": { + "type": "table-like", + "path": ["NS1", "NS2", "test_table_1"] + } +} +``` + +For inheritable policies, only one policy of a given type can be attached to a resource. For non-inheritable policies, +multiple policies of the same type can be attached. + +### Retrieving Applicable Policies +A user can view applicable policies on a resource (e.g., table, namespace, or catalog) as long as they have +read permission on that resource. + +Here is an example to find all policies that apply to a specific resource (including inherited policies): +``` +GET /polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions +``` + +**Sample response:** +```json +{ + "policies": [ + { + "name": "snapshot-expiry-policy", + "type": "system.snapshot-expiry", + "appliedAt": "namespace", + "content": { + "version": "2025-02-03", + "enable": true, + "config": { + "min_snapshot_to_keep": 1, + "max_snapshot_age_days": 2, + "max_ref_age_days": 3 + } + } + }, + { + "name": "compaction-policy", + "type": "system.data-compaction", + "appliedAt": "catalog", + "content": { + "version": "2025-02-03", + "enable": true, + "config": { + "target_file_size_bytes": 134217728 + } + } + } + ] +} +``` + +### API Reference + +For the complete and up-to-date API specification, see the [policy-api.yaml](https://github.com/apache/polaris/blob/main/spec/polaris-catalog-apis/policy-apis.yaml). 
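As a quick end-to-end illustration, the sketch below creates the compaction policy from the earlier example and attaches it to a table using curl. It assumes the policy endpoints are served under the catalog API base path (`/api/catalog`), a catalog prefix named `quickstart_catalog`, a namespace `NS1`, and a pre-obtained bearer token in `$TOKEN`; all of these values are hypothetical:

```shell
# Create the policy under namespace NS1 (hypothetical host, prefix, and token)
curl -X POST "http://localhost:8181/api/catalog/polaris/v1/quickstart_catalog/namespaces/NS1/policies" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "compaction-policy",
        "type": "system.data-compaction",
        "description": "Policy for optimizing table storage",
        "content": "{\"version\": \"2025-02-03\", \"enable\": true, \"config\": {\"target_file_size_bytes\": 134217728}}"
      }'

# Attach the policy to a table in that namespace
curl -X PUT "http://localhost:8181/api/catalog/polaris/v1/quickstart_catalog/namespaces/NS1/policies/compaction-policy/mappings" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"target": {"type": "table-like", "path": ["NS1", "test_table_1"]}}'
```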
diff --git a/1.2.0/realm.md b/1.2.0/realm.md
new file mode 100644
index 0000000000..4e0cc1ce25
--- /dev/null
+++ b/1.2.0/realm.md
@@ -0,0 +1,53 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Realm
type: docs
weight: 350
---

This page explains what a realm is and what it is used for in Polaris.

### What is it?

A realm in Polaris serves as a logical partitioning mechanism within the catalog system. This isolation enables multitenancy, allowing different teams, environments, or organizations to operate independently within the same Polaris deployment.

### Key Characteristics

**Isolation:** Each realm encapsulates its own set of resources, ensuring that operations and policies in one realm do not affect others.

**Authentication Context:** When configuring Polaris, a principal's credentials are associated with a specific realm. This allows for the separation of security concerns across different realms.

**Configuration Scope:** Realm identifiers are used in various configurations, such as connection strings and feature configurations.

An example of this is:

`jdbc:postgresql://localhost:5432/{realm}`

This ensures that each realm's data is stored separately.

### How is it used in the system?

**RealmContext:** This is a key concept used to identify and resolve the context in which operations are performed. For example, in `DefaultRealmContextResolver`, a realm is resolved from request headers, and operations are performed based on the resolved realm identifier.

**Authentication and Authorization:** For example, in `DefaultAuthenticator`, `RealmContext` is used to provide context about the current security domain, which is used to retrieve the correct `PolarisMetastoreManager` that manages all Polaris entities and associated grant records metadata for authorization.

**Isolation:** In methods like `createEntityManagerFactory(@Nonnull RealmContext realmContext)` from the `PolarisEclipseLinkPersistenceUnit` interface, the realm context influences how resources are created or managed based on the security policies of that realm. An example of this is the way a realm name can be used to create a database connection URL so that you have one database instance per realm, when applicable; alternatively, isolation can be more granular, applied at the primary-key level within the same database instance.

diff --git a/1.2.0/telemetry.md b/1.2.0/telemetry.md
new file mode 100644
index 0000000000..13b2823789
--- /dev/null
+++ b/1.2.0/telemetry.md
@@ -0,0 +1,196 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. 
The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Telemetry
type: docs
weight: 450
---

## Metrics

Metrics are published using [Micrometer]; they are available from Polaris's management interface
(port 8282 by default) under the path `/q/metrics`. For example, if the server is running on
localhost, the metrics can be accessed via http://localhost:8282/q/metrics.

[Micrometer]: https://quarkus.io/guides/telemetry-micrometer

Metrics can be scraped by [Prometheus](https://prometheus.io) or any compatible metrics scraping server.

Additional tags can be added to the metrics by setting the `polaris.metrics.tags.*` property. Each
tag is a key-value pair, where the key is the tag name and the value is the tag value. For example,
to add a tag `environment=prod` to all metrics, set `polaris.metrics.tags.environment=prod`. Multiple
tags can be added, as shown below:

```properties
polaris.metrics.tags.service=polaris
polaris.metrics.tags.environment=prod
polaris.metrics.tags.region=us-west-2
```

Note that by default Polaris adds one tag: `application=Polaris`. You can override this tag by
setting the `polaris.metrics.tags.application=<app-name>` property.

### Realm ID Tag

Polaris can add the realm ID as a tag to all API and HTTP request metrics. This is disabled by
default to prevent high-cardinality issues, but can be enabled by setting the following properties:

```properties
polaris.metrics.realm-id-tag.enable-in-api-metrics=true
polaris.metrics.realm-id-tag.enable-in-http-metrics=true
```

You should be particularly careful when enabling the realm ID tag in HTTP request metrics, as these
metrics typically have a much higher cardinality than API request metrics.

In order to prevent the number of tags from growing indefinitely and causing performance issues or
crashing the server, the number of unique realm IDs in HTTP request metrics is limited to 100 by
default. If the number of unique realm IDs exceeds this value, a warning is logged and no more
HTTP request metrics will be recorded. This threshold can be changed by setting the
`polaris.metrics.realm-id-tag.http-metrics-max-cardinality` property.

## Traces

Traces are published using [OpenTelemetry].

[OpenTelemetry]: https://quarkus.io/guides/opentelemetry-tracing

By default, OpenTelemetry is disabled in Polaris, because there is no reasonable default
for the collector endpoint in all cases.

To enable OpenTelemetry and publish traces for Polaris, set `quarkus.otel.sdk.disabled=false`
and configure a valid collector endpoint URL, starting with `http://` or `https://`, as the server property
`quarkus.otel.exporter.otlp.traces.endpoint`.

_If these properties are not set, the server will not publish traces._

The collector must speak the OpenTelemetry protocol (OTLP), and the port must be its gRPC port
(by default 4317), e.g. "http://otlp-collector:4317".
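Putting these settings together, a minimal tracing configuration sketch might look like the following; the collector hostname is an example value, not a default:

```properties
# Enable the OpenTelemetry SDK, which is disabled by default
quarkus.otel.sdk.disabled=false
# OTLP gRPC endpoint of your collector (example hostname, default gRPC port 4317)
quarkus.otel.exporter.otlp.traces.endpoint=http://otlp-collector:4317
```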
+ +By default, Polaris adds a few attributes to the [OpenTelemetry Resource] to identify the server, +and notably: + +- `service.name`: set to `Apache Polaris Server (incubating)`; +- `service.version`: set to the Polaris version. + +[OpenTelemetry Resource]: https://opentelemetry.io/docs/languages/js/resources/ + +You can override the default resource attributes or add additional ones by setting the +`quarkus.otel.resource.attributes` property. + +This property expects a comma-separated list of key-value pairs, where the key is the attribute name +and the value is the attribute value. For example, to change the service name to `Polaris` and add +an attribute `deployment.environment=dev`, set the following property: + +```properties +quarkus.otel.resource.attributes=service.name=Polaris,deployment.environment=dev +``` + +The alternative syntax below can also be used: + +```properties +quarkus.otel.resource.attributes[0]=service.name=Polaris +quarkus.otel.resource.attributes[1]=deployment.environment=dev +``` + +Finally, two additional span attributes are added to all request parent spans: + +- `polaris.request.id`: The unique identifier of the request, if set by the caller through the + `Polaris-Request-Id` header. +- `polaris.realm`: The unique identifier of the realm. Always set (unless the request failed because + of a realm resolution error). + +### Troubleshooting Traces + +If the server is unable to publish traces, check first for a log warning message like the following: + +``` +SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans. +The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317 +``` + +This means that the server is unable to connect to the collector. Check that the collector is +running and that the URL is correct. + +## Logging + +Polaris relies on [Quarkus](https://quarkus.io/guides/logging) for logging. + +By default, logs are written to the console and to a file located in the `./logs` directory. The log +file is rotated daily and compressed. The maximum size of the log file is 10MB, and the maximum +number of backup files is 14. + +JSON logging can be enabled by setting the `quarkus.log.console.json.enabled` and `quarkus.log.file.json.enabled` +properties to `true`. By default, JSON logging is disabled. + +The log level can be set for the entire application or for specific packages. The default log level +is `INFO`. To set the log level for the entire application, use the `quarkus.log.level` property. + +To set the log level for a specific package, use the `quarkus.log.category."package-name".level`, +where `package-name` is the name of the package. For example, the package `io.smallrye.config` has a +useful logger to help debugging configuration issues; but it needs to be set to the `DEBUG` level. +This can be done by setting the following property: + +```properties +quarkus.log.category."io.smallrye.config".level=DEBUG +``` + +The log message format for both console and file output is highly configurable. The default format +is: + +``` +%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n +``` + +Refer to the [Logging format](https://quarkus.io/guides/logging#logging-format) guide for more +information on placeholders and how to customize the log message format. + +### MDC Logging + +Polaris uses Mapped Diagnostic Context (MDC) to enrich log messages with additional context. 
The +following MDC keys are available: + +- `requestId`: The unique identifier of the request, if set by the caller through the + `Polaris-Request-Id` header. +- `realmId`: The unique identifier of the realm. Always set. +- `traceId`: The unique identifier of the trace. Present if tracing is enabled and the message is + originating from a traced context. +- `parentId`: The unique identifier of the parent span. Present if tracing is enabled and the + message is originating from a traced context. +- `spanId`: The unique identifier of the span. Present if tracing is enabled and the message is + originating from a traced context. +- `sampled`: Whether the trace has been sampled. Present if tracing is enabled and the message is + originating from a traced context. + +Other MDC keys can be added by setting the `polaris.log.mdc.*` property. Each property is a +key-value pair, where the key is the MDC key name and the value is the MDC key value. For example, +to add the MDC keys `environment=prod` and `region=us-west-2` to all log messages, set the following +properties: + +```properties +polaris.log.mdc.environment=prod +polaris.log.mdc.region=us-west-2 +``` + +MDC context is propagated across threads, including in `TaskExecutor` threads. + +## Links + +Visit [Using Polaris with telemetry tools]({{% relref "telemetry-tools" %}}) to see sample Polaris config with Prometheus and Jaeger.