diff --git a/0.9.0/_index.md b/0.9.0/_index.md deleted file mode 100644 index f3ce53c611..0000000000 --- a/0.9.0/_index.md +++ /dev/null @@ -1,38 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -linkTitle: '0.9.0' -title: '0.9.0' -type: docs -weight: 200 -params: - top_hidden: true - show_page_toc: false -cascade: - type: docs - params: - show_page_toc: true -# This file will NOT be copied into a new release's versioned docs folder. ---- - -Check out the [Quick Start]({{% ref "quickstart" %}}) page to get started. - - diff --git a/0.9.0/access-control.md b/0.9.0/access-control.md deleted file mode 100644 index 7c4c9cc8e8..0000000000 --- a/0.9.0/access-control.md +++ /dev/null @@ -1,193 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Access Control -type: docs -weight: 500 ---- - -This section provides information about how access control works for Apache Polaris (Incubating). - -Polaris uses a role-based access control (RBAC) model in which the Polaris administrator assigns access privileges to catalog roles -and then grants access to resources to service principals by assigning catalog roles to principal roles. - -These are the key concepts to understanding access control in Polaris: - -- **Securable object** -- **Principal role** -- **Catalog role** -- **Privilege** - -## Securable object - -A securable object is an object to which access can be granted. Polaris -has the following securable objects: - -- Catalog -- Namespace -- Iceberg table -- View - -## Principal role - -A principal role is a resource in Polaris that you can use to logically group Polaris service principals together and grant privileges on -securable objects. - -Polaris supports a many-to-one relationship between service principals and principal roles. For example, to grant the same privileges to -multiple service principals, you can grant a single principal role to those service principals. A service principal can be granted one -principal role. 
When registering a service connection, the Polaris administrator specifies the principal role that is granted to the
service principal.

You don't grant privileges directly to a principal role. Instead, you configure object permissions at the catalog role level, and then grant
catalog roles to a principal role.

The following table shows examples of principal roles that you might configure in Polaris:

| Principal role name | Description |
| -----------------------| ----------- |
| Data_engineer | A role that is granted to multiple service principals for running data engineering jobs. |
| Data_scientist | A role that is granted to multiple service principals for running data science or AI jobs. |

## Catalog role

A catalog role belongs to a particular catalog resource in Polaris and specifies a set of permissions for actions on the catalog or objects
in the catalog, such as catalog namespaces or tables. You can create one or more catalog roles for a catalog.

You grant privileges to a catalog role and then grant the catalog role to a principal role to bestow the privileges to one or more service
principals.

> **Note**
>
> If you update the privileges bestowed to a service principal, the updates won't take effect for up to one hour. This means that if you
> revoke or grant some privileges for a catalog, the updated privileges won't take effect on any service principal with access to that catalog
> for up to one hour.

Polaris also supports a many-to-many relationship between catalog roles and principal roles. You can grant the same catalog role to one or more
principal roles. Likewise, a principal role can be granted to one or more catalog roles.

The following table displays examples of catalog roles that you might
configure in Polaris:

| Example Catalog role | Description |
| -----------------------| ----------- |
| Catalog administrators | A role that has been granted multiple privileges to emulate full access to the catalog.<br><br>Principal roles that have been granted this role are permitted to create, alter, read, write, and drop tables in the catalog. |
| Catalog readers | A role that has been granted read-only privileges to tables in the catalog.<br><br>Principal roles that have been granted this role are allowed to read from tables in the catalog. |
| Catalog contributor | A role that has been granted read and write access privileges to all tables that belong to the catalog.<br><br>Principal roles that have been granted this role are allowed to perform read and write operations on tables in the catalog. |

## RBAC model

The following diagram illustrates the RBAC model used by Polaris. For each catalog, the Polaris administrator assigns access
privileges to catalog roles and then grants service principals access to resources by assigning catalog roles to principal roles. Polaris
supports a many-to-one relationship between service principals and principal roles.

![Diagram that shows the RBAC model for Apache Polaris.](/img/rbac-model.svg "Apache Polaris RBAC model")

## Access control privileges

This section describes the privileges that are available in the Polaris access control model. Privileges are granted to catalog roles, catalog
roles are granted to principal roles, and principal roles are granted to service principals to specify the operations that service principals can
perform on objects in Polaris.

> **Important**
>
> You can only grant privileges at the catalog level. Fine-grained access controls are not available. For example, you can grant read
> privileges to all tables in a catalog but not to an individual table in the catalog.

To grant the full set of privileges (drop, list, read, write, etc.) on an object, you can use the *full privilege* option.

### Table privileges

| Privilege | Description |
| --------- | ----------- |
| TABLE_CREATE | Enables registering a table with the catalog. |
| TABLE_DROP | Enables dropping a table from the catalog. |
| TABLE_LIST | Enables listing any tables in the catalog. |
| TABLE_READ_PROPERTIES | Enables reading [properties](https://iceberg.apache.org/docs/nightly/configuration/#table-properties) of the table. |
| TABLE_WRITE_PROPERTIES | Enables configuring [properties](https://iceberg.apache.org/docs/nightly/configuration/#table-properties) for the table. |
| TABLE_READ_DATA | Enables reading data from the table by receiving short-lived read-only storage credentials from the catalog. |
| TABLE_WRITE_DATA | Enables writing data to the table by receiving short-lived read+write storage credentials from the catalog. |
| TABLE_FULL_METADATA | Grants all table privileges, except TABLE_READ_DATA and TABLE_WRITE_DATA, which need to be granted individually. |

### View privileges

| Privilege | Description |
| --------- | ----------- |
| VIEW_CREATE | Enables registering a view with the catalog. |
| VIEW_DROP | Enables dropping a view from the catalog. |
| VIEW_LIST | Enables listing any views in the catalog. |
| VIEW_READ_PROPERTIES | Enables reading all the view properties. |
| VIEW_WRITE_PROPERTIES | Enables configuring view properties. |
| VIEW_FULL_METADATA | Grants all view privileges. |

### Namespace privileges

| Privilege | Description |
| --------- | ----------- |
| NAMESPACE_CREATE | Enables creating a namespace in a catalog. |
| NAMESPACE_DROP | Enables dropping the namespace from the catalog. |
| NAMESPACE_LIST | Enables listing any object in the namespace, including nested namespaces and tables. |
| NAMESPACE_READ_PROPERTIES | Enables reading all the namespace properties. |
| NAMESPACE_WRITE_PROPERTIES | Enables configuring namespace properties. |
| NAMESPACE_FULL_METADATA | Grants all namespace privileges. |
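For example, a privilege from the tables above can be granted with the Polaris CLI documented elsewhere in this release; the catalog, role, and namespace names below are illustrative only:

```
# Grant a namespace-level privilege to a catalog role
# (assumes a catalog "analytics" with a catalog role "analysts" already exist)
polaris privileges \
  namespace \
  grant \
  --catalog analytics \
  --catalog-role analysts \
  --namespace sales \
  NAMESPACE_LIST

# Attach the catalog role to a principal role so its grants reach principals
polaris catalog-roles grant \
  --catalog analytics \
  --principal-role data_scientist \
  analysts
```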
### Catalog privileges

| Privilege | Description |
| -----------------------| ----------- |
| CATALOG_MANAGE_ACCESS | Includes the ability to grant or revoke privileges on objects in a catalog to catalog roles, and the ability to grant or revoke catalog roles to or from principal roles. |
| CATALOG_MANAGE_CONTENT | Enables full management of content for the catalog. This privilege encompasses the following privileges: CATALOG_MANAGE_METADATA, TABLE_FULL_METADATA, NAMESPACE_FULL_METADATA, VIEW_FULL_METADATA, TABLE_WRITE_DATA, TABLE_READ_DATA, CATALOG_READ_PROPERTIES, and CATALOG_WRITE_PROPERTIES. |
| CATALOG_MANAGE_METADATA | Enables full management of the catalog, catalog roles, namespaces, and tables. |
| CATALOG_READ_PROPERTIES | Enables listing catalogs and reading properties of the catalog. |
| CATALOG_WRITE_PROPERTIES | Enables configuring catalog properties. |

## RBAC example

The following diagram illustrates how RBAC works in Polaris and
includes the following users:

- **Alice:** A service admin who signs up for Polaris. Alice can
  create service principals. She can also create catalogs and
  namespaces and configure access control for Polaris resources.

- **Bob:** A data engineer who uses Apache Spark™ to
  interact with Polaris.

  - Alice has created a service principal for Bob. It has been
    granted the Data_engineer principal role, which in turn has been
    granted the following catalog roles: Catalog contributor and
    Data administrator (for both the Silver and Gold zone catalogs
    in the following diagram).

  - The Catalog contributor role grants permission to create
    namespaces and tables in the Bronze zone catalog.

  - The Data administrator roles grant full administrative rights to
    the Silver zone catalog and Gold zone catalog.

- **Mark:** A data scientist who trains models with data managed
  by Polaris.

  - Alice has created a service principal for Mark. It has been
    granted the Data_scientist principal role, which in turn has
    been granted the catalog role named Catalog reader.

  - The Catalog reader role grants read-only access for a catalog
    named Gold zone catalog.

![Diagram that shows an example of how RBAC works in Apache Polaris.](/img/rbac-example.svg "Apache Polaris RBAC example")
diff --git a/0.9.0/command-line-interface.md b/0.9.0/command-line-interface.md
deleted file mode 100644
index 4a26ed4b39..0000000000
--- a/0.9.0/command-line-interface.md
+++ /dev/null
@@ -1,1088 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
linkTitle: Command Line Interface
title: Apache Polaris (Incubating) CLI
type: docs
weight: 300
---

In order to help administrators quickly set up and manage their Polaris server, Polaris provides a simple command-line interface (CLI) for common tasks.

The basic syntax of the Polaris CLI is outlined below:

```
polaris [options] COMMAND ...

options:
--host
--port
--client-id
--client-secret
```

`COMMAND` must be one of the following:
1. catalogs
2. principals
3. principal-roles
4. catalog-roles
5. namespaces
6. privileges

Each _command_ supports several _subcommands_, and some _subcommands_ have _actions_ that come after the subcommand in turn. Finally, _arguments_ follow to form a full invocation. Within a set of named arguments at the end of an invocation, ordering is generally not important. Many invocations also have a required positional argument of the type that the _command_ refers to. Again, the ordering of this positional argument relative to named arguments is not important.

Some example full invocations:

```
polaris principals list
polaris catalogs delete some_catalog_name
polaris catalogs update --property foo=bar some_other_catalog
polaris catalogs update another_catalog --property k=v
polaris privileges namespace grant --namespace some.schema --catalog fourth_catalog --catalog-role some_catalog_role TABLE_READ_DATA
```

### Authentication

As outlined above, the Polaris CLI may take credentials using the `--client-id` and `--client-secret` options. For example:

```
polaris --client-id 4b5ed1ca908c3cc2 --client-secret 07ea8e4edefb9a9e57c247e8d1a4f51c principals ...
```

If `--client-id` and `--client-secret` are not provided, the Polaris CLI will try to read the client ID and client secret from environment variables called `CLIENT_ID` and `CLIENT_SECRET` respectively. If these flags are not provided and the environment variables are not set, the CLI will fail.
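For example, a minimal session that authenticates through the environment rather than per-invocation flags (the values below are placeholders):

```
# Export credentials once for the shell session
export CLIENT_ID=4b5ed1ca908c3cc2
export CLIENT_SECRET=07ea8e4edefb9a9e57c247e8d1a4f51c

# Subsequent invocations pick up the credentials automatically
polaris principals list
```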
If the `--host` and `--port` options are not provided, the CLI will default to communicating with `localhost:8181`.

### PATH

These examples assume the Polaris CLI is on the PATH and so can be invoked just by the command `polaris`. You can add the CLI to your PATH environment variable with a command like the following:

```
export PATH="$HOME/polaris:$PATH"
```

Alternatively, you can run the CLI by providing a path to it, such as with the following invocation:

```
~/polaris principals list
```

## Commands

Each of the commands `catalogs`, `principals`, `principal-roles`, `catalog-roles`, `namespaces`, and `privileges` is used to manage a different type of entity within Polaris.

To find details on the options that can be provided to a particular command or subcommand, use the `--help` flag. For example:

```
polaris catalogs --help
polaris principals create --help
```

### catalogs

The `catalogs` command is used to create, discover, and otherwise manage catalogs within Polaris.

`catalogs` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. update

#### create

The `create` subcommand is used to create a catalog.

```
input: polaris catalogs create --help
options:
  create
    Named arguments:
      --type  The type of catalog to create in [INTERNAL, EXTERNAL]. INTERNAL by default.
      --storage-type  (Required) The type of storage to use for the catalog
      --default-base-location  (Required) Default base location of the catalog
      --allowed-location  An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once.
- --role-arn (Required for S3) A role ARN to use when connecting to S3 - --external-id (Only for S3) The external ID to use when connecting to S3 - --tenant-id (Required for Azure) A tenant ID to use when connecting to Azure Storage - --multi-tenant-app-name (Only for Azure) The app name to use when connecting to Azure Storage - --consent-url (Only for Azure) A consent URL granting permissions for the Azure Storage location - --service-account (Only for GCS) The service account to use when connecting to GCS - --remote-url (For external catalogs) The remote URL to use - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs create \ - --storage-type s3 \ - --default-base-location s3://example-bucket/my_data \ - --role-arn ${ROLE_ARN} \ - my_catalog - -polaris catalogs create \ - --storage-type s3 \ - --default-base-location s3://example-bucket/my_other_data \ - --allowed-location s3://example-bucket/second_location \ - --allowed-location s3://other-bucket/third_location \ - --role-arn ${ROLE_ARN} \ - my_other_catalog -``` - -#### delete - -The `delete` subcommand is used to delete a catalog. - -``` -input: polaris catalogs delete --help -options: - delete - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs delete some_catalog -``` - -#### get - -The `get` subcommand is used to retrieve details about a catalog. - -``` -input: polaris catalogs get --help -options: - get - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs get some_catalog - -polaris catalogs get another_catalog -``` - -#### list - -The `list` subcommand is used to show details about all catalogs, or those that a certain principal role has access to. The principal used to perform this operation must have the `CATALOG_LIST` privilege. - -``` -input: polaris catalogs list --help -options: - list - Named arguments: - --principal-role The name of a principal role -``` - -##### Examples - -``` -polaris catalogs list - -polaris catalogs list --principal-role some_user -``` - -#### update - -The `update` subcommand is used to update a catalog. Currently, this command supports changing the properties of a catalog or updating its storage configuration. - -``` -input: polaris catalogs update --help -options: - update - Named arguments: - --default-base-location (Required) Default base location of the catalog - --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once. - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs update --property tag=new_value my_catalog - -polaris catalogs update --default-base-location s3://new-bucket/my_data my_catalog -``` - -### Principals - -The `principals` command is used to manage principals within Polaris. - -`principals` supports the following subcommands: - -1. create -2. delete -3. get -4. list -5. rotate-credentials -6. update - -#### create - -The `create` subcommand is used to create a new principal. - -``` -input: polaris principals create --help -options: - create - Named arguments: - --type The type of principal to create in [SERVICE] - --property A key/value pair such as: tag=value. 
Multiple can be provided by specifying this option more than once - Positional arguments: - principal -``` - -##### Examples - -``` -polaris principals create some_user - -polaris principals create --client-id ${CLIENT_ID} --property admin=true some_admin_user -``` - -#### delete - -The `delete` subcommand is used to delete a principal. - -``` -input: polaris principals delete --help -options: - delete - Positional arguments: - principal -``` - -##### Examples - -``` -polaris principals delete some_user - -polaris principals delete some_admin_user -``` - -#### get - -The `get` subcommand retrieves details about a principal. - -``` -input: polaris principals get --help -options: - get - Positional arguments: - principal -``` - -##### Examples - -``` -polaris principals get some_user - -polaris principals get some_admin_user -``` - -#### list - -The `list` subcommand shows details about all principals. - -##### Examples - -``` -polaris principals list -``` - -#### rotate-credentials - -The `rotate-credentials` subcommand is used to update the credentials used by a principal. After this command runs successfully, the new credentials will be printed to stdout. - -``` -input: polaris principals rotate-credentials --help -options: - rotate-credentials - Positional arguments: - principal -``` - -##### Examples - -``` -polaris principals rotate-credentials some_user - -polaris principals rotate-credentials some_admin_user -``` - -#### update - -The `update` subcommand is used to update a principal. Currently, this supports rewriting the properties associated with a principal. - -``` -input: polaris principals update --help -options: - update - Named arguments: - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - principal -``` - -##### Examples - -``` -polaris principals update --property key=value --property other_key=other_value some_user - -polaris principals update --property are_other_keys_removed=yes some_user -``` - -### Principal Roles - -The `principal-roles` command is used to create, discover, and manage principal roles within Polaris. Additionally, this command can identify principals or catalog roles associated with a principal role, and can be used to grant a principal role to a principal. - -`principal-roles` supports the following subcommands: - -1. create -2. delete -3. get -4. list -5. update -6. grant -7. revoke - -#### create - -The `create` subcommand is used to create a new principal role. - -``` -input: polaris principal-roles create --help -options: - create - Named arguments: - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - principal_role -``` - -##### Examples - -``` -polaris principal-roles create data_engineer - -polaris principal-roles create --property key=value data_analyst -``` - -#### delete - -The `delete` subcommand is used to delete a principal role. - -``` -input: polaris principal-roles delete --help -options: - delete - Positional arguments: - principal_role -``` - -##### Examples - -``` -polaris principal-roles delete data_engineer - -polaris principal-roles delete data_analyst -``` - -#### get - -The `get` subcommand retrieves details about a principal role. 
```
input: polaris principal-roles get --help
options:
  get
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles get data_engineer

polaris principal-roles get data_analyst
```

#### list

The `list` subcommand is used to print out all principal roles or, alternatively, to list all principal roles associated with a given principal or with a given catalog role.

```
input: polaris principal-roles list --help
options:
  list
    Named arguments:
      --catalog-role  The name of a catalog role. If provided, show only principal roles assigned to this catalog role.
      --principal  The name of a principal. If provided, show only principal roles assigned to this principal.
```

##### Examples

```
polaris principal-roles list

polaris principal-roles list --principal d.knuth

polaris principal-roles list --catalog-role super_secret_data
```

#### update

The `update` subcommand is used to update a principal role. Currently, this supports updating the properties tied to a principal role.

```
input: polaris principal-roles update --help
options:
  update
    Named arguments:
      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles update --property key=value2 data_engineer

polaris principal-roles update data_analyst --property key=value3
```

#### grant

The `grant` subcommand is used to grant a principal role to a principal.

```
input: polaris principal-roles grant --help
options:
  grant
    Named arguments:
      --principal  A principal to grant this principal role to
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles grant --principal d.knuth data_engineer

polaris principal-roles grant data_scientist --principal a.ng
```

#### revoke

The `revoke` subcommand is used to revoke a principal role from a principal.

```
input: polaris principal-roles revoke --help
options:
  revoke
    Named arguments:
      --principal  A principal to revoke this principal role from
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles revoke --principal former.employee data_engineer

polaris principal-roles revoke data_scientist --principal changed.role
```

### Catalog Roles

The `catalog-roles` command is used to create, discover, and manage catalog roles within Polaris. Additionally, this command can be used to grant a catalog role to a principal role.

`catalog-roles` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. update
6. grant
7. revoke

#### create

The `create` subcommand is used to create a new catalog role.

```
input: polaris catalog-roles create --help
options:
  create
    Named arguments:
      --catalog  The name of an existing catalog
      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles create --property key=value --catalog some_catalog sales_data

polaris catalog-roles create --catalog other_catalog sales_data
```

#### delete

The `delete` subcommand is used to delete a catalog role.
```
input: polaris catalog-roles delete --help
options:
  delete
    Named arguments:
      --catalog  The name of an existing catalog
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles delete --catalog some_catalog sales_data

polaris catalog-roles delete --catalog other_catalog sales_data
```

#### get

The `get` subcommand retrieves details about a catalog role.

```
input: polaris catalog-roles get --help
options:
  get
    Named arguments:
      --catalog  The name of an existing catalog
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles get --catalog some_catalog inventory_data

polaris catalog-roles get --catalog other_catalog inventory_data
```

#### list

The `list` subcommand is used to print all catalog roles. Alternatively, if a principal role is provided, only catalog roles associated with that principal role are shown.

```
input: polaris catalog-roles list --help
options:
  list
    Named arguments:
      --principal-role  The name of a principal role
    Positional arguments:
      catalog
```

##### Examples

```
polaris catalog-roles list

polaris catalog-roles list --principal-role data_engineer
```

#### update

The `update` subcommand is used to update a catalog role. Currently, only updating properties associated with the catalog role is supported.

```
input: polaris catalog-roles update --help
options:
  update
    Named arguments:
      --catalog  The name of an existing catalog
      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles update --property contains_pii=true --catalog some_catalog sales_data

polaris catalog-roles update sales_data --catalog some_catalog --property key=value
```

#### grant

The `grant` subcommand is used to grant a catalog role to a principal role.

```
input: polaris catalog-roles grant --help
options:
  grant
    Named arguments:
      --catalog  The name of an existing catalog
      --principal-role  The name of a principal role
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles grant sensitive_data --catalog some_catalog --principal-role power_user

polaris catalog-roles grant --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
```

#### revoke

The `revoke` subcommand is used to revoke a catalog role from a principal role.

```
input: polaris catalog-roles revoke --help
options:
  revoke
    Named arguments:
      --catalog  The name of an existing catalog
      --principal-role  The name of a principal role
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles revoke sensitive_data --catalog some_catalog --principal-role power_user

polaris catalog-roles revoke --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
```

### Namespaces

The `namespaces` command is used to manage namespaces within Polaris.

`namespaces` supports the following subcommands:

1. create
2. delete
3. get
4. list

#### create

The `create` subcommand is used to create a new namespace.

When creating a namespace with an explicit location, that location must reside within the parent catalog or namespace.
```
input: polaris namespaces create --help
options:
  create
    Named arguments:
      --catalog  The name of an existing catalog
      --location  If specified, the location at which to store the namespace and entities inside it
      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
    Positional arguments:
      namespace
```

##### Examples

```
polaris namespaces create --catalog my_catalog outer

polaris namespaces create --catalog my_catalog --location 's3://bucket/outer/inner_SUFFIX' outer.inner
```

#### delete

The `delete` subcommand is used to delete a namespace.

```
input: polaris namespaces delete --help
options:
  delete
    Named arguments:
      --catalog  The name of an existing catalog
    Positional arguments:
      namespace
```

##### Examples

```
polaris namespaces delete outer_namespace.inner_namespace --catalog my_catalog

polaris namespaces delete --catalog my_catalog outer_namespace
```

#### get

The `get` subcommand retrieves details about a namespace.

```
input: polaris namespaces get --help
options:
  get
    Named arguments:
      --catalog  The name of an existing catalog
    Positional arguments:
      namespace
```

##### Examples

```
polaris namespaces get --catalog some_catalog a.b

polaris namespaces get a.b.c --catalog some_catalog
```

#### list

The `list` subcommand shows details about all namespaces directly within a catalog or, optionally, within some parent prefix in that catalog.

```
input: polaris namespaces list --help
options:
  list
    Named arguments:
      --catalog  The name of an existing catalog
      --parent  If specified, list namespaces inside this parent namespace
```

##### Examples

```
polaris namespaces list --catalog my_catalog

polaris namespaces list --catalog my_catalog --parent a

polaris namespaces list --catalog my_catalog --parent a.b
```

### Privileges

The `privileges` command is used to grant various privileges to a catalog role, or to revoke those privileges. Privileges can be on the level of a catalog, a namespace, a table, or a view. For more information on privileges, please refer to the [docs]({{% ref "entities#privilege" %}}).

Note that when using the `privileges` command, the user specifies the relevant catalog and catalog role before selecting a subcommand.

`privileges` supports the following subcommands:

1. list
2. catalog
3. namespace
4. table
5. view

Each of these subcommands, except `list`, supports the `grant` and `revoke` actions and requires an action to be specified.

Note that each subcommand's `revoke` action always accepts the same options that the corresponding `grant` action does, but with the addition of the `cascade` option. `cascade` is used to revoke all other privileges that depend on the specified privilege.

#### list

The `list` subcommand shows details about all privileges for a catalog role.

```
input: polaris privileges list --help
options:
  list
    Named arguments:
      --catalog  The name of an existing catalog
      --catalog-role  The name of a catalog role
```

##### Examples

```
polaris privileges list --catalog my_catalog --catalog-role my_role

polaris privileges list --catalog-role my_other_role --catalog my_catalog
```

#### catalog

The `catalog` subcommand manages privileges at the catalog level. `grant` is used to grant catalog privileges to the specified catalog role, and `revoke` is used to revoke them.
```
input: polaris privileges catalog --help
options:
  catalog
    grant
      Named arguments:
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
    revoke
      Named arguments:
        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
```

##### Examples

```
polaris privileges \
  catalog \
  grant \
  --catalog my_catalog \
  --catalog-role catalog_role \
  TABLE_CREATE

polaris privileges \
  catalog \
  revoke \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --cascade \
  TABLE_CREATE
```

#### namespace

The `namespace` subcommand manages privileges at the namespace level.

```
input: polaris privileges namespace --help
options:
  namespace
    grant
      Named arguments:
        --namespace  A period-delimited namespace
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
    revoke
      Named arguments:
        --namespace  A period-delimited namespace
        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
```

##### Examples

```
polaris privileges \
  namespace \
  grant \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b \
  TABLE_LIST

polaris privileges \
  namespace \
  revoke \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b \
  TABLE_LIST
```

#### table

The `table` subcommand manages privileges at the table level.

```
input: polaris privileges table --help
options:
  table
    grant
      Named arguments:
        --namespace  A period-delimited namespace
        --table  The name of a table
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
    revoke
      Named arguments:
        --namespace  A period-delimited namespace
        --table  The name of a table
        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
```

##### Examples

```
polaris privileges \
  table \
  grant \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b \
  --table t \
  TABLE_DROP

polaris privileges \
  table \
  revoke \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b \
  --table t \
  --cascade \
  TABLE_DROP
```

#### view

The `view` subcommand manages privileges at the view level.
```
input: polaris privileges view --help
options:
  view
    grant
      Named arguments:
        --namespace  A period-delimited namespace
        --view  The name of a view
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
    revoke
      Named arguments:
        --namespace  A period-delimited namespace
        --view  The name of a view
        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
```

##### Examples

```
polaris privileges \
  view \
  grant \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b.c \
  --view v \
  VIEW_FULL_METADATA

polaris privileges \
  view \
  revoke \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b.c \
  --view v \
  --cascade \
  VIEW_FULL_METADATA
```

## Examples

This section outlines example code for a few common operations as well as for some more complex ones.

For especially complex operations, you may wish to instead directly use the Python API.

### Creating a principal and a catalog

```
polaris principals create my_user

polaris catalogs create \
  --type internal \
  --storage-type s3 \
  --default-base-location s3://iceberg-bucket/polaris-base \
  --role-arn arn:aws:iam::111122223333:role/ExampleCorpRole \
  --allowed-location s3://iceberg-bucket/polaris-alt-location-1 \
  --allowed-location s3://iceberg-bucket/polaris-alt-location-2 \
  my_catalog
```

### Granting a principal the ability to manage the content of a catalog

```
polaris principal-roles create power_user
polaris principal-roles grant --principal my_user power_user

polaris catalog-roles create --catalog my_catalog my_catalog_role
polaris catalog-roles grant \
  --catalog my_catalog \
  --principal-role power_user \
  my_catalog_role

polaris privileges \
  catalog \
  grant \
  --catalog my_catalog \
  --catalog-role my_catalog_role \
  CATALOG_MANAGE_CONTENT
```

### Identifying the tables a given principal has been granted explicit access to read

_Note that some other privileges, such as `CATALOG_MANAGE_CONTENT`, subsume `TABLE_READ_DATA` and would not be discovered here._

```
principal_roles=$(polaris principal-roles list --principal my_principal)
for principal_role in ${principal_roles}; do
  catalog_roles=$(polaris catalog-roles list --principal-role "${principal_role}")
  for catalog_role in ${catalog_roles}; do
    grants=$(polaris privileges list --catalog-role "${catalog_role}" --catalog "${catalog}")
    for grant in $(echo "${grants}" | jq -c '.[] | select(.privilege == "TABLE_READ_DATA")'); do
      echo "${grant}"
    done
  done
done
```

diff --git a/0.9.0/configuring-polaris-for-production.md b/0.9.0/configuring-polaris-for-production.md
deleted file mode 100644
index 152d12fd44..0000000000
--- a/0.9.0/configuring-polaris-for-production.md
+++ /dev/null
@@ -1,131 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Configuring Apache Polaris (Incubating) for Production
linkTitle: Deploying In Production
type: docs
weight: 600
---

The default `polaris-server.yml` configuration is intended for development and testing. When deploying Polaris in production, there are several best practices to keep in mind.

## Security

### Configurations

Notable configurations used to secure a Polaris deployment are outlined below.

#### oauth2

> [!WARNING]
> Ensure that the `tokenBroker` setting reflects the token broker specified in `authenticator` below.

* Configure [OAuth](https://oauth.net/2/) with this setting. Remove the `TestInlineBearerTokenPolarisAuthenticator` option and uncomment the `DefaultPolarisAuthenticator` authenticator option beneath it.
* Then, configure the token broker. You can configure the token broker to use either [asymmetric](https://github.com/apache/polaris/blob/b482617bf8cc508b37dbedf3ebc81a9408160a5e/polaris-service/src/main/java/io/polaris/service/auth/JWTRSAKeyPair.java#L24) or [symmetric](https://github.com/apache/polaris/blob/b482617bf8cc508b37dbedf3ebc81a9408160a5e/polaris-service/src/main/java/io/polaris/service/auth/JWTSymmetricKeyBroker.java#L23) keys.

#### authenticator.tokenBroker

> [!WARNING]
> Ensure that the `tokenBroker` setting reflects the token broker specified in `oauth2` above.

#### callContextResolver & realmContextResolver

* Use these configurations to specify a service that can resolve a realm from bearer tokens.
* The service(s) used here must implement the relevant interfaces (i.e. [CallContextResolver](https://github.com/apache/polaris/blob/8290019c10290a600e40b35ddb1e2f54bf99e120/polaris-service/src/main/java/io/polaris/service/context/CallContextResolver.java#L27) and [RealmContextResolver](https://github.com/apache/polaris/blob/7ce86f10a68a3b56aed766235c88d6027c0de038/polaris-service/src/main/java/io/polaris/service/context/RealmContextResolver.java)).

## Metastore Management

> [!IMPORTANT]
> The default `in-memory` implementation for `metastoreManager` is meant for testing and not suitable for production usage. Instead, consider an implementation such as `eclipse-link` which allows you to store metadata in a remote database.

A Metastore Manager should be configured with an implementation that durably persists Polaris entities. Use the configuration `metaStoreManager` to configure a [MetastoreManager](https://github.com/apache/polaris/blob/627dc602eb15a3258dcc32babf8def34cf6de0e9/polaris-core/src/main/java/io/polaris/core/persistence/PolarisMetaStoreManager.java#L47) implementation where Polaris entities will be persisted.

Be sure to secure your metastore backend since it will be storing credentials and catalog metadata.

### Configuring EclipseLink

To use EclipseLink for metastore management, specify the configuration `metaStoreManager.conf-file` to point to an EclipseLink `persistence.xml` file. This file, local to the Polaris service, contains details of the database used for metastore management and the connection settings.
For more information, refer to the [metastore documentation]({{% ref "metastores" %}}).

> [!IMPORTANT]
> EclipseLink requires
> 1. Building the JAR for the EclipseLink extension
> 2. Setting the `eclipseLink` gradle property to `true`.
>
> This can be achieved by setting `eclipseLink=true` in the `gradle.properties` file, or by passing the property explicitly while building all JARs, e.g.: `./gradlew -PeclipseLink=true clean assemble`

### Bootstrapping

Before using Polaris with a metastore manager other than `in-memory`, you must **bootstrap** the metastore manager. This is a manual operation that must be performed **only once** in order to prepare the metastore manager to integrate with Polaris. When the metastore manager is bootstrapped, any existing Polaris entities in the metastore manager may be **purged**.

By default, Polaris will create randomised `CLIENT_ID` and `CLIENT_SECRET` for the `root` principal and store their hashes in the metastore backend. In order to provide your own credentials for the `root` principal (so you can request tokens via `api/catalog/v1/oauth/tokens`), set the following environment variables for realm name `my_realm`:

```
export POLARIS_BOOTSTRAP_MY_REALM_ROOT_CLIENT_ID=my-client-id
export POLARIS_BOOTSTRAP_MY_REALM_ROOT_CLIENT_SECRET=my-client-secret
```

**IMPORTANT**: If you use `default-realm` as the realm name, the resulting variable names contain a hyphen, which the `export` command does not accept. Prefix the bootstrap command with `env` instead:

```bash
env POLARIS_BOOTSTRAP_DEFAULT-REALM_ROOT_CLIENT_ID=my-client-id POLARIS_BOOTSTRAP_DEFAULT-REALM_ROOT_CLIENT_SECRET=my-client-secret java -jar /path/to/jar/polaris-service-all.jar bootstrap polaris-server.yml
```

Now, to bootstrap Polaris, run:

```bash
java -jar /path/to/jar/polaris-service-all.jar bootstrap polaris-server.yml
```

or in a container:

```bash
bin/polaris-service bootstrap config/polaris-server.yml
```

Afterward, Polaris can be launched normally:

```bash
java -jar /path/to/jar/polaris-service-all.jar server polaris-server.yml
```

You can verify the setup by attempting a token issue for the `root` principal:

```bash
curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens -d "grant_type=client_credentials&client_id=my-client-id&client_secret=my-client-secret&scope=PRINCIPAL_ROLE:ALL"
```

which should return:

```json
{"access_token":"...","token_type":"bearer","issued_token_type":"urn:ietf:params:oauth:token-type:access_token","expires_in":3600}
```

Note that if you used a non-default realm name, for example, `iceberg` instead of `default-realm` in your `polaris-server.yml`, then you should add an appropriate request header:

```bash
curl -X POST -H 'realm: iceberg' http://localhost:8181/api/catalog/v1/oauth/tokens -d "grant_type=client_credentials&client_id=my-client-id&client_secret=my-client-secret&scope=PRINCIPAL_ROLE:ALL"
```

## Other Configurations

When deploying Polaris in production, consider adjusting the following configurations:

#### featureConfiguration.SUPPORTED_CATALOG_STORAGE_TYPES

- By default Polaris catalogs are allowed to be located in the local filesystem with the `FILE` storage type. This should be disabled for production systems.
- Use this configuration to additionally disable any other storage types that will not be in use.

diff --git a/0.9.0/entities.md b/0.9.0/entities.md
deleted file mode 100644
index 0e02c6a8c4..0000000000
--- a/0.9.0/entities.md
+++ /dev/null
@@ -1,89 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.
See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Entities -type: docs -weight: 400 ---- - -This page documents various entities that can be managed in Apache Polaris (Incubating). - -## Catalog - -A catalog is a top-level entity in Polaris that may contain other entities like [namespaces](#namespace) and [tables](#table). These map directly to [Apache Iceberg catalogs](https://iceberg.apache.org/concepts/catalog/). - -For information on managing catalogs with the REST API or for more information on what data can be associated with a catalog, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreateCatalogRequest.md" %}}). - -### Storage Type - -All catalogs in Polaris are associated with a _storage type_. Valid Storage Types are `S3`, `Azure`, and `GCS`. The `FILE` type is also additionally available for testing. Each of these types relates to a different storage provider where data within the catalog may reside. Depending on the storage type, various other configurations may be set for a catalog including credentials to be used when accessing data inside the catalog. - -For details on how to use Storage Types in the REST API, see [the API docs]({{% github-polaris "regtests/client/python/docs/StorageConfigInfo.md" %}}). - -## Namespace - -A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_. - -In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on. - -For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreateNamespaceRequest.md" %}}). - - -## Table - -Polaris tables are entities that map to [Apache Iceberg tables](https://iceberg.apache.org/docs/nightly/configuration/). - -For information on managing tables with the REST API or for more information on what data can be associated with a table, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreateTableRequest.md" %}}). - -## View - -Polaris views are entities that map to [Apache Iceberg views](https://iceberg.apache.org/view-spec/). - -For information on managing views with the REST API or for more information on what data can be associated with a view, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreateViewRequest.md" %}}). - -## Principal - -Polaris principals are unique identities that can be used to represent users or services. Each principal may have one or more [principal roles](#principal-role) assigned to it for the purpose of accessing catalogs and the entities within them. 
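As a sketch of how principals and principal roles fit together in practice, the CLI documented alongside these pages can create a principal and grant it a role; the names below are illustrative:

```
# Create a principal for a service (prints its client ID and secret)
polaris principals create etl_service

# Create a principal role and grant it to the principal
polaris principal-roles create data_engineer
polaris principal-roles grant --principal etl_service data_engineer
```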
- -For information on managing principals with the REST API or for more information on what data can be associated with a principal, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreatePrincipalRequest.md" %}}). - -## Principal Role - -Polaris principal roles are labels that may be granted to [principals](#principal). Each principal may have one or more principal roles, and the same principal role may be granted to multiple principals. Principal roles may be assigned based on the persona or responsibilities of a given principal, or on how that principal will need to access different entities within Polaris. - -For information on managing principal roles with the REST API or for more information on what data can be associated with a principal role, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreatePrincipalRoleRequest.md" %}}). - - -## Catalog Role - -Polaris catalog roles are labels that may be granted to [catalogs](#catalog). Each catalog may have one or more catalog roles, and the same catalog role may be granted to multiple catalogs. Catalog roles may be assigned based on the nature of data that will reside in a catalog, or by the groups of users and services that might need to access that data. - -Each catalog role may have multiple [privileges](#privilege) granted to it, and each catalog role can be granted to one or more [principal roles](#principal-role). This is the mechanism by which principals are granted access to entities inside a catalog such as namespaces and tables. - -## Privilege - -Polaris privileges are granted to [catalog roles](#catalog-role) in order to grant principals with a given principal role some degree of access to catalogs with a given catalog role. When a privilege is granted to a catalog role, any principal roles granted that catalog role receive the privilege. In turn, any principals who are granted that principal role receive it. - -A privilege can be scoped to any entity inside a catalog, including the catalog itself. - -For a list of supported privileges for each privilege class, see the API docs: -* [Table Privileges]({{% github-polaris "regtests/client/python/docs/TablePrivilege.md" %}}) -* [View Privileges]({{% github-polaris "regtests/client/python/docs/ViewPrivilege.md" %}}) -* [Namespace Privileges]({{% github-polaris "regtests/client/python/docs/NamespacePrivilege.md" %}}) -* [Catalog Privileges]({{% github-polaris "regtests/client/python/docs/CatalogPrivilege.md" %}}) diff --git a/0.9.0/metastores.md b/0.9.0/metastores.md deleted file mode 100644 index 74766c9c80..0000000000 --- a/0.9.0/metastores.md +++ /dev/null @@ -1,112 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. 
#
title: Metastores
linkTitle: Metastores
type: docs
weight: 700
---

This page documents important configurations for connecting to a production database through [EclipseLink](https://eclipse.dev/eclipselink/).

## Polaris Server Configuration

Configure the `metaStoreManager` section in the Polaris configuration (`polaris-server.yml` by default) as follows:

```
metaStoreManager:
  type: eclipse-link
  conf-file: META-INF/persistence.xml
  persistence-unit: polaris
```

`conf-file` must point to an [EclipseLink configuration file](https://eclipse.dev/eclipselink/documentation/2.5/solutions/testingjpa002.htm).

By default, `conf-file` points to the embedded resource file `META-INF/persistence.xml` in the `polaris-eclipselink` module.

In order to specify a configuration file outside the classpath, follow these steps.

1) Place `persistence.xml` into a jar file: `jar cvf /tmp/conf.jar persistence.xml`
2) Use `conf-file: /tmp/conf.jar!/persistence.xml`

## EclipseLink Configuration - persistence.xml

The configuration file `persistence.xml` is used to set up the database connection properties, which can differ depending on the type of database and its configuration.

Check out the default [persistence.xml](https://github.com/apache/polaris/blob/main/extension/persistence/eclipselink/src/main/resources/META-INF/persistence.xml) for a complete sample for connecting to the file-based H2 database.

Polaris creates and connects to a separate database for each realm. Specifically, the `{realm}` placeholder in `jakarta.persistence.jdbc.url` is substituted with the actual realm name, allowing the Polaris server to connect to different databases based on the realm.

> Note: some database systems such as Postgres don't create databases automatically. Database admins need to create them manually before running the Polaris server.

```xml
<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
  <class>org.apache.polaris.jpa.models.ModelEntity</class>
  <class>org.apache.polaris.jpa.models.ModelEntityActive</class>
  <class>org.apache.polaris.jpa.models.ModelEntityChangeTracking</class>
  <class>org.apache.polaris.jpa.models.ModelEntityDropped</class>
  <class>org.apache.polaris.jpa.models.ModelGrantRecord</class>
  <class>org.apache.polaris.jpa.models.ModelPrincipalSecrets</class>
  <class>org.apache.polaris.jpa.models.ModelSequenceId</class>
  <shared-cache-mode>NONE</shared-cache-mode>
  <properties>
    <!-- Example H2 connection settings; {realm} is replaced with the realm name.
         See the default persistence.xml linked above for the authoritative values. -->
    <property name="jakarta.persistence.jdbc.url" value="jdbc:h2:file:tmp/polaris_test/filedb_{realm}"/>
    <property name="jakarta.persistence.jdbc.user" value="sa"/>
    <property name="jakarta.persistence.jdbc.password" value=""/>
    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
  </properties>
</persistence-unit>
```

A single `persistence.xml` can describe multiple [persistence units](https://eclipse.dev/eclipselink/documentation/2.6/concepts/app_dev001.htm). For example, with both a `polaris-dev` and `polaris` persistence unit defined, you could use a single `persistence.xml` to easily switch between development and production databases. Use `persistence-unit` in the Polaris server configuration to easily switch between persistence units.

To build Polaris with the necessary H2 dependency and start the Polaris service, run the following:

```bash
polaris> ./gradlew --no-daemon --info -PeclipseLink=true -PeclipseLinkDeps=com.h2database:h2:2.3.232 clean shadowJar
polaris> java -jar dropwizard/service/build/libs/polaris-dropwizard-service-*.jar server ./polaris-server.yml
```
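If, as in the samples on this page, the JDBC URL maps `{realm}` directly to the database name, the per-realm databases must exist before Polaris starts. A hypothetical sketch for a Postgres admin preparing two realms:

```bash
# Create one database per realm before starting Polaris
# (names must match the realm names substituted into {realm})
createdb default-realm
createdb my_realm
```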
### Postgres

The following shows a sample configuration for integrating Polaris with Postgres.

```xml
<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
  <class>org.apache.polaris.jpa.models.ModelEntity</class>
  <class>org.apache.polaris.jpa.models.ModelEntityActive</class>
  <class>org.apache.polaris.jpa.models.ModelEntityChangeTracking</class>
  <class>org.apache.polaris.jpa.models.ModelEntityDropped</class>
  <class>org.apache.polaris.jpa.models.ModelGrantRecord</class>
  <class>org.apache.polaris.jpa.models.ModelPrincipalSecrets</class>
  <class>org.apache.polaris.jpa.models.ModelSequenceId</class>
  <shared-cache-mode>NONE</shared-cache-mode>
  <properties>
    <!-- Example Postgres connection settings; adjust host, port, and credentials
         for your environment. {realm} is replaced with the realm name. -->
    <property name="jakarta.persistence.jdbc.url" value="jdbc:postgresql://localhost:5432/{realm}"/>
    <property name="jakarta.persistence.jdbc.user" value="postgres"/>
    <property name="jakarta.persistence.jdbc.password" value="postgres"/>
    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
  </properties>
</persistence-unit>
```

To build Polaris with the necessary Postgres dependency and start the Polaris service, run the following:

```bash
polaris> ./gradlew --no-daemon --info -PeclipseLink=true -PeclipseLinkDeps=org.postgresql:postgresql:42.7.4 clean shadowJar
polaris> java -jar dropwizard/service/build/libs/polaris-dropwizard-service-*.jar server ./polaris-server.yml
```
\ No newline at end of file
diff --git a/0.9.0/overview.md b/0.9.0/overview.md
deleted file mode 100644
index 41f8daeab2..0000000000
--- a/0.9.0/overview.md
+++ /dev/null
@@ -1,215 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Overview
type: docs
weight: 200
---

Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol.

With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines.

![Conceptual diagram of Apache Polaris (Incubating).](/img/overview.svg "Apache Polaris (Incubating) overview")

## Key concepts

This section introduces key concepts associated with using Apache Polaris (Incubating).

In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables
or namespaces have been created yet for Catalog2 or Catalog3.

![Diagram that shows an example Apache Polaris (Incubating) structure.](/img/sample-catalog-structure.svg "Sample Apache Polaris (Incubating) structure")

### Catalog

In Polaris, you can create one or more catalog resources to organize Iceberg tables.

Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a
query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks:

- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's
  current metadata file.

- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of
  the table (see the sketch below).
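Because Polaris speaks the Iceberg REST protocol, a client can read a table's current metadata pointer with a plain REST call. A sketch assuming a local server, a catalog named `my_catalog`, and a table `ns1.events`; the token request follows the OAuth flow shown in the production guide:

```bash
# Obtain a bearer token (placeholder credentials)
TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d "grant_type=client_credentials&client_id=my-client-id&client_secret=my-client-secret&scope=PRINCIPAL_ROLE:ALL" \
  | jq -r '.access_token')

# The load-table response includes the table's current "metadata-location" pointer
curl -s -H "Authorization: Bearer ${TOKEN}" \
  http://localhost:8181/api/catalog/v1/my_catalog/namespaces/ns1/tables/events \
  | jq '."metadata-location"'
```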
-

To learn more about Iceberg catalogs, see the [Apache Iceberg™ documentation](https://iceberg.apache.org/concepts/catalog/).

#### Catalog types

A catalog can be one of the following two types:

- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris.

- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from
  this catalog are synced to Polaris. These tables are read-only in Polaris. In the current release, only a Snowflake external catalog is provided.

A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS.

### Namespace

You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create
nested namespaces. Iceberg tables belong to namespaces.

### Apache Iceberg™ tables and catalogs

In an internal catalog, an Iceberg table is registered in Polaris, but read and written via query engines. The table data and
metadata are stored in your external cloud storage. The table uses Polaris as the Iceberg catalog.

If you have tables housed in another Iceberg catalog, you can sync them to an external catalog in Polaris, where they appear as part of that
external catalog. Clients connecting to the original catalog provider can read from or write to these tables. However, clients connecting to
Polaris can only read from these tables.

> **Important**
>
> For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met:
>
> - The directory only contains the data files that belong to a single table.
> - The directory hierarchy matches the namespace hierarchy for the catalog.
>
> For example, if a catalog includes the following items:
>
> - Top-level namespace namespace1
> - Nested namespace namespace1a
> - A customers table, which is grouped under nested namespace namespace1a
> - An orders table, which is grouped under nested namespace namespace1a
>
> The directory hierarchy for the catalog must follow this structure:
>
> - /namespace1/namespace1a/customers/
> - /namespace1/namespace1a/orders/

### Service principal

A service principal is an entity that you create in Polaris. Each service principal encapsulates credentials that you use to connect
to Polaris.

Query engines use service principals to connect to catalogs.

Polaris generates a Client ID and Client Secret pair for each service principal.

The following table displays example service principals that you might create in Polaris:

| Service connection name | Purpose |
| --------------------------- | ----------- |
| Flink ingestion | For Apache Flink® to ingest streaming data into Apache Iceberg™ tables. |
| Spark ETL pipeline | For Apache Spark™ to run ETL pipeline jobs on Iceberg tables. |
| Snowflake data pipelines | For Snowflake to run data pipelines for transforming data in Apache Iceberg™ tables. |
| Trino BI dashboard | For Trino to run BI queries for powering a dashboard. |
| Snowflake AI team | For Snowflake to run AI jobs on data in Apache Iceberg™ tables. |

### Service connection

A service connection represents a REST-compatible engine (such as Apache Spark™, Apache Flink®, or Trino) that can read from and write to Polaris
Catalog.
When creating a new service connection, the Polaris administrator grants the service principal that is created with the new service
connection either a new or existing principal role. A principal role is a resource in Polaris that you can use to logically group Polaris
service principals together and grant privileges on securable objects. For more information, see [Principal role]({{% ref "access-control#principal-role" %}}). Polaris uses a role-based access control (RBAC) model to grant service principals access to resources. For more information,
see [Access control]({{% ref "access-control" %}}). For a diagram of this model, see [RBAC model]({{% ref "access-control#rbac-model" %}}).

If the Polaris administrator grants the service principal for the new service connection a new principal role, the service principal
doesn't have any privileges granted to it yet. When securing the catalog that the new service connection will connect to, the Polaris
administrator grants privileges to catalog roles and then grants these catalog roles to the new principal role. As a result, the service
principal for the new service connection has these privileges. For more information about catalog roles, see [Catalog role]({{% ref "access-control#catalog-role" %}}).

If the Polaris administrator grants an existing principal role to the service principal for the new service connection, the service principal
has the same privileges granted to the catalog roles that are granted to the existing principal role. If needed, the Polaris
administrator can grant additional catalog roles to the existing principal role or remove catalog roles from it to adjust the privileges
bestowed to the service principal. For an example of how RBAC works in Polaris, see [RBAC example]({{% ref "access-control#rbac-example" %}}).

### Storage configuration

A storage configuration stores a generated identity and access management (IAM) entity for your external cloud storage and is created
when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the
catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris
Catalog.

When you create a catalog, you supply the following information about your external cloud storage:

| Cloud storage provider | Information |
| -----------------------| ----------- |
| Amazon S3 | • Default base location for your Amazon S3 bucket<br />• Locations for your Amazon S3 bucket<br />• S3 role ARN<br />• External ID (optional) |
| Google Cloud Storage (GCS) | • Default base location for your GCS bucket<br />• Locations for your GCS bucket |
| Azure | • Default base location for your Microsoft Azure container<br />• Locations for your Microsoft Azure container<br />• Azure tenant ID |

## Example workflow

In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1.

1. Bob uses Apache Spark™ to create the Table1 table under the
   Namespace1 namespace in the Catalog1 catalog and insert values into
   Table1.

   Bob can create Table1 and insert data into it because he is using a
   service connection with a service principal that has
   the privileges to perform these actions.

2. Alice uses Snowflake to read data from Table1.

   Alice can read data from Table1 because she is using a service
   connection with a service principal with a catalog integration that
   has the privileges to perform this action. Alice
   creates an unmanaged table in Snowflake to read data from Table1.

![Diagram that shows an example workflow for Apache Polaris (Incubating)](/img/example-workflow.svg "Example workflow for Apache Polaris (Incubating)")

## Security and access control

This section describes security and access control.

### Credential vending

To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query
execution. These credentials allow the query engine to run the query without requiring access to your external cloud storage for
Iceberg tables. This process is called credential vending.

As of now, the following limitation is known regarding Apache Iceberg support:

- **remove_orphan_files:** Apache Spark can't use credential vending
  for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details.

### Identity and access management (IAM)

Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg
metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your
storage location.

### Access control

Polaris enforces the access control that you configure across all tables registered with the service and governs security for all
queries from query engines in a consistent manner.

Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs,
namespaces, and tables.

Polaris RBAC uses two different role types to delegate privileges:

- **Principal roles:** Granted to Polaris service principals and
  analogous to roles in other access control systems that you grant to
  service principals.

- **Catalog roles:** Configured with certain privileges on Polaris
  catalog resources and granted to principal roles.

For more information, see [Access control]({{% ref "access-control" %}}).

## Legal Notices

Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
diff --git a/0.9.0/polaris-management-service.md b/0.9.0/polaris-management-service.md
deleted file mode 100644
index c81e4d90e0..0000000000
--- a/0.9.0/polaris-management-service.md
+++ /dev/null
@@ -1,27 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: 'Apache Polaris Management Service OpenAPI'
linkTitle: 'Management OpenAPI'
weight: 800
params:
  show_page_toc: false
---

diff --git a/0.9.0/quickstart.md b/0.9.0/quickstart.md
deleted file mode 100644
index 57f8e767f9..0000000000
--- a/0.9.0/quickstart.md
+++ /dev/null
@@ -1,332 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Quick Start
type: docs
weight: 100
---

This guide serves as an introduction to several key entities that can be managed with Apache Polaris (Incubating), describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark™.

## Prerequisites

This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here.

### Building and Deploying Polaris

To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/):

```shell
brew install git
```

Then, use git to clone the Polaris repo:

```shell
cd ~
git clone https://github.com/apache/polaris.git
```

#### With Docker

If you plan to deploy Polaris inside [Docker](https://www.docker.com/), you'll need to install docker itself. For example, this can be done using [homebrew](https://brew.sh/):

```shell
brew install --cask docker
```

Once installed, make sure Docker is running.

#### From Source

If you plan to build Polaris from source yourself, you will need to satisfy a few prerequisites first.

Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions.
For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv:

```shell
cd ~/polaris
brew install openjdk@21 jenv
jenv add $(brew --prefix openjdk@21)
jenv local 21
```

### Connecting to Polaris

Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the prerequisites below.

#### With Spark

If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. As [above](#building-and-deploying-polaris), make sure [git](https://git-scm.com/) is installed first. You can install it with [homebrew](https://brew.sh/):

```shell
brew install git
```

Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html).

```shell
cd ~
git clone https://github.com/apache/spark.git
cd ~/spark
git checkout branch-3.5
```

## Deploying Polaris

Polaris can be deployed via a lightweight docker image or as a standalone process. Before starting, be sure that you've satisfied the relevant [prerequisites](#building-and-deploying-polaris) detailed above.

### Docker Image

To start using Polaris in Docker, launch Polaris while Docker is running:

```shell
cd ~/polaris
docker compose -f docker-compose.yml up --build
```

Once the `polaris-polaris` container is up, you can continue to [Defining a Catalog](#defining-a-catalog).

### Building Polaris

Run Polaris locally with:

```shell
cd ~/polaris
./gradlew runApp
```

You should see output for some time as Polaris builds and starts up. Eventually, you won’t see any more logs and should see messages that resemble the following:

```
INFO [...] [main] [] o.e.j.s.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@...
INFO [...] [main] [] o.e.j.server.AbstractConnector: Started application@...
INFO [...] [main] [] o.e.j.server.AbstractConnector: Started admin@...
INFO [...] [main] [] o.eclipse.jetty.server.Server: Started Server@...
```

At this point, Polaris is running.

## Bootstrapping Polaris

For this tutorial, we'll launch an instance of Polaris that stores entities only in-memory. This means that any entities that you define will be destroyed when Polaris is shut down. It also means that Polaris will automatically bootstrap itself with root credentials. For more information on how to configure Polaris for production usage, see the [docs]({{% ref "configuring-polaris-for-production" %}}).

When Polaris is launched using in-memory mode, the root principal credentials can be found in stdout on initial startup. For example:

```
realm: default-realm root principal credentials: <client-id>:<client-secret>
```

Be sure to take note of these credentials as we'll be using them below. You can also set these credentials as environment variables for use with the Polaris CLI:

```shell
export CLIENT_ID=<client-id>
export CLIENT_SECRET=<client-secret>
```

## Defining a Catalog

In Polaris, the [catalog]({{% ref "entities#catalog" %}}) is the top-level entity that objects like [tables]({{% ref "entities#table" %}}) and [views]({{% ref "entities#view" %}}) are organized under.
With a Polaris service running, you can create a catalog like so:

```shell
cd ~/polaris

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  catalogs \
  create \
  --storage-type s3 \
  --default-base-location ${DEFAULT_BASE_LOCATION} \
  --role-arn ${ROLE_ARN} \
  quickstart_catalog
```

This will create a new catalog called **quickstart_catalog**.

The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources.

If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs]({{% ref "entities#storage-type" %}}).

Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% ref "command-line-interface" %}}).


### Creating a Principal and Assigning it Privileges

With a catalog created, we can create a [principal]({{% ref "entities#principal" %}}) that has access to manage that catalog. For details on how to configure the Polaris CLI, see [the section above](#defining-a-catalog) or refer to the [docs]({{% ref "command-line-interface" %}}).

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principals \
  create \
  quickstart_user

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principal-roles \
  create \
  quickstart_user_role

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  catalog-roles \
  create \
  --catalog quickstart_catalog \
  quickstart_catalog_role
```

Be sure to provide the necessary credentials, hostname, and port as before.

When the `principals create` command completes successfully, it will return the credentials for this new principal. Be sure to note these down for later. For example:

```
./polaris ... principals create example
{"clientId": "XXXX", "clientSecret": "YYYY"}
```

Now, we grant the principal the [principal role]({{% ref "entities#principal-role" %}}) we created, and grant the [catalog role]({{% ref "entities#catalog-role" %}}) we created to that principal role. For more information on these entities, please refer to the linked documentation.

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principal-roles \
  grant \
  --principal quickstart_user \
  quickstart_user_role

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  catalog-roles \
  grant \
  --catalog quickstart_catalog \
  --principal-role quickstart_user_role \
  quickstart_catalog_role
```

Now, we’ve linked our principal to the catalog via roles like so:

![Principal to Catalog](/img/quickstart/privilege-illustration-1.png "Principal to Catalog")

In order to give this principal the ability to interact with the catalog, we must assign some [privileges]({{% ref "entities#privilege" %}}).
For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so:

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  privileges \
  catalog \
  grant \
  --catalog quickstart_catalog \
  --catalog-role quickstart_catalog_role \
  CATALOG_MANAGE_CONTENT
```

This grants the [catalog privileges]({{% ref "entities#privilege" %}}) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so:

![Principal to Catalog with Catalog Role](/img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role")

`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace.

## Using Iceberg & Polaris

At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/).

### Connecting with Spark

To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API.

This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). From a local Spark clone on the `branch-3.5` branch we can run the following:

_Note: the credentials provided here are those for our principal, not the root credentials._

```shell
bin/spark-shell \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,org.apache.hadoop:hadoop-aws:3.4.0 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
--conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
--conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.quickstart_catalog.credential='XXXX:YYYY' \
--conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true
```


Replace `XXXX` and `YYYY` with the client ID and client secret generated when you created the `quickstart_user` principal.

Similar to the CLI commands above, this configures Spark to use the Polaris running at `localhost:8181`. If your Polaris server is running elsewhere, be sure to update the configuration appropriately.

Finally, note that we include the `hadoop-aws` package here. If your table is using a different filesystem, be sure to include the appropriate dependency.
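Before launching Spark, it can help to verify that the principal's credentials are accepted by Polaris. A minimal sketch, assuming the defaults used throughout this guide (Polaris at `localhost:8181`; `XXXX` and `YYYY` are the placeholder client ID and secret from above), using the OAuth token endpoint Polaris serves under the catalog API:

```shell
# Optional sanity check: exchange the principal's credentials for a token.
# A successful response is a JSON document containing an "access_token" field.
curl -s http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d grant_type=client_credentials \
  -d client_id=XXXX \
  -d client_secret=YYYY \
  -d scope='PRINCIPAL_ROLE:ALL'
```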
- -Once the Spark session starts, we can create a namespace and table within the catalog: - -``` -spark.sql("USE quickstart_catalog") -spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_namespace") -spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema") -spark.sql("USE NAMESPACE quickstart_namespace.schema") -spark.sql(""" - CREATE TABLE IF NOT EXISTS quickstart_table ( - id BIGINT, data STRING - ) -USING ICEBERG -""") -``` - -We can now use this table like any other: - -``` -spark.sql("INSERT INTO quickstart_table VALUES (1, 'some data')") -spark.sql("SELECT * FROM quickstart_table").show(false) -. . . -+---+---------+ -|id |data | -+---+---------+ -|1 |some data| -+---+---------+ -``` - -If at any time access is revoked... - -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - privileges \ - catalog \ - revoke \ - --catalog quickstart_catalog \ - --catalog-role quickstart_catalog_role \ - CATALOG_MANAGE_CONTENT -``` - -Spark will lose access to the table: - -``` -spark.sql("SELECT * FROM quickstart_table").show(false) - -org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated ids '[6, 7]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION -``` diff --git a/0.9.0/rest-catalog-open-api.md b/0.9.0/rest-catalog-open-api.md deleted file mode 100644 index 896ac66bfa..0000000000 --- a/0.9.0/rest-catalog-open-api.md +++ /dev/null @@ -1,27 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: 'Apache Iceberg OpenAPI' -linkTitle: 'Iceberg OpenAPI' -weight: 900 -params: - show_page_toc: false ---- - - diff --git a/1.0.0/_index.md b/1.0.0/_index.md deleted file mode 100644 index bc1d4f6ec0..0000000000 --- a/1.0.0/_index.md +++ /dev/null @@ -1,180 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. 
-# -linkTitle: '1.0.0' -title: 'Overview' -type: docs -weight: 200 -params: - top_hidden: true - show_page_toc: false -cascade: - type: docs - params: - show_page_toc: true -# This file will NOT be copied into a new release's versioned docs folder. ---- - -Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol. - -With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines. - -![Conceptual diagram of Apache Polaris (Incubating).](/img/overview.svg "Apache Polaris (Incubating) overview") - -## Key concepts - -This section introduces key concepts associated with using Apache Polaris (Incubating). - -In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables -or namespaces have been created yet for Catalog2 or Catalog3. - -![Diagram that shows an example Apache Polaris (Incubating) structure.](/img/sample-catalog-structure.svg "Sample Apache Polaris (Incubating) structure") - -### Catalog - -In Polaris, you can create one or more catalog resources to organize Iceberg tables. - -Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a -query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks: - -- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's - current metadata file. - -- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of - the table. - -To learn more about Iceberg catalogs, see the [Apache Iceberg™ documentation](https://iceberg.apache.org/concepts/catalog/). - -#### Catalog types - -A catalog can be one of the following two types: - -- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris. - -- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from - this catalog are synced to Polaris. These tables are read-only in Polaris. - -A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS. - -### Namespace - -You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create -nested namespaces. Iceberg tables belong to namespaces. - -> **Important** -> -> For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met: -> -> - The directory only contains the data files that belong to a single table. -> - The directory hierarchy matches the namespace hierarchy for the catalog. 
>
> For example, if a catalog includes the following items:
>
> - Top-level namespace namespace1
> - Nested namespace namespace1a
> - A customers table, which is grouped under nested namespace namespace1a
> - An orders table, which is grouped under nested namespace namespace1a
>
> The directory hierarchy for the catalog must follow this structure:
>
> - /namespace1/namespace1a/customers/
> - /namespace1/namespace1a/orders/

### Storage configuration

A storage configuration stores a generated identity and access management (IAM) entity for your cloud storage and is created
when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the
catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris
Catalog.

When you create a catalog, you supply the following information about your cloud storage:

| Cloud storage provider | Information |
| -----------------------| ----------- |
| Amazon S3 | • Default base location for your Amazon S3 bucket<br />• Locations for your Amazon S3 bucket<br />• S3 role ARN<br />• External ID (optional) |
| Google Cloud Storage (GCS) | • Default base location for your GCS bucket<br />• Locations for your GCS bucket |
| Azure | • Default base location for your Microsoft Azure container<br />• Locations for your Microsoft Azure container<br />• Azure tenant ID |

## Example workflow

In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1.

1. Bob uses Apache Spark™ to create the Table1 table under the
   Namespace1 namespace in the Catalog1 catalog and insert values into
   Table1.

   Bob can create Table1 and insert data into it because he is using a
   service connection with a service principal that has
   the privileges to perform these actions.

2. Alice uses Snowflake to read data from Table1.

   Alice can read data from Table1 because she is using a service
   connection with a service principal with a catalog integration that
   has the privileges to perform this action. Alice
   creates an unmanaged table in Snowflake to read data from Table1.

![Diagram that shows an example workflow for Apache Polaris (Incubating)](/img/example-workflow.svg "Example workflow for Apache Polaris (Incubating)")

## Security and access control

### Credential vending

To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query
execution. These credentials allow the query engine to run the query without requiring access to your cloud storage for
Iceberg tables. This process is called credential vending.

As of now, the following limitation is known regarding Apache Iceberg support:

- **remove_orphan_files:** Apache Spark can't use credential vending
  for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details.

### Identity and access management (IAM)

Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg
metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your
storage location.

### Access control

Polaris enforces the access control that you configure across all tables registered with the service and governs security for all
queries from query engines in a consistent manner.

Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs,
namespaces, and tables.

Polaris RBAC uses two different role types to delegate privileges:

- **Principal roles:** Granted to Polaris service principals and
  analogous to roles in other access control systems that you grant to
  service principals.

- **Catalog roles:** Configured with certain privileges on Polaris
  catalog resources and granted to principal roles.

For more information, see [Access control]({{% ref "access-control" %}}).

## Legal Notices

Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.


diff --git a/1.0.0/access-control.md b/1.0.0/access-control.md
deleted file mode 100644
index f8c21ab781..0000000000
--- a/1.0.0/access-control.md
+++ /dev/null
@@ -1,212 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Access Control -type: docs -weight: 500 ---- - -This section provides information about how access control works for Apache Polaris (Incubating). - -Polaris uses a role-based access control (RBAC) model in which the Polaris administrator assigns access privileges to catalog roles -and then grants access to resources to service principals by assigning catalog roles to principal roles. - -These are the key concepts to understanding access control in Polaris: - -- **Securable object** -- **Principal role** -- **Catalog role** -- **Privilege** - -## Securable object - -A securable object is an object to which access can be granted. Polaris -has the following securable objects: - -- Catalog -- Namespace -- Iceberg table -- View - -## Principal role - -A principal role is a resource in Polaris that you can use to logically group Polaris service principals together and grant privileges on -securable objects. - -Polaris supports a many-to-one relationship between service principals and principal roles. For example, to grant the same privileges to -multiple service principals, you can grant a single principal role to those service principals. A service principal can be granted one -principal role. When registering a service connection, the Polaris administrator specifies the principal role that is granted to the -service principal. - -You don't grant privileges directly to a principal role. Instead, you configure object permissions at the catalog role level, and then grant -catalog roles to a principal role. - -The following table shows examples of principal roles that you might configure in Polaris: - -| Principal role name | Description | -| -----------------------| ----------- | -| Data_engineer | A role that is granted to multiple service principals for running data engineering jobs. | -| Data_scientist | A role that is granted to multiple service principals for running data science or AI jobs. | - -## Catalog role - -A catalog role belongs to a particular catalog resource in Polaris and specifies a set of permissions for actions on the catalog or objects -in the catalog, such as catalog namespaces or tables. You can create one or more catalog roles for a catalog. - -You grant privileges to a catalog role and then grant the catalog role to a principal role to bestow the privileges to one or more service -principals. - -> **Note** -> -> If you update the privileges bestowed to a service principal, the updates won't take effect for up to one hour. This means that if you -> revoke or grant some privileges for a catalog, the updated privileges won't take effect on any service principal with access to that catalog -> for up to one hour. - -Polaris also supports a many-to-many relationship between catalog roles and principal roles. You can grant the same catalog role to one or more -principal roles. Likewise, a principal role can be granted to one or more catalog roles. 
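For example, granting one catalog role to two different principal roles with the Polaris CLI might look like the following (a sketch; the catalog and role names are hypothetical, and the full CLI syntax is covered on the Command Line Interface page):

```shell
# One catalog role granted to two principal roles (many-to-many).
polaris catalog-roles grant --catalog my_catalog --principal-role data_engineer my_catalog_role
polaris catalog-roles grant --catalog my_catalog --principal-role data_scientist my_catalog_role
```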

The following table displays examples of catalog roles that you might
configure in Polaris:

| Example Catalog role | Description |
| -----------------------| ----------- |
| Catalog administrators | A role that has been granted multiple privileges to emulate full access to the catalog.<br />Principal roles that have been granted this role are permitted to create, alter, read, write, and drop tables in the catalog. |
| Catalog readers | A role that has been granted read-only privileges to tables in the catalog.<br />Principal roles that have been granted this role are allowed to read from tables in the catalog. |
| Catalog contributor | A role that has been granted read and write access privileges to all tables that belong to the catalog.<br />Principal roles that have been granted this role are allowed to perform read and write operations on tables in the catalog. |

## RBAC model

The following diagram illustrates the RBAC model used by Polaris. For each catalog, the Polaris administrator assigns access
privileges to catalog roles and then grants service principals access to resources by assigning catalog roles to principal roles. Polaris
supports a many-to-one relationship between service principals and principal roles.

![Diagram that shows the RBAC model for Apache Polaris.](/img/rbac-model.svg "Apache Polaris RBAC model")

## Access control privileges

This section describes the privileges that are available in the Polaris access control model. Privileges are granted to catalog roles, catalog
roles are granted to principal roles, and principal roles are granted to service principals to specify the operations that service principals can
perform on objects in Polaris.

> **Important**
>
> You can only grant privileges at the catalog level. Fine-grained access controls are not available. For example, you can grant read
> privileges to all tables in a catalog but not to an individual table in the catalog.

To grant the full set of privileges (drop, list, read, write, etc.) on an object, you can use the *full privilege* option.

### Table privileges

| Privilege | Description |
| --------- | ----------- |
| TABLE_CREATE | Enables registering a table with the catalog. |
| TABLE_DROP | Enables dropping a table from the catalog. |
| TABLE_LIST | Enables listing any table in the catalog. |
| TABLE_READ_PROPERTIES | Enables reading properties of the table. |
| TABLE_WRITE_PROPERTIES | Enables configuring properties for the table. |
| TABLE_READ_DATA | Enables reading data from the table by receiving short-lived read-only storage credentials from the catalog. |
| TABLE_WRITE_DATA | Enables writing data to the table by receiving short-lived read+write storage credentials from the catalog. |
| TABLE_FULL_METADATA | Grants all table privileges, except TABLE_READ_DATA and TABLE_WRITE_DATA, which need to be granted individually. |
| TABLE_ATTACH_POLICY | Enables attaching policy to a table. |
| TABLE_DETACH_POLICY | Enables detaching policy from a table. |

### View privileges

| Privilege | Description |
| --------- | ----------- |
| VIEW_CREATE | Enables registering a view with the catalog. |
| VIEW_DROP | Enables dropping a view from the catalog. |
| VIEW_LIST | Enables listing any views in the catalog. |
| VIEW_READ_PROPERTIES | Enables reading all the view properties. |
| VIEW_WRITE_PROPERTIES | Enables configuring view properties. |
| VIEW_FULL_METADATA | Grants all view privileges. |

### Namespace privileges

| Privilege | Description |
| --------- | ----------- |
| NAMESPACE_CREATE | Enables creating a namespace in a catalog. |
| NAMESPACE_DROP | Enables dropping the namespace from the catalog. |
| NAMESPACE_LIST | Enables listing any object in the namespace, including nested namespaces and tables. |
| NAMESPACE_READ_PROPERTIES | Enables reading all the namespace properties. |
| NAMESPACE_WRITE_PROPERTIES | Enables configuring namespace properties. |
| NAMESPACE_FULL_METADATA | Grants all namespace privileges. |
| NAMESPACE_ATTACH_POLICY | Enables attaching policy to a namespace. |
| NAMESPACE_DETACH_POLICY | Enables detaching policy from a namespace. |
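As a concrete illustration, privileges like the ones above are granted to catalog roles through the `privileges` command of the Polaris CLI. A sketch with hypothetical names (the same syntax appears in the Command Line Interface examples):

```shell
# Grant read access on one namespace to a catalog role.
polaris privileges namespace grant \
  --namespace some.schema \
  --catalog my_catalog \
  --catalog-role my_catalog_role \
  TABLE_READ_DATA
```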

### Catalog privileges

| Privilege | Description |
| -----------------------| ----------- |
| CATALOG_MANAGE_ACCESS | Includes the ability to grant or revoke privileges on objects in a catalog to catalog roles, and the ability to grant or revoke catalog roles to or from principal roles. |
| CATALOG_MANAGE_CONTENT | Enables full management of content for the catalog. This privilege encompasses the following privileges:<br />• CATALOG_MANAGE_METADATA<br />• TABLE_FULL_METADATA<br />• NAMESPACE_FULL_METADATA<br />• VIEW_FULL_METADATA<br />• TABLE_WRITE_DATA<br />• TABLE_READ_DATA<br />• CATALOG_READ_PROPERTIES<br />• CATALOG_WRITE_PROPERTIES |
| CATALOG_MANAGE_METADATA | Enables full management of the catalog, catalog roles, namespaces, and tables. |
| CATALOG_READ_PROPERTIES | Enables listing catalogs and reading properties of the catalog. |
| CATALOG_WRITE_PROPERTIES | Enables configuring catalog properties. |
| CATALOG_ATTACH_POLICY | Enables attaching policy to a catalog. |
| CATALOG_DETACH_POLICY | Enables detaching policy from a catalog. |

### Policy privileges

| Privilege | Description |
| -----------------------| ----------- |
| POLICY_CREATE | Enables creating a policy under a specified namespace. |
| POLICY_READ | Enables reading policy content and metadata. |
| POLICY_WRITE | Enables updating the policy details such as its content or description. |
| POLICY_LIST | Enables listing any policy from the catalog. |
| POLICY_DROP | Enables dropping a policy if it is not attached to any resource entity. |
| POLICY_FULL_METADATA | Grants all policy privileges. |
| POLICY_ATTACH | Enables policy to be attached to entities. |
| POLICY_DETACH | Enables policy to be detached from entities. |

## RBAC example

The following diagram illustrates how RBAC works in Polaris and
includes the following users:

- **Alice:** A service admin who signs up for Polaris. Alice can
  create service principals. She can also create catalogs and
  namespaces and configure access control for Polaris resources.

- **Bob:** A data engineer who uses Apache Spark™ to
  interact with Polaris.

  - Alice has created a service principal for Bob. It has been
    granted the Data_engineer principal role, which in turn has been
    granted the following catalog roles: Catalog contributor and
    Data administrator (for both the Silver and Gold zone catalogs
    in the following diagram).

  - The Catalog contributor role grants permission to create
    namespaces and tables in the Bronze zone catalog.

  - The Data administrator roles grant full administrative rights to
    the Silver zone catalog and Gold zone catalog.

- **Mark:** A data scientist who trains models with data managed
  by Polaris.

  - Alice has created a service principal for Mark. It has been
    granted the Data_scientist principal role, which in turn has
    been granted the catalog role named Catalog reader.

  - The Catalog reader role grants read-only access for a catalog
    named Gold zone catalog.

![Diagram that shows an example of how RBAC works in Apache Polaris.](/img/rbac-example.svg "Apache Polaris RBAC example")
diff --git a/1.0.0/admin-tool.md b/1.0.0/admin-tool.md
deleted file mode 100644
index 14f37b6f0f..0000000000
--- a/1.0.0/admin-tool.md
+++ /dev/null
@@ -1,142 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Admin Tool
type: docs
weight: 300
---

Polaris includes a tool for administrators to manage the metastore.

The tool must be built with the necessary JDBC drivers to access the metastore database. For
example, to build the tool with support for Postgres, run the following:

```shell
./gradlew \
  :polaris-admin:assemble \
  :polaris-admin:quarkusAppPartsBuild --rerun \
  -Dquarkus.container-image.build=true
```

The above command will generate:

- One standalone JAR in `runtime/admin/build/polaris-admin-*-runner.jar`
- Two distribution archives in `runtime/admin/build/distributions`
- Two Docker images named `apache/polaris-admin-tool:latest` and `apache/polaris-admin-tool:<version>`

## Usage

Please make sure the admin tool and the Polaris server are the same version before using the tool.
To run the standalone JAR, use the following command:

```shell
java -jar runtime/admin/build/polaris-admin-*-runner.jar --help
```

To run the Docker image, use the following command:

```shell
docker run apache/polaris-admin-tool:latest --help
```

The basic usage of the Polaris Admin Tool is outlined below:

```
Usage: polaris-admin-runner.jar [-hV] [COMMAND]
Polaris Admin Tool
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
Commands:
  help       Display help information about the specified command.
  bootstrap  Bootstraps realms and principal credentials.
  purge      Purge principal credentials.
```

## Configuration

The Polaris Admin Tool must be executed with the same configuration as the Polaris server. The
configuration can be done via environment variables or system properties.

At a minimum, it is necessary to configure the Polaris Admin Tool to connect to the same database
used by the Polaris server.

See the [metastore documentation]({{% ref "metastores" %}}) for more information on configuring the
database connection.

Note: during bootstrap, Polaris always creates a schema named 'polaris_schema' in the configured database.

## Bootstrapping Realms and Principal Credentials

The `bootstrap` command is used to bootstrap realms and create the necessary principal credentials
for the Polaris server. This command is idempotent and can be run multiple times without causing any
issues. If a realm is already bootstrapped, running the `bootstrap` command again will not have any
effect on that realm.

```shell
java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap --help
```

The basic usage of the `bootstrap` command is outlined below:

```
Usage: polaris-admin-runner.jar bootstrap [-hV] [-c=<credential>]... -r=<realm> [-r=<realm>]...
Bootstraps realms and root principal credentials.
  -c, --credential=<credential>
                  Root principal credentials to bootstrap. Must be of the form
                    'realm,clientId,clientSecret'.
  -h, --help      Show this help message and exit.
  -r, --realm=<realm>
                  The name of a realm to bootstrap.
  -V, --version   Print version information and exit.
```

For example, to bootstrap the `realm1` realm and create its root principal credential with the
client ID `admin` and client secret `admin`, you can run the following command:

```shell
java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap -r realm1 -c realm1,admin,admin
```

## Purging Realms and Principal Credentials

The `purge` command is used to remove realms and principal credentials from the Polaris server.

> Warning: Running the `purge` command will remove all data associated with the specified realms!
> This includes all entities (catalogs, namespaces, tables, views, roles), all principal
> credentials, grants, and any other data associated with the realms.

```shell
java -jar runtime/admin/build/polaris-admin-*-runner.jar purge --help
```

The basic usage of the `purge` command is outlined below:

```
Usage: polaris-admin-runner.jar purge [-hV] -r=<realm> [-r=<realm>]...
Purge realms and all associated entities.
  -h, --help            Show this help message and exit.
  -r, --realm=<realm>   The name of a realm to purge.
  -V, --version         Print version information and exit.
```

For example, to purge the `realm1` realm, you can run the following command:

```shell
java -jar runtime/admin/build/polaris-admin-*-runner.jar purge -r realm1
```
\ No newline at end of file
diff --git a/1.0.0/command-line-interface.md b/1.0.0/command-line-interface.md
deleted file mode 100644
index f20210e2c6..0000000000
--- a/1.0.0/command-line-interface.md
+++ /dev/null
@@ -1,1224 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Command Line Interface
type: docs
weight: 300
---

In order to help administrators quickly set up and manage their Polaris server, Polaris provides a simple command-line interface (CLI) for common tasks.

The basic syntax of the Polaris CLI is outlined below:

```
polaris [options] COMMAND ...

options:
--host
--port
--base-url
--client-id
--client-secret
--access-token
--profile
```

`COMMAND` must be one of the following:
1. catalogs
2. principals
3. principal-roles
4. catalog-roles
5. namespaces
6. privileges
7. profiles

Each _command_ supports several _subcommands_, and some _subcommands_ have _actions_ that come after the subcommand in turn. Finally, _arguments_ follow to form a full invocation. Within a set of named arguments at the end of an invocation, ordering is generally not important. Many invocations also have a required positional argument of the type that the _command_ refers to. Again, the ordering of this positional argument relative to named arguments is not important.

Some example full invocations:

```
polaris principals list
polaris catalogs delete some_catalog_name
polaris catalogs update --property foo=bar some_other_catalog
polaris catalogs update another_catalog --property k=v
polaris privileges namespace grant --namespace some.schema --catalog fourth_catalog --catalog-role some_catalog_role TABLE_READ_DATA
polaris profiles list
```

### Authentication

As outlined above, the Polaris CLI may take credentials using the `--client-id` and `--client-secret` options. For example:

```
polaris --client-id 4b5ed1ca908c3cc2 --client-secret 07ea8e4edefb9a9e57c247e8d1a4f51c principals ...
```

If `--client-id` and `--client-secret` are not provided, the Polaris CLI will try to read the client ID and client secret from environment variables called `CLIENT_ID` and `CLIENT_SECRET` respectively. If these flags are not provided and the environment variables are not set, the CLI will fail.

Alternatively, the `--access-token` option can be used instead of `--client-id` and `--client-secret`, but both authentication methods cannot be used simultaneously.

Additionally, the `--profile` option can be used to specify a saved profile instead of providing authentication details directly. If `--profile` is not provided, the CLI will check the `CLIENT_PROFILE` environment variable. Profiles store authentication details and connection settings, simplifying repeated CLI usage.

If the `--host` and `--port` options are not provided, the CLI will default to communicating with `localhost:8181`.

Alternatively, the `--base-url` option can be used instead of `--host` and `--port`, but both options cannot be used simultaneously. This allows specifying arbitrary Polaris URLs, including HTTPS ones, that have additional base prefixes before the `/api/*/v1` subpaths.

### PATH

These examples assume the Polaris CLI is on the PATH and so can be invoked just by the command `polaris`. You can add the CLI to your PATH environment variable with a command like the following:

```
export PATH="$HOME/polaris:$PATH"
```

Alternatively, you can run the CLI by providing a path to it, such as with the following invocation:

```
~/polaris principals list
```

## Commands

Each of the commands `catalogs`, `principals`, `principal-roles`, `catalog-roles`, and `privileges` is used to manage a different type of entity within Polaris.

In addition to these, the `profiles` command is available for managing stored authentication profiles, allowing login credentials to be configured for reuse. This provides an alternative to passing authentication details with every command.

To find details on the options that can be provided to a particular command or subcommand ad-hoc, you may wish to use the `--help` flag. For example:

```
polaris catalogs --help
polaris principals create --help
polaris profiles --help
```

### catalogs

The `catalogs` command is used to create, discover, and otherwise manage catalogs within Polaris.

`catalogs` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. update

#### create

The `create` subcommand is used to create a catalog.

```
input: polaris catalogs create --help
options:
  create
    Named arguments:
      --type The type of catalog to create in [INTERNAL, EXTERNAL]. INTERNAL by default.
      --storage-type (Required) The type of storage to use for the catalog
      --default-base-location (Required) Default base location of the catalog
      --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once.
- --role-arn (Required for S3) A role ARN to use when connecting to S3 - --external-id (Only for S3) The external ID to use when connecting to S3 - --tenant-id (Required for Azure) A tenant ID to use when connecting to Azure Storage - --multi-tenant-app-name (Only for Azure) The app name to use when connecting to Azure Storage - --consent-url (Only for Azure) A consent URL granting permissions for the Azure Storage location - --service-account (Only for GCS) The service account to use when connecting to GCS - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs create \ - --storage-type s3 \ - --default-base-location s3://example-bucket/my_data \ - --role-arn ${ROLE_ARN} \ - my_catalog - -polaris catalogs create \ - --storage-type s3 \ - --default-base-location s3://example-bucket/my_other_data \ - --allowed-location s3://example-bucket/second_location \ - --allowed-location s3://other-bucket/third_location \ - --role-arn ${ROLE_ARN} \ - my_other_catalog - -polaris catalogs create \ - --storage-type file \ - --default-base-location file:///example/tmp \ - quickstart_catalog -``` - -#### delete - -The `delete` subcommand is used to delete a catalog. - -``` -input: polaris catalogs delete --help -options: - delete - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs delete some_catalog -``` - -#### get - -The `get` subcommand is used to retrieve details about a catalog. - -``` -input: polaris catalogs get --help -options: - get - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs get some_catalog - -polaris catalogs get another_catalog -``` - -#### list - -The `list` subcommand is used to show details about all catalogs, or those that a certain principal role has access to. The principal used to perform this operation must have the `CATALOG_LIST` privilege. - -``` -input: polaris catalogs list --help -options: - list - Named arguments: - --principal-role The name of a principal role -``` - -##### Examples - -``` -polaris catalogs list - -polaris catalogs list --principal-role some_user -``` - -#### update - -The `update` subcommand is used to update a catalog. Currently, this command supports changing the properties of a catalog or updating its storage configuration. - -``` -input: polaris catalogs update --help -options: - update - Named arguments: - --default-base-location (Required) Default base location of the catalog - --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once. - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs update --property tag=new_value my_catalog - -polaris catalogs update --default-base-location s3://new-bucket/my_data my_catalog -``` - -### Principals - -The `principals` command is used to manage principals within Polaris. - -`principals` supports the following subcommands: - -1. create -2. delete -3. get -4. list -5. rotate-credentials -6. update -7. access - -#### create - -The `create` subcommand is used to create a new principal. - -``` -input: polaris principals create --help -options: - create - Named arguments: - --type The type of principal to create in [SERVICE] - --property A key/value pair such as: tag=value. 
Multiple can be provided by specifying this option more than once
- Positional arguments:
- principal
-```
-
-##### Examples
-
-```
-polaris principals create some_user
-
-polaris principals create --client-id ${CLIENT_ID} --property admin=true some_admin_user
-```
-
-#### delete
-
-The `delete` subcommand is used to delete a principal.
-
-```
-input: polaris principals delete --help
-options:
- delete
- Positional arguments:
- principal
-```
-
-##### Examples
-
-```
-polaris principals delete some_user
-
-polaris principals delete some_admin_user
-```
-
-#### get
-
-The `get` subcommand retrieves details about a principal.
-
-```
-input: polaris principals get --help
-options:
- get
- Positional arguments:
- principal
-```
-
-##### Examples
-
-```
-polaris principals get some_user
-
-polaris principals get some_admin_user
-```
-
-#### list
-
-The `list` subcommand shows details about all principals.
-
-##### Examples
-
-```
-polaris principals list
-```
-
-#### rotate-credentials
-
-The `rotate-credentials` subcommand is used to update the credentials used by a principal. After this command runs successfully, the new credentials will be printed to stdout.
-
-```
-input: polaris principals rotate-credentials --help
-options:
- rotate-credentials
- Positional arguments:
- principal
-```
-
-##### Examples
-
-```
-polaris principals rotate-credentials some_user
-
-polaris principals rotate-credentials some_admin_user
-```
-
-#### update
-
-The `update` subcommand is used to update a principal. Currently, this supports rewriting the properties associated with a principal.
-
-```
-input: polaris principals update --help
-options:
- update
- Named arguments:
- --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
- Positional arguments:
- principal
-```
-
-##### Examples
-
-```
-polaris principals update --property key=value --property other_key=other_value some_user
-
-polaris principals update --property are_other_keys_removed=yes some_user
-```
-
-#### access
-
-The `access` subcommand retrieves the entities that a principal has access to.
-
-```
-input: polaris principals access --help
-options:
- access
- Positional arguments:
- principal
-```
-
-##### Examples
-
-```
-polaris principals access quickstart_user
-```
-
-### Principal Roles
-
-The `principal-roles` command is used to create, discover, and manage principal roles within Polaris. Additionally, this command can identify principals or catalog roles associated with a principal role, and can be used to grant a principal role to a principal.
-
-`principal-roles` supports the following subcommands:
-
-1. create
-2. delete
-3. get
-4. list
-5. update
-6. grant
-7. revoke
-
-#### create
-
-The `create` subcommand is used to create a new principal role.
-
-```
-input: polaris principal-roles create --help
-options:
- create
- Named arguments:
- --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
- Positional arguments:
- principal_role
-```
-
-##### Examples
-
-```
-polaris principal-roles create data_engineer
-
-polaris principal-roles create --property key=value data_analyst
-```
-
-#### delete
-
-The `delete` subcommand is used to delete a principal role.
-
-```
-input: polaris principal-roles delete --help
-options:
- delete
- Positional arguments:
- principal_role
-```
-
-##### Examples
-
-```
-polaris principal-roles delete data_engineer
-
-polaris principal-roles delete data_analyst
-```
-
-#### get
-
-The `get` subcommand retrieves details about a principal role.
-
-```
-input: polaris principal-roles get --help
-options:
- get
- Positional arguments:
- principal_role
-```
-
-##### Examples
-
-```
-polaris principal-roles get data_engineer
-
-polaris principal-roles get data_analyst
-```
-
-#### list
-
-The `list` subcommand is used to print out all principal roles or, alternatively, to list all principal roles associated with a given principal or with a given catalog role.
-
-```
-input: polaris principal-roles list --help
-options:
- list
- Named arguments:
- --catalog-role The name of a catalog role. If provided, show only principal roles assigned to this catalog role.
- --principal The name of a principal. If provided, show only principal roles assigned to this principal.
-```
-
-##### Examples
-
-```
-polaris principal-roles list
-
-polaris principal-roles list --principal d.knuth
-
-polaris principal-roles list --catalog-role super_secret_data
-```
-
-#### update
-
-The `update` subcommand is used to update a principal role. Currently, this supports updating the properties tied to a principal role.
-
-```
-input: polaris principal-roles update --help
-options:
- update
- Named arguments:
- --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
- Positional arguments:
- principal_role
-```
-
-##### Examples
-
-```
-polaris principal-roles update --property key=value2 data_engineer
-
-polaris principal-roles update data_analyst --property key=value3
-```
-
-#### grant
-
-The `grant` subcommand is used to grant a principal role to a principal.
-
-```
-input: polaris principal-roles grant --help
-options:
- grant
- Named arguments:
- --principal A principal to grant this principal role to
- Positional arguments:
- principal_role
-```
-
-##### Examples
-
-```
-polaris principal-roles grant --principal d.knuth data_engineer
-
-polaris principal-roles grant data_scientist --principal a.ng
-```
-
-#### revoke
-
-The `revoke` subcommand is used to revoke a principal role from a principal.
-
-```
-input: polaris principal-roles revoke --help
-options:
- revoke
- Named arguments:
- --principal A principal to revoke this principal role from
- Positional arguments:
- principal_role
-```
-
-##### Examples
-
-```
-polaris principal-roles revoke --principal former.employee data_engineer
-
-polaris principal-roles revoke data_scientist --principal changed.role
-```
-
-### Catalog Roles
-
-The `catalog-roles` command is used to create, discover, and manage catalog roles within Polaris. Additionally, this command can be used to grant a catalog role to a principal role.
-
-`catalog-roles` supports the following subcommands:
-
-1. create
-2. delete
-3. get
-4. list
-5. update
-6. grant
-7. revoke
-
-#### create
-
-The `create` subcommand is used to create a new catalog role.
-
-```
-input: polaris catalog-roles create --help
-options:
- create
- Named arguments:
- --catalog The name of an existing catalog
- --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
- Positional arguments:
- catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles create --property key=value --catalog some_catalog sales_data
-
-polaris catalog-roles create --catalog other_catalog sales_data
-```
-
-#### delete
-
-The `delete` subcommand is used to delete a catalog role.
-
-```
-input: polaris catalog-roles delete --help
-options:
- delete
- Named arguments:
- --catalog The name of an existing catalog
- Positional arguments:
- catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles delete --catalog some_catalog sales_data
-
-polaris catalog-roles delete --catalog other_catalog sales_data
-```
-
-#### get
-
-The `get` subcommand retrieves details about a catalog role.
-
-```
-input: polaris catalog-roles get --help
-options:
- get
- Named arguments:
- --catalog The name of an existing catalog
- Positional arguments:
- catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles get --catalog some_catalog inventory_data
-
-polaris catalog-roles get --catalog other_catalog inventory_data
-```
-
-#### list
-
-The `list` subcommand is used to print all catalog roles. Alternatively, if a principal role is provided, only catalog roles associated with that principal role are shown.
-
-```
-input: polaris catalog-roles list --help
-options:
- list
- Named arguments:
- --principal-role The name of a principal role
- Positional arguments:
- catalog
-```
-
-##### Examples
-
-```
-polaris catalog-roles list
-
-polaris catalog-roles list --principal-role data_engineer
-```
-
-#### update
-
-The `update` subcommand is used to update a catalog role. Currently, only updating properties associated with the catalog role is supported.
-
-```
-input: polaris catalog-roles update --help
-options:
- update
- Named arguments:
- --catalog The name of an existing catalog
- --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
- Positional arguments:
- catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles update --property contains_pii=true --catalog some_catalog sales_data
-
-polaris catalog-roles update sales_data --catalog some_catalog --property key=value
-```
-
-#### grant
-
-The `grant` subcommand is used to grant a catalog role to a principal role.
-
-```
-input: polaris catalog-roles grant --help
-options:
- grant
- Named arguments:
- --catalog The name of an existing catalog
- --principal-role The name of a principal role
- Positional arguments:
- catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles grant sensitive_data --catalog some_catalog --principal-role power_user
-
-polaris catalog-roles grant --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
-```
-
-#### revoke
-
-The `revoke` subcommand is used to revoke a catalog role from a principal role.
-
-```
-input: polaris catalog-roles revoke --help
-options:
- revoke
- Named arguments:
- --catalog The name of an existing catalog
- --principal-role The name of a principal role
- Positional arguments:
- catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles revoke sensitive_data --catalog some_catalog --principal-role power_user
-
-polaris catalog-roles revoke --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
-```
-
-### Namespaces
-
-The `namespaces` command is used to manage namespaces within Polaris.
-
-`namespaces` supports the following subcommands:
-
-1. create
-2. delete
-3. get
-4. list
-
-#### create
-
-The `create` subcommand is used to create a new namespace.
-
-When creating a namespace with an explicit location, that location must reside within the parent catalog or namespace.
-
-```
-input: polaris namespaces create --help
-options:
- create
- Named arguments:
- --catalog The name of an existing catalog
- --location If specified, the location at which to store the namespace and entities inside it
- --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
- Positional arguments:
- namespace
-```
-
-##### Examples
-
-```
-polaris namespaces create --catalog my_catalog outer
-
-polaris namespaces create --catalog my_catalog --location 's3://bucket/outer/inner_SUFFIX' outer.inner
-```
-
-#### delete
-
-The `delete` subcommand is used to delete a namespace.
-
-```
-input: polaris namespaces delete --help
-options:
- delete
- Named arguments:
- --catalog The name of an existing catalog
- Positional arguments:
- namespace
-```
-
-##### Examples
-
-```
-polaris namespaces delete outer_namespace.inner_namespace --catalog my_catalog
-
-polaris namespaces delete --catalog my_catalog outer_namespace
-```
-
-#### get
-
-The `get` subcommand retrieves details about a namespace.
-
-```
-input: polaris namespaces get --help
-options:
- get
- Named arguments:
- --catalog The name of an existing catalog
- Positional arguments:
- namespace
-```
-
-##### Examples
-
-```
-polaris namespaces get --catalog some_catalog a.b
-
-polaris namespaces get a.b.c --catalog some_catalog
-```
-
-#### list
-
-The `list` subcommand shows details about all namespaces directly within a catalog or, optionally, within some parent prefix in that catalog.
-
-```
-input: polaris namespaces list --help
-options:
- list
- Named arguments:
- --catalog The name of an existing catalog
- --parent If specified, list namespaces inside this parent namespace
-```
-
-##### Examples
-
-```
-polaris namespaces list --catalog my_catalog
-
-polaris namespaces list --catalog my_catalog --parent a
-
-polaris namespaces list --catalog my_catalog --parent a.b
-```
-
-### Privileges
-
-The `privileges` command is used to grant various privileges to a catalog role, or to revoke those privileges. Privileges can be on the level of a catalog, a namespace, a table, or a view. For more information on privileges, please refer to the [docs]({{% ref "entities#privilege" %}}).
-
-Note that when using the `privileges` command, the user specifies the relevant catalog and catalog role before selecting a subcommand.
-
-`privileges` supports the following subcommands:
-
-1. list
-2. catalog
-3. namespace
-4. table
-5. view
-
-Each of these subcommands, except `list`, supports the `grant` and `revoke` actions and requires an action to be specified.
-
-Note that each subcommand's `revoke` action always accepts the same options that the corresponding `grant` action does, but with the addition of the `--cascade` option. `--cascade` is used to revoke all other privileges that depend on the specified privilege.
-
-#### list
-
-The `list` subcommand shows details about all privileges for a catalog role.
-
-```
-input: polaris privileges list --help
-options:
- list
- Named arguments:
- --catalog The name of an existing catalog
- --catalog-role The name of a catalog role
-```
-
-##### Examples
-
-```
-polaris privileges list --catalog my_catalog --catalog-role my_role
-
-polaris privileges list --catalog-role my_other_role --catalog my_catalog
-```
-
-#### catalog
-
-The `catalog` subcommand manages privileges at the catalog level. `grant` is used to grant catalog privileges to the specified catalog role, and `revoke` is used to revoke them.
-
-```
-input: polaris privileges catalog --help
-options:
- catalog
- grant
- Named arguments:
- --catalog The name of an existing catalog
- --catalog-role The name of a catalog role
- Positional arguments:
- privilege
- revoke
- Named arguments:
- --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege
- --catalog The name of an existing catalog
- --catalog-role The name of a catalog role
- Positional arguments:
- privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
- catalog \
- grant \
- --catalog my_catalog \
- --catalog-role catalog_role \
- TABLE_CREATE
-
-polaris privileges \
- catalog \
- revoke \
- --catalog my_catalog \
- --catalog-role catalog_role \
- --cascade \
- TABLE_CREATE
-```
-
-#### namespace
-
-The `namespace` subcommand manages privileges at the namespace level.
-
-```
-input: polaris privileges namespace --help
-options:
- namespace
- grant
- Named arguments:
- --namespace A period-delimited namespace
- --catalog The name of an existing catalog
- --catalog-role The name of a catalog role
- Positional arguments:
- privilege
- revoke
- Named arguments:
- --namespace A period-delimited namespace
- --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege
- --catalog The name of an existing catalog
- --catalog-role The name of a catalog role
- Positional arguments:
- privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
- namespace \
- grant \
- --catalog my_catalog \
- --catalog-role catalog_role \
- --namespace a.b \
- TABLE_LIST
-
-polaris privileges \
- namespace \
- revoke \
- --catalog my_catalog \
- --catalog-role catalog_role \
- --namespace a.b \
- TABLE_LIST
-```
-
-#### table
-
-The `table` subcommand manages privileges at the table level.
-
-```
-input: polaris privileges table --help
-options:
- table
- grant
- Named arguments:
- --namespace A period-delimited namespace
- --table The name of a table
- --catalog The name of an existing catalog
- --catalog-role The name of a catalog role
- Positional arguments:
- privilege
- revoke
- Named arguments:
- --namespace A period-delimited namespace
- --table The name of a table
- --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege
- --catalog The name of an existing catalog
- --catalog-role The name of a catalog role
- Positional arguments:
- privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
- table \
- grant \
- --catalog my_catalog \
- --catalog-role catalog_role \
- --namespace a.b \
- --table t \
- TABLE_DROP
-
-polaris privileges \
- table \
- revoke \
- --catalog my_catalog \
- --catalog-role catalog_role \
- --namespace a.b \
- --table t \
- --cascade \
- TABLE_DROP
-```
-
-#### view
-
-The `view` subcommand manages privileges at the view level.
-
-```
-input: polaris privileges view --help
-options:
- view
- grant
- Named arguments:
- --namespace A period-delimited namespace
- --view The name of a view
- --catalog The name of an existing catalog
- --catalog-role The name of a catalog role
- Positional arguments:
- privilege
- revoke
- Named arguments:
- --namespace A period-delimited namespace
- --view The name of a view
- --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege
- --catalog The name of an existing catalog
- --catalog-role The name of a catalog role
- Positional arguments:
- privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
- view \
- grant \
- --catalog my_catalog \
- --catalog-role catalog_role \
- --namespace a.b.c \
- --view v \
- VIEW_FULL_METADATA
-
-polaris privileges \
- view \
- revoke \
- --catalog my_catalog \
- --catalog-role catalog_role \
- --namespace a.b.c \
- --view v \
- --cascade \
- VIEW_FULL_METADATA
-```
-
-### profiles
-
-The `profiles` command is used to manage stored authentication profiles in Polaris. Profiles allow authentication credentials to be saved and reused, eliminating the need to pass credentials with every command.
-
-`profiles` supports the following subcommands:
-
-1. create
-2. delete
-3. get
-4. list
-5. update
-
-#### create
-
-The `create` subcommand is used to create a new authentication profile.
-
-```
-input: polaris profiles create --help
-options:
- create
- Positional arguments:
- profile
-```
-
-##### Examples
-
-```
-polaris profiles create dev
-```
-
-#### delete
-
-The `delete` subcommand removes a stored profile.
-
-```
-input: polaris profiles delete --help
-options:
- delete
- Positional arguments:
- profile
-```
-
-##### Examples
-
-```
-polaris profiles delete dev
-```
-
-#### get
-
-The `get` subcommand retrieves details about a stored profile.
-
-```
-input: polaris profiles get --help
-options:
- get
- Positional arguments:
- profile
-```
-
-##### Examples
-
-```
-polaris profiles get dev
-```
-
-#### list
-
-The `list` subcommand displays all stored profiles.
-
-```
-input: polaris profiles list --help
-options:
- list
-```
-
-##### Examples
-
-```
-polaris profiles list
-```
-
-#### update
-
-The `update` subcommand modifies an existing profile.
-
-```
-input: polaris profiles update --help
-options:
- update
- Positional arguments:
- profile
-```
-
-##### Examples
-
-```
-polaris profiles update dev
-```
-
-## Examples
-
-This section outlines example code for a few common operations as well as for some more complex ones.
-
-For especially complex operations, you may wish to use the Python API directly.
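-
-The examples below assume the CLI can authenticate with the Polaris server. One way to arrange this, as noted at the top of this page, is to export the environment variables that the CLI falls back to when `--client-id` and `--client-secret` are omitted (a sketch; the credential values are placeholders):
-
-```
-export CLIENT_ID=<your-client-id>
-export CLIENT_SECRET=<your-client-secret>
-
-# Subsequent commands now authenticate without extra flags:
-polaris principals list
-```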
-
-### Creating a principal and a catalog
-
-```
-polaris principals create my_user
-
-polaris catalogs create \
- --type internal \
- --storage-type s3 \
- --default-base-location s3://iceberg-bucket/polaris-base \
- --role-arn arn:aws:iam::111122223333:role/ExampleCorpRole \
- --allowed-location s3://iceberg-bucket/polaris-alt-location-1 \
- --allowed-location s3://iceberg-bucket/polaris-alt-location-2 \
- my_catalog
-```
-
-### Granting a principal the ability to manage the content of a catalog
-
-```
-polaris principal-roles create power_user
-polaris principal-roles grant --principal my_user power_user
-
-polaris catalog-roles create --catalog my_catalog my_catalog_role
-polaris catalog-roles grant \
- --catalog my_catalog \
- --principal-role power_user \
- my_catalog_role
-
-polaris privileges \
- catalog \
- --catalog my_catalog \
- --catalog-role my_catalog_role \
- grant \
- CATALOG_MANAGE_CONTENT
-```
-
-### Identifying the tables a given principal has been granted explicit access to read
-
-_Note that some other privileges, such as `CATALOG_MANAGE_CONTENT`, subsume `TABLE_READ_DATA` and would not be discovered here._
-
-```
-catalog=my_catalog  # the catalog to inspect
-principal_roles=$(polaris principal-roles list --principal my_principal)
-for principal_role in ${principal_roles}; do
-  catalog_roles=$(polaris catalog-roles list --principal-role "${principal_role}" "${catalog}")
-  for catalog_role in ${catalog_roles}; do
-    grants=$(polaris privileges list --catalog-role "${catalog_role}" --catalog "${catalog}")
-    for grant in $(echo "${grants}" | jq -c '.[] | select(.privilege == "TABLE_READ_DATA")'); do
-      echo "${grant}"
-    done
-  done
-done
-```
-
diff --git a/1.0.0/configuration.md b/1.0.0/configuration.md
deleted file mode 100644
index 95d77230f9..0000000000
--- a/1.0.0/configuration.md
+++ /dev/null
@@ -1,187 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Configuring Polaris
-type: docs
-weight: 550
----
-
-## Overview
-
-This page provides information on how to configure Apache Polaris (Incubating). Unless stated
-otherwise, this information is valid both for Polaris Docker images (and Kubernetes deployments) as
-well as for Polaris binary distributions.
-
-> Note: for Production tips and best practices, refer to [Configuring Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}).
-
-First off, the Polaris server runs on Quarkus and uses its configuration mechanisms. Read the Quarkus
-[configuration guide](https://quarkus.io/guides/config) to get familiar with the basics.
-
-Quarkus aggregates configuration properties from multiple sources, applying them in a specific order
-of precedence. When a property is defined in multiple sources, the value from the source with the
-higher priority overrides those from lower-priority sources.
-
-The sources are listed below, from highest to lowest priority:
-
-1. System properties: properties set via the Java command line using `-Dproperty.name=value`.
-2. Environment variables (see below for important details).
-3. Settings in `$PWD/config/application.properties` file.
-4. The `application.properties` files packaged in Polaris.
-5. Default values: hardcoded defaults within the application.
-
-When using environment variables, there are two naming conventions:
-
-1. If possible, just use the property name as the environment variable name. This works fine in most
-   cases, e.g. in Kubernetes deployments. For example, `polaris.realm-context.realms` can be
-   included as is in a container YAML definition:
-   ```yaml
-   env:
-   - name: "polaris.realm-context.realms"
-     value: "realm1,realm2"
-   ```
-
-2. If running from a script or shell prompt, however, stricter naming rules apply: variable names
-   can consist solely of uppercase letters, digits, and the `_` (underscore) sign. In such
-   situations, the environment variable name must be derived from the property name, by using
-   uppercase letters, and replacing all dots, dashes and quotes by underscores. For example,
-   `polaris.realm-context.realms` becomes `POLARIS_REALM_CONTEXT_REALMS`. See
-   [here](https://smallrye.io/smallrye-config/Main/config/environment-variables/) for more details.
-
-> [!IMPORTANT]
-> While convenient, uppercase-only environment variables can be problematic for complex property
-> names. In these situations, it's preferable to use system properties or a configuration file.
-
-As stated above, a configuration file can also be provided at runtime; it should be available
-(mounted) at `$PWD/config/application.properties` for the Polaris server to recognize it. In Polaris
-official Docker images, this location is `/deployment/config/application.properties`.
-
-For Kubernetes deployments, the configuration file is typically defined as a `ConfigMap`, then
-mounted in the container at `/deployment/config/application.properties`. It can be mounted in
-read-only mode, as Polaris only reads the configuration file once, at startup.
-
-## Polaris Configuration Options Reference
-
-| Configuration Property | Default Value | Description |
-|------------------------|---------------|-------------|
-| `polaris.persistence.type` | `relational-jdbc` | Define the persistence backend used by Polaris (`in-memory`, `relational-jdbc`, `eclipse-link` (deprecated)). See [Configuring Apache Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}). |
-| `polaris.persistence.relational.jdbc.max-retries` | `1` | Total number of retries JDBC persistence will attempt on connection resets or serialization failures before giving up. |
-| `polaris.persistence.relational.jdbc.max-duration-in-ms` | `5000 ms` | Max time interval (ms) since the start of a transaction when retries can be attempted. |
-| `polaris.persistence.relational.jdbc.initial-delay-in-ms` | `100 ms` | Initial delay before retrying. The delay is doubled after each retry. |
-| `polaris.persistence.eclipselink.configurationFile` | | Define the location of the `persistence.xml`. By default, it's the built-in `persistence.xml` in use. |
-| `polaris.persistence.eclipselink.persistenceUnit` | `polaris` | Define the name of the persistence unit to use, as defined in the `persistence.xml`. |
-| `polaris.realm-context.type` | `default` | Define the type of the Polaris realm to use. |
-| `polaris.realm-context.realms` | `POLARIS` | Define the list of realms to use. |
-| `polaris.realm-context.header-name` | `Polaris-Realm` | Define the header name defining the realm context. |
-| `polaris.features."ENFORCE_PRINCIPAL_CREDENTIAL_ROTATION_REQUIRED_CHECKING"` | `false` | Flag to enforce credential rotation checking. |
-| `polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"` | `FILE` | Define the storage types supported for catalogs. Supported values are `S3`, `GCS`, `AZURE`, `FILE`. |
-| `polaris.features.realm-overrides."my-realm"."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"` | `true` | Override features per realm; in this example, the skip-credential-subscoping-indirection flag. |
-| `polaris.authentication.authenticator.type` | `default` | Define the Polaris authenticator type. |
-| `polaris.authentication.token-service.type` | `default` | Define the Polaris token service type. |
-| `polaris.authentication.token-broker.type` | `rsa-key-pair` | Define the Polaris token broker type. Also configure the location of the key files. For RSA: if the locations of the key files are not configured, an ephemeral key-pair will be created on each Polaris server instance startup, which breaks existing tokens after server restarts and is also incompatible with running multiple Polaris server instances. |
-| `polaris.authentication.token-broker.max-token-generation` | `PT1H` | Define the max token generation policy on the token broker. |
-| `polaris.authentication.token-broker.rsa-key-pair.private-key-file` | | Define the location of the RSA-256 private key file; if present, the `public-key` file must be specified, too. |
-| `polaris.authentication.token-broker.rsa-key-pair.public-key-file` | | Define the location of the RSA-256 public key file; if present, the `private-key` file must be specified, too. |
-| `polaris.authentication.token-broker.symmetric-key.secret` | `secret` | Define the secret of the symmetric key. |
-| `polaris.authentication.token-broker.symmetric-key.file` | `/tmp/symmetric.key` | Define the location of the symmetric key file. |
-| `polaris.storage.aws.access-key` | `accessKey` | Define the AWS S3 access key. If unset, the default credential provider chain will be used. |
-| `polaris.storage.aws.secret-key` | `secretKey` | Define the AWS S3 secret key. If unset, the default credential provider chain will be used. |
-| `polaris.storage.gcp.token` | `token` | Define the Google Cloud Storage token. If unset, the default credential provider chain will be used. |
-| `polaris.storage.gcp.lifespan` | `PT1H` | Define the Google Cloud Storage token lifespan. If unset, the default credential provider chain will be used. |
-| `polaris.log.request-id-header-name` | `Polaris-Request-Id` | Define the header name used to match the request ID in the log. |
-| `polaris.log.mdc.aid` | `polaris` | Define the log context (e.g. MDC) AID. |
-| `polaris.log.mdc.sid` | `polaris-service` | Define the log context (e.g. MDC) SID. |
-| `polaris.rate-limiter.filter.type` | `no-op` | Define the Polaris rate limiter. Supported values are `no-op`, `token-bucket`. |
-| `polaris.rate-limiter.token-bucket.type` | `default` | Define the token bucket rate limiter. |
-| `polaris.rate-limiter.token-bucket.requests-per-second` | `9999` | Define the number of requests per second for the token bucket rate limiter. |
-| `polaris.rate-limiter.token-bucket.window` | `PT10S` | Define the time window for the token bucket rate limiter. |
-| `polaris.metrics.tags.<tag-name>=<tag-value>` | `application=Polaris` | Define arbitrary metric tags to include in every request. |
-| `polaris.metrics.realm-id-tag.api-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in API metrics. |
-| `polaris.metrics.realm-id-tag.http-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in HTTP request metrics. |
-| `polaris.metrics.realm-id-tag.http-metrics-max-cardinality` | `100` | The maximum cardinality for the `realm_id` tag in HTTP request metrics. |
-| `polaris.tasks.max-concurrent-tasks` | `100` | Define the max number of concurrent tasks. |
-| `polaris.tasks.max-queued-tasks` | `1000` | Define the max number of tasks in queue. |
-
-There are also non-Polaris configuration properties that can be useful:
-
-| Configuration Property | Default Value | Description |
-|------------------------|---------------|-------------|
-| `quarkus.log.level` | `INFO` | Define the root log level. |
-| `quarkus.log.category."org.apache.polaris".level` | | Define the log level for a specific category. |
-| `quarkus.default-locale` | System locale | Force the use of a specific locale, for instance `en_US`. |
-| `quarkus.http.port` | `8181` | Define the HTTP port number. |
-| `quarkus.http.auth.basic` | `false` | Enable HTTP basic authentication. |
-| `quarkus.http.limits.max-body-size` | `10240K` | Define the HTTP max body size limit. |
-| `quarkus.http.cors.origins` | | Define the HTTP CORS origins. |
-| `quarkus.http.cors.methods` | `PATCH, POST, DELETE, GET, PUT` | Define the allowed HTTP CORS methods. |
-| `quarkus.http.cors.headers` | `*` | Define the allowed HTTP CORS headers. |
-| `quarkus.http.cors.exposed-headers` | `*` | Define the HTTP CORS exposed headers. |
-| `quarkus.http.cors.access-control-max-age` | `PT10M` | Define the HTTP CORS access control max age. |
-| `quarkus.http.cors.access-control-allow-credentials` | `true` | Define the HTTP CORS access control allow credentials flag. |
-| `quarkus.management.enabled` | `true` | Enable the management server. |
-| `quarkus.management.port` | `8182` | Define the port number of the Polaris management server. |
-| `quarkus.management.root-path` | | Define the root path under which the `/metrics` and `/health` endpoints are exposed. |
-| `quarkus.otel.sdk.disabled` | `true` | Disable the OpenTelemetry layer; set to `false` to enable it. |
-
-> Note: This section is only relevant for Polaris Docker images and Kubernetes deployments.
-
-There are many other useful environment variables available in the official Polaris Docker
-image; they come from the base image used by Polaris, [ubi9/openjdk-21-runtime]. They should be used
-to fine-tune the Java runtime directly, e.g. to enable debugging or to set the heap size. These
-variables are not specific to Polaris, but are inherited from the base image. If in doubt, leave
-everything at its default!
-
-[ubi9/openjdk-21-runtime]: https://catalog.redhat.com/software/containers/ubi9/openjdk-21-runtime/6501ce769a0d86945c422d5f
-
-| Environment variable | Description |
-|----------------------|-------------|
-| `JAVA_OPTS` or `JAVA_OPTIONS` | **NOT RECOMMENDED**. JVM options passed to the `java` command (example: "-verbose:class"). Setting this variable will override all options set by any of the other variables in this table. To pass extra settings, use `JAVA_OPTS_APPEND` instead. |
-| `JAVA_OPTS_APPEND` | User-specified Java options to be appended to generated options in `JAVA_OPTS` (example: "-Dsome.property=foo"). |
-| `JAVA_TOOL_OPTIONS` | This variable is defined and honored by all OpenJDK distros, see [here](https://bugs.openjdk.org/browse/JDK-4971166). Options defined here take precedence over all else; using this variable is generally not necessary, but can be useful e.g. to enforce JVM startup parameters, to set up remote debug, or to define JVM agents. |
-| `JAVA_MAX_MEM_RATIO` | Used to calculate a default maximal heap memory based on a container's memory restriction. If used in a container without any memory constraints, this option has no effect. If there is a memory constraint, `-XX:MaxRAMPercentage` is set to a ratio of the container's available memory as set here. The default is `80`, which means 80% of the available memory is used as an upper boundary. You can skip this mechanism by setting this value to `0`, in which case no `-XX:MaxRAMPercentage` option is added. |
-| `JAVA_DEBUG` | If set, remote debugging will be switched on. Disabled by default (example: "true"). |
-| `JAVA_DEBUG_PORT` | Port used for remote debugging. Defaults to "5005" (tip: use "*:5005" to enable debugging on all network interfaces). |
-| `GC_MIN_HEAP_FREE_RATIO` | Minimum percentage of heap free after GC to avoid expansion. Default is 10. |
-| `GC_MAX_HEAP_FREE_RATIO` | Maximum percentage of heap free after GC to avoid shrinking. Default is 20. |
-| `GC_TIME_RATIO` | Specifies the ratio of the time spent outside the garbage collection. Default is 4. |
-| `GC_ADAPTIVE_SIZE_POLICY_WEIGHT` | The weighting given to the current GC time versus previous GC times. Default is 90. |
-| `GC_METASPACE_SIZE` | The initial metaspace size. There is no default (example: "20"). |
-| `GC_MAX_METASPACE_SIZE` | The maximum metaspace size. There is no default (example: "100"). |
-| `GC_CONTAINER_OPTIONS` | Specify the Java GC to use. The value of this variable should contain the necessary JRE command-line options to specify the required GC, which will override the default of `-XX:+UseParallelGC` (example: `-XX:+UseG1GC`). |
-
-Here are some examples:
-
-| Example | `docker run` option |
-|---------|---------------------|
-| Using another GC | `-e GC_CONTAINER_OPTIONS="-XX:+UseShenandoahGC"` lets Polaris use Shenandoah GC instead of the default parallel GC. |
-| Set the Java heap size to a _fixed_ amount | `-e JAVA_OPTS_APPEND="-Xms8g -Xmx8g"` lets Polaris use a Java heap of 8g. |
-| Set the maximum heap percentage | `-e JAVA_MAX_MEM_RATIO="70"` lets Polaris use 70% of the available memory. |
-
-## Troubleshooting Configuration Issues
-
-If you encounter issues with the configuration, you can ask Polaris to print out the configuration it
-is using. To do this, set the log level for the `io.smallrye.config` category to `DEBUG`, and also
-set the console appender level to `DEBUG`:
-
-```properties
-quarkus.log.console.level=DEBUG
-quarkus.log.category."io.smallrye.config".level=DEBUG
-```
-
-> [!IMPORTANT]
-> This will print out all configuration values, including sensitive ones like
-> passwords. Don't do this in production, and don't share this output with anyone you don't trust!
diff --git a/1.0.0/configuring-polaris-for-production.md b/1.0.0/configuring-polaris-for-production.md
deleted file mode 100644
index fac51b40f9..0000000000
--- a/1.0.0/configuring-polaris-for-production.md
+++ /dev/null
@@ -1,222 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Configuring Polaris for Production
-linkTitle: Production Configuration
-type: docs
-weight: 600
----
-
-The default server configuration is intended for development and testing. When you deploy Polaris in production,
-review and apply the following checklist:
-- [ ] Configure OAuth2 keys
-- [ ] Enforce realm header validation (`require-header=true`)
-- [ ] Use a durable metastore (JDBC + PostgreSQL)
-- [ ] Bootstrap valid realms in the metastore
-- [ ] Disable local FILE storage
-
-### Configure OAuth2
-
-Polaris authentication requires specifying a token broker factory type. Two implementations are
-supported out of the box:
-
-- [rsa-key-pair] uses a pair of public and private keys;
-- [symmetric-key] uses a shared secret.
-
-[rsa-key-pair]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java
-[symmetric-key]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTSymmetricKeyFactory.java
-
-By default, Polaris uses `rsa-key-pair`, with randomly generated keys.
-
-> [!IMPORTANT]
-> The default `rsa-key-pair` configuration is not suitable when deploying many replicas of Polaris,
-> as each replica will have its own set of keys. This will cause token validation to fail when a
-> request is routed to a different replica than the one that issued the token.
-
-It is highly recommended to configure Polaris with previously-generated RSA keys.
This can be done -by setting the following properties: - -```properties -polaris.authentication.token-broker.type=rsa-key-pair -polaris.authentication.token-broker.rsa-key-pair.public-key-file=/tmp/public.key -polaris.authentication.token-broker.rsa-key-pair.private-key-file=/tmp/private.key -``` - -To generate an RSA key pair, you can use the following commands: - -```shell -openssl genrsa -out private.key 2048 -openssl rsa -in private.key -pubout -out public.key -``` - -Alternatively, you can use a symmetric key by setting the following properties: - -```properties -polaris.authentication.token-broker.type=symmetric-key -polaris.authentication.token-broker.symmetric-key.file=/tmp/symmetric.key -``` - -Note: it is also possible to set the symmetric key secret directly in the configuration file. If -possible, pass the secret as an environment variable to avoid storing sensitive information in the -configuration file: - -```properties -polaris.authentication.token-broker.symmetric-key.secret=${POLARIS_SYMMETRIC_KEY_SECRET} -``` - -Finally, you can also configure the token broker to use a maximum lifespan by setting the following -property: - -```properties -polaris.authentication.token-broker.max-token-generation=PT1H -``` - -Typically, in Kubernetes, you would define the keys as a `Secret` and mount them as files in the -container. - -### Realm Context Resolver - -By default, Polaris resolves realms based on incoming request headers. You can configure the realm -context resolver by setting the following properties in `application.properties`: - -```properties -polaris.realm-context.realms=POLARIS,MY-REALM -polaris.realm-context.header-name=Polaris-Realm -``` - -Where: - -- `realms` is a comma-separated list of allowed realms. This setting _must_ be correctly configured. - At least one realm must be specified. -- `header-name` is the name of the header used to resolve the realm; by default, it is - `Polaris-Realm`. - -If a request contains the specified header, Polaris will use the realm specified in the header. If -the realm is not in the list of allowed realms, Polaris will return a `404 Not Found` response. - -If a request _does not_ contain the specified header, however, by default Polaris will use the first -realm in the list as the default realm. In the above example, `POLARIS` is the default realm and -would be used if the `Polaris-Realm` header is not present in the request. - -This is not recommended for production use, as it may lead to security vulnerabilities. To avoid -this, set the following property to `true`: - -```properties -polaris.realm-context.require-header=true -``` - -This will cause Polaris to also return a `404 Not Found` response if the realm header is not present -in the request. - -### Metastore Configuration - -A metastore should be configured with an implementation that durably persists Polaris entities. By -default, Polaris uses an in-memory metastore. - -> [!IMPORTANT] -> The default in-memory metastore is not suitable for production use, as it will lose all data -> when the server is restarted; it is also unusable when multiple Polaris replicas are used. - -To enable a durable metastore, configure your system to use the Relational JDBC-backed metastore. -This implementation leverages Quarkus for datasource management and supports configuration through -environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file). 
-
-Configure the metastore by setting the following ENV variables:
-
-```
-POLARIS_PERSISTENCE_TYPE=relational-jdbc
-
-QUARKUS_DATASOURCE_DB_KIND=postgresql
-QUARKUS_DATASOURCE_USERNAME=
-QUARKUS_DATASOURCE_PASSWORD=
-QUARKUS_DATASOURCE_JDBC_URL=
-```
-
-The relational JDBC metastore is a Quarkus-managed datasource and currently supports only PostgreSQL and H2.
-Please refer to the documentation here:
-[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)
-
-> [!IMPORTANT]
-> Be sure to secure your metastore backend since it will be storing sensitive data and catalog
-> metadata.
-
-Note: Polaris will always create the schema `polaris_schema` in the configured database during bootstrap.
-
-### Bootstrapping
-
-Before using Polaris, you must **bootstrap** the metastore. This is a manual operation that must be
-performed **only once** for each realm in order to prepare the metastore to integrate with Polaris.
-
-By default, when bootstrapping a new realm, Polaris will create randomised `CLIENT_ID` and
-`CLIENT_SECRET` for the `root` principal and store their hashes in the metastore backend.
-
-This may not be convenient: because only the hashes are stored, the generated credentials cannot be
-retrieved from the database later.
-
-In order to provide your own credentials for the `root` principal (so you can request tokens via
-`api/catalog/v1/oauth/tokens`), use the [Polaris Admin Tool]({{% ref "admin-tool" %}}).
-
-You can verify the setup by requesting a token for the `root` principal:
-
-```bash
-curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
-  -d "grant_type=client_credentials" \
-  -d "client_id=my-client-id" \
-  -d "client_secret=my-client-secret" \
-  -d "scope=PRINCIPAL_ROLE:ALL"
-```
-
-This should return an access token:
-
-```json
-{
-  "access_token": "...",
-  "token_type": "bearer",
-  "issued_token_type": "urn:ietf:params:oauth:token-type:access_token",
-  "expires_in": 3600
-}
-```
-
-If you used a non-default realm name, add the appropriate request header to the `curl` command,
-otherwise Polaris will resolve the realm to the first one in the configuration
-`polaris.realm-context.realms`. Here is an example setting the realm header:
-
-```bash
-curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
-  -H "Polaris-Realm: my-realm" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=my-client-id" \
-  -d "client_secret=my-client-secret" \
-  -d "scope=PRINCIPAL_ROLE:ALL"
-```
-
-### Disable FILE Storage Type
-
-By default, Polaris allows using the local file system (`FILE`) for catalog storage. This is fine for testing,
-but **not recommended for production**. To disable it, set the supported storage types like this:
-
-```hocon
-polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES" = [ "S3", "AZURE" ]
-```
-
-Leave out `FILE` to prevent its use. Only include the storage types your setup needs.
-
-### Upgrade Considerations
-
-The [Polaris Evolution](../evolution) page discusses backward compatibility and
-upgrade concerns.
-
diff --git a/1.0.0/entities.md b/1.0.0/entities.md
deleted file mode 100644
index 04d625bb94..0000000000
--- a/1.0.0/entities.md
+++ /dev/null
@@ -1,95 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Entities
-type: docs
-weight: 400
----
-
-This page documents various entities that can be managed in Apache Polaris (Incubating).
-
-## Catalog
-
-A catalog is a top-level entity in Polaris that may contain other entities like [namespaces](#namespace) and [tables](#table). These map directly to [Apache Iceberg catalogs](https://iceberg.apache.org/concepts/catalog/).
-
-For information on managing catalogs with the REST API or for more information on what data can be associated with a catalog, see [the API docs]({{% github-polaris "client/python/docs/CreateCatalogRequest.md" %}}).
-
-### Storage Type
-
-All catalogs in Polaris are associated with a _storage type_. Valid Storage Types are `S3`, `Azure`, and `GCS`. The `FILE` type is additionally available for testing. Each of these types relates to a different storage provider where data within the catalog may reside. Depending on the storage type, various other configurations may be set for a catalog, including credentials to be used when accessing data inside the catalog.
-
-For details on how to use Storage Types in the REST API, see [the API docs]({{% github-polaris "client/python/docs/StorageConfigInfo.md" %}}).
-
-For usage examples of storage types, see [docs]({{% ref "command-line-interface" %}}).
-
-## Namespace
-
-A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_.
-
-In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on.
-
-For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the API docs]({{% github-polaris "client/python/docs/CreateNamespaceRequest.md" %}}).
-
-## Table
-
-Polaris tables are entities that map to [Apache Iceberg tables](https://iceberg.apache.org/docs/nightly/configuration/), [Delta tables](https://docs.databricks.com/aws/en/delta/table-properties), or [Hudi tables](https://hudi.apache.org/docs/next/configurations#TABLE_CONFIG).
-
-For information on managing tables with the REST API or for more information on what data can be associated with a table, see [the API docs]({{% github-polaris "client/python/docs/CreateTableRequest.md" %}}).
-
-## View
-
-Polaris views are entities that map to [Apache Iceberg views](https://iceberg.apache.org/view-spec/).
-
-For information on managing views with the REST API or for more information on what data can be associated with a view, see [the API docs]({{% github-polaris "client/python/docs/CreateViewRequest.md" %}}).
-
-## Principal
-
-Polaris principals are unique identities that can be used to represent users or services. Each principal may have one or more [principal roles](#principal-role) assigned to it for the purpose of accessing catalogs and the entities within them.
-
-For information on managing principals with the REST API or for more information on what data can be associated with a principal, see [the API docs]({{% github-polaris "client/python/docs/CreatePrincipalRequest.md" %}}).
-
-## Principal Role
-
-Polaris principal roles are labels that may be granted to [principals](#principal). Each principal may have one or more principal roles, and the same principal role may be granted to multiple principals. Principal roles may be assigned based on the persona or responsibilities of a given principal, or on how that principal will need to access different entities within Polaris.
-
-For information on managing principal roles with the REST API or for more information on what data can be associated with a principal role, see [the API docs]({{% github-polaris "client/python/docs/CreatePrincipalRoleRequest.md" %}}).
-
-## Catalog Role
-
-Polaris catalog roles are labels that may be granted to [catalogs](#catalog). Each catalog may have one or more catalog roles, and the same catalog role may be granted to multiple catalogs. Catalog roles may be assigned based on the nature of data that will reside in a catalog, or by the groups of users and services that might need to access that data.
-
-Each catalog role may have multiple [privileges](#privilege) granted to it, and each catalog role can be granted to one or more [principal roles](#principal-role). This is the mechanism by which principals are granted access to entities inside a catalog such as namespaces and tables.
-
-## Policy
-
-A Polaris policy is a set of rules governing actions on specified resources under predefined conditions. Polaris supports policies for Iceberg table compaction, snapshot expiry, row-level access control, and custom policy definitions.
-
-A policy can be applied at the catalog, namespace, or table level. Policy inheritance can be achieved by attaching a policy to a higher-level scope, such as a namespace or catalog. As a result, tables registered under those entities do not need to have the same policy declared individually. If a table or a namespace requires a different policy, a user can assign it one, overriding the policy of the same type declared at the higher-level entities.
-
-## Privilege
-
-Polaris privileges are granted to [catalog roles](#catalog-role) in order to give principals holding a given principal role some degree of access to catalogs with a given catalog role. When a privilege is granted to a catalog role, any principal roles granted that catalog role receive the privilege. In turn, any principals who are granted that principal role receive it.
-
-A privilege can be scoped to any entity inside a catalog, including the catalog itself.
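-
-For illustration, here is a sketch of this grant chain using the Polaris CLI (see the [docs]({{% ref "command-line-interface" %}}); all entity names below are placeholders):
-
-```
-# Grant a privilege to a catalog role...
-polaris privileges catalog grant --catalog my_catalog --catalog-role my_catalog_role TABLE_READ_DATA
-
-# ...grant the catalog role to a principal role...
-polaris catalog-roles grant --catalog my_catalog --principal-role my_principal_role my_catalog_role
-
-# ...and grant the principal role to a principal:
-polaris principal-roles grant --principal my_user my_principal_role
-```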
- -For a list of supported privileges for each privilege class, see the API docs: -* [Table Privileges]({{% github-polaris "client/python/docs/TablePrivilege.md" %}}) -* [View Privileges]({{% github-polaris "client/python/docs/ViewPrivilege.md" %}}) -* [Namespace Privileges]({{% github-polaris "client/python/docs/NamespacePrivilege.md" %}}) -* [Catalog Privileges]({{% github-polaris "client/python/docs/CatalogPrivilege.md" %}}) diff --git a/1.0.0/evolution.md b/1.0.0/evolution.md deleted file mode 100644 index ea29badc84..0000000000 --- a/1.0.0/evolution.md +++ /dev/null @@ -1,115 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Polaris Evolution -type: docs -weight: 1000 ---- - -This page discusses what can be expected from Apache Polaris as the project evolves. - -## Using Polaris as a Catalog - -Polaris is primarily intended to be used as a Catalog of Tables and Views. As such, -it implements the Iceberg REST Catalog API and its own REST APIs. - -Revisions of the Iceberg REST Catalog API are controlled by the [Apache Iceberg](https://iceberg.apache.org/) -community. Polaris attempts to accurately implement this specification. Nonetheless, -optional REST Catalog features may or may not be supported immediately. In general, -there is no guarantee that Polaris releases always implement the latest version of -the Iceberg REST Catalog API. - -Any API under Polaris control that is not in an "experimental" or "beta" state -(e.g. the Management API) is maintained as a versioned REST API. New releases of Polaris -may include changes to the current version of the API. When that happens those changes -are intended to be compatible with prior versions of Polaris clients. Certain endpoints -and parameters may be deprecated. - -In case a major change is required to an API that cannot be implemented in a -backward-compatible way, new endpoints (URI paths) may be introduced. New URI "roots" may -be introduced too (e.g. `api/catalog/v2`). - -Note that those "v1", "v2", etc. URI path segments are not meant to be 1:1 with Polaris -releases or Polaris project version numbers (e.g. a "v2" path segment does not mean that -it is added in Polaris 2.0). - -Polaris servers will support deprecated API endpoints / parameters / versions / etc. -for some transition period to allow clients to migrate. - -### Managing Polaris Database - -Polaris stores its data in a database, which is sometimes referred to as "Metastore" or -"Persistence" in other docs. - -Each Polaris release may support multiple Persistence [implementations](../metastores), -for example, "EclipseLink" (deprecated) and "JDBC" (current). - -Each type of Persistence evolves individually. 
Within each Persistence type, Polaris
-attempts to support rolling upgrades (both version X and version X + 1 servers running at the
-same time).
-
-However, migrating between different Persistence types is not supported in a rolling
-upgrade manner (for example, migrating from "EclipseLink" to "JDBC"). Polaris provides
-[tools](https://github.com/apache/polaris-tools/) for migrating between different
-catalogs, and those tools may be used to migrate between different Persistence types
-as well. Service interruption (downtime) should be expected in those cases.
-
-## Using Polaris as a Build-Time Dependency
-
-Polaris produces several jars. These jars or custom builds of Polaris code may be used in
-downstream projects according to the terms of the license included in Polaris distributions.
-
-The minimum JRE version required by Polaris code (compilation target) may be updated in
-any release. Different Polaris jars may have different minimum JRE version requirements.
-
-Changes in Java classes should be expected at any time, regardless of the module name or
-whether a class or method is `public` or not.
-
-This approach is not meant to discourage the use of Polaris code in downstream projects, but
-to allow more flexibility in evolving the codebase to support new catalog-level features
-and improve code efficiency. Maintainers of downstream projects are encouraged to join Polaris
-mailing lists to monitor project changes, suggest improvements, and engage with the Polaris
-community in case of specific compatibility concerns.
-
-## Semantic Versioning
-
-Polaris strives to follow [Semantic Versioning](https://semver.org/) conventions with
-respect to REST APIs (beta and experimental APIs excepted), [Polaris Policies](../policy/),
-and user-facing [configuration](../configuration/).
-
-The following are some examples of the Polaris approach to SemVer in REST APIs and configuration.
-These examples are for illustration purposes and should not be considered
-exhaustive.
-
-* Polaris implementing an optional Iceberg REST Catalog feature that was unimplemented
-in the previous release is not considered a major change.
-
-* Supporting a new revision of the Iceberg REST Catalog spec in a backward-compatible way
-is not considered a major change. Specifically, supporting new REST API prefixes (e.g. `v2`)
-is not a major change because it does not affect older clients.
-
-* Changing the implementation of an Iceberg REST Catalog feature / endpoint in a
-non-backward-compatible way (e.g. removing or renaming a request parameter) is a major change.
-
-* Dropping support for a configuration property with the `polaris.` name prefix is a major change.
-
-* Dropping support for any previously defined [Policy](../policy/) type or property is a major change.
-
-* Upgrading Quarkus Runtime to its next major version is a major change (because
-Quarkus-managed configuration may change).
diff --git a/1.0.0/generic-table.md b/1.0.0/generic-table.md
deleted file mode 100644
index 2e0e3fe8e6..0000000000
--- a/1.0.0/generic-table.md
+++ /dev/null
@@ -1,169 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Generic Table (Beta)
-type: docs
-weight: 435
----
-
-The Generic Table in Apache Polaris is designed to provide support for non-Iceberg tables across different table formats, such as Delta and CSV. It currently provides the following capabilities:
-- Create a generic table under a namespace
-- Load a generic table
-- Drop a generic table
-- List all generic tables under a namespace
-
-**NOTE** Generic Table support is currently in beta. Please use it with caution and report any issues you encounter.
-
-## What is a Generic Table?
-
-A generic table in Polaris is an entity that defines the following fields:
-
-- **name** (required): A unique identifier for the table within a namespace
-- **format** (required): The format for the generic table, e.g. "delta", "csv"
-- **base-location** (optional): Table base location in URI format. For example: s3://<bucket>/path/to/table
-  - The table base location is a location that includes all files for the table
-  - A table with multiple disjoint locations (i.e. containing files that are outside the configured base location) is not compliant with the current generic table support in Polaris.
-  - If no location is provided, clients or users are responsible for managing the location.
-- **properties** (optional): Properties for the generic table passed on creation.
-  - Currently, there is no reserved property key defined.
-  - The property definition and interpretation is delegated to client or engine implementations.
-- **doc** (optional): Comment or description for the table
-
-## Generic Table API vs. Iceberg Table API
-
-The Generic Table API provides a different set of endpoints to operate on generic table entities, while the Iceberg APIs operate on
-Iceberg table entities.
-
-| Operations   | **Iceberg Table API** | **Generic Table API** |
-|--------------|-----------------------|-----------------------|
-| Create Table | Create an Iceberg table | Create a generic table |
-| Load Table   | Load an Iceberg table. If the table to load is a generic table, you need to call the Generic Table loadTable API; otherwise a TableNotFoundException will be thrown | Load a generic table. Similarly, trying to load an Iceberg table through the Generic Table API will throw a TableNotFoundException. |
-| Drop Table   | Drop an Iceberg table. As with load table, if the table to drop is a generic table, a TableNotFoundException will be thrown. | Drop a generic table. Dropping an Iceberg table through the Generic Table endpoint will throw a TableNotFoundException. |
-| List Table   | List all Iceberg tables | List all generic tables |
-
-Note that generic tables share the same namespace with Iceberg tables, so a table name has to be unique within a namespace. Furthermore, since
-there is currently no support for Update Generic Table, any update to an existing table requires a drop and re-create.
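-
-To illustrate the separation between the two APIs, here is a minimal `curl` sketch. It assumes a locally running
-Polaris service and the same hypothetical names used in the examples below (catalog `delta_catalog`, namespace
-`delta_ns`, generic table `delta_table`):
-
-```shell
-# Loading the table through the Generic Table endpoint succeeds:
-curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
-
-# Loading the same table through the Iceberg REST endpoint fails with a
-# table-not-found error, because the two APIs operate on disjoint entity types:
-curl -X GET http://localhost:8181/api/catalog/v1/delta_catalog/namespaces/delta_ns/tables/delta_table
-```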
-
-## Working with Generic Table
-
-There are two ways to work with Polaris Generic Tables today:
-1) Directly communicate with Polaris through REST API calls using tools such as `curl`. Details are described in the sections below.
-2) Use the Spark client provided if you are working with Spark. Please refer to [Polaris Spark Client]({{% ref "polaris-spark-client" %}}) for detailed instructions.
-
-### Create a Generic Table
-
-To create a generic table, you need to provide the corresponding fields as described in [What is a Generic Table](#what-is-a-generic-table).
-
-The REST API for creating a generic table is `POST /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables`, and the
-request body looks like the following:
-
-```json
-{
-  "name": "<table-name>",
-  "format": "<table-format>",
-  "base-location": "<table-base-location>",
-  "doc": "<comment>",
-  "properties": {
-    "<key>": "<value>"
-  }
-}
-```
-
-Here is an example to create a generic table with the name `delta_table` and the format `delta` under a namespace `delta_ns`
-for catalog `delta_catalog` using curl:
-
-```shell
-curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \
-  -H "Content-Type: application/json" \
-  -d '{
-       "name": "delta_table",
-       "format": "delta",
-       "base-location": "s3://<bucket>/path/to/table",
-       "doc": "delta table example",
-       "properties": {
-         "key1": "value1"
-       }
-     }'
-```
-
-### Load a Generic Table
-The REST endpoint for loading a generic table is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.
-
-Here is an example to load the table `delta_table` using curl:
-```shell
-curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
-```
-And the response looks like the following:
-```json
-{
-  "table": {
-    "name": "delta_table",
-    "format": "delta",
-    "base-location": "s3://<bucket>/path/to/table",
-    "doc": "delta table example",
-    "properties": {
-      "key1": "value1"
-    }
-  }
-}
-```
-
-### List Generic Tables
-The REST endpoint for listing the generic tables under a given
-namespace is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/`.
-
-The following curl command lists all tables under the namespace `delta_ns`:
-```shell
-curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/
-```
-Example Response:
-```json
-{
-  "identifiers": [
-    {
-      "namespace": ["delta_ns"],
-      "name": "delta_table"
-    }
-  ],
-  "next-page-token": null
-}
-```
-
-### Drop a Generic Table
-The drop generic table REST endpoint is `DELETE /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.
-
-The following curl call drops the table `delta_table`:
-```shell
-curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
-```
-
-### API Reference
-
-For the complete and up-to-date API specification, see the [Catalog API Spec](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/polaris/refs/heads/main/spec/generated/bundled-polaris-catalog-service.yaml).
-
-## Limitations
-
-Current limitations of Generic Table support:
-1) Limited spec information. Currently, there is no spec for information like schema, partitions, etc.
-2) No commit coordination or update capability provided at the catalog service level.
-
-Therefore, the catalog itself is unaware of anything about the underlying table except some of the loosely defined metadata.
-It is the responsibility of the engine (and the plugins used by the engine) to determine exactly what loading or committing data
-should look like based on that metadata. For example, with the Delta support, the Delta log serialization, deserialization,
-and updates all happen on the client side.
diff --git a/1.0.0/getting-started/_index.md b/1.0.0/getting-started/_index.md
deleted file mode 100644
index 515d211538..0000000000
--- a/1.0.0/getting-started/_index.md
+++ /dev/null
@@ -1,23 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: 'Getting Started'
-type: docs
-weight: 101
----
\ No newline at end of file
diff --git a/1.0.0/getting-started/deploying-polaris/_index.md b/1.0.0/getting-started/deploying-polaris/_index.md
deleted file mode 100644
index 32fd5dafd6..0000000000
--- a/1.0.0/getting-started/deploying-polaris/_index.md
+++ /dev/null
@@ -1,27 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Deploying Polaris on Cloud Providers
-type: docs
-weight: 300
----
-
-We will now demonstrate how to deploy Polaris locally, as well as with all supported Cloud Providers: Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).
-
-Locally, Polaris can be deployed using either Docker or a local build. On the cloud, this tutorial will deploy Polaris using Docker only, but local builds can also be executed.
\ No newline at end of file
diff --git a/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md b/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md
deleted file mode 100644
index fd95b72b0c..0000000000
--- a/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md
+++ /dev/null
@@ -1,57 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. 
The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Deploying Polaris on Amazon Web Services (AWS)
-type: docs
-weight: 310
----
-
-Build and launch Polaris using the AWS Startup Script at the location provided in the command below. This script will start an [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) instance, which will be used as the backend Postgres instance holding all Polaris data.
-Additionally, Polaris will be bootstrapped to use this database, and Docker containers will be spun up for Spark SQL and Trino.
-
-The requirements to run the script below are:
-* There must be at least two subnets created in the VPC and region in which your EC2 instance resides. The span of subnets MUST include at least 2 availability zones (AZs) within the same region.
-* Your EC2 instance must be enabled with [IMDSv1 or IMDSv2 with 2+ hop limit](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-IMDS-new-instances.html#configure-IMDS-new-instances-instance-settings).
-* The AWS identity that you will use to run this script must have the following AWS permissions:
-  * "ec2:DescribeInstances"
-  * "rds:CreateDBInstance"
-  * "rds:DescribeDBInstances"
-  * "rds:CreateDBSubnetGroup"
-  * "sts:AssumeRole" on the same role as the Instance Profile role of the EC2 instance on which you are running this script. Additionally, you should ensure that the Instance Profile contains a trust policy that allows the role to trust itself to be assumed.
-
-```shell
-chmod +x getting-started/assets/cloud_providers/deploy-aws.sh
-export ASSETS_PATH=$(pwd)/getting-started/assets/
-export CLIENT_ID=root
-export CLIENT_SECRET=s3cr3t
-./getting-started/assets/cloud_providers/deploy-aws.sh
-```
-
-## Next Steps
-Congrats, you now have a running instance of Polaris! For details on how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris.md" %}}) page.
-
-## Cleanup Instructions
-To shut down the Polaris server, run the following commands:
-
-```shell
-export ASSETS_PATH=$(pwd)/getting-started/assets/
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
-```
-
-To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.
\ No newline at end of file
diff --git a/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md b/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md
deleted file mode 100644
index 74df725db0..0000000000
--- a/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md
+++ /dev/null
@@ -1,52 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. 
The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Deploying Polaris on Azure
-type: docs
-weight: 320
----
-
-Build and launch Polaris using the Azure Startup Script at the location provided in the command below. This script will start an [Azure Database for PostgreSQL - Flexible Server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview) instance, which will be used as the backend Postgres instance holding all Polaris data.
-Additionally, Polaris will be bootstrapped to use this database, and Docker containers will be spun up for Spark SQL and Trino.
-
-The requirements to run the script below are:
-* Install the AZ CLI, if it is not already installed on the Azure VM. Instructions to download the AZ CLI can be found [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli).
-* You must be logged into the AZ CLI. Please run `az account show` to ensure that you are logged in prior to running this script.
-* Assign a System-Assigned Managed Identity to the Azure VM.
-
-```shell
-chmod +x getting-started/assets/cloud_providers/deploy-azure.sh
-export ASSETS_PATH=$(pwd)/getting-started/assets/
-export CLIENT_ID=root
-export CLIENT_SECRET=s3cr3t
-./getting-started/assets/cloud_providers/deploy-azure.sh
-```
-
-## Next Steps
-Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page.
-
-## Cleanup Instructions
-To shut down the Polaris server, run the following commands:
-
-```shell
-export ASSETS_PATH=$(pwd)/getting-started/assets/
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
-```
-
-To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.
\ No newline at end of file
diff --git a/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md b/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md
deleted file mode 100644
index 9641ad7282..0000000000
--- a/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md
+++ /dev/null
@@ -1,52 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. 
See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Deploying Polaris on Google Cloud Platform (GCP)
-type: docs
-weight: 330
----
-
-Build and launch Polaris using the GCP Startup Script at the location provided in the command below. This script will start a [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) instance, which will be used as the backend Postgres instance holding all Polaris data.
-Additionally, Polaris will be bootstrapped to use this database, and Docker containers will be spun up for Spark SQL and Trino.
-
-The requirements to run the script below are:
-* Install the `gcloud` CLI, if it is not already installed on the GCP VM. Instructions to download the `gcloud` CLI can be found [here](https://cloud.google.com/sdk/docs/install).
-* Ensure the `Cloud SQL Admin API` has been enabled in your project and that your VM's Principal has access to the correct role: `roles/cloudsql.admin`.
-* Ensure the VM's Principal has access to at least Read-only scope on Compute Engine: `compute.readonly`.
-
-```shell
-chmod +x getting-started/assets/cloud_providers/deploy-gcp.sh
-export ASSETS_PATH=$(pwd)/getting-started/assets/
-export CLIENT_ID=root
-export CLIENT_SECRET=s3cr3t
-./getting-started/assets/cloud_providers/deploy-gcp.sh
-```
-
-## Next Steps
-Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page.
-
-## Cleanup Instructions
-To shut down the Polaris server, run the following commands:
-
-```shell
-export ASSETS_PATH=$(pwd)/getting-started/assets/
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
-```
-
-To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.
\ No newline at end of file
diff --git a/1.0.0/getting-started/install-dependencies.md b/1.0.0/getting-started/install-dependencies.md
deleted file mode 100644
index 7341118868..0000000000
--- a/1.0.0/getting-started/install-dependencies.md
+++ /dev/null
@@ -1,118 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Installing Dependencies
-type: docs
-weight: 100
----
-
-This guide serves as an introduction to several key entities that can be managed with Apache Polaris (Incubating), describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark™. 
-
-# Prerequisites
-
-This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here.
-
-## Git
-
-To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/) on MacOS:
-
-```shell
-brew install git
-```
-
-Please follow the [Git Documentation](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) for instructions on installing Git on other platforms.
-
-Then, use git to clone the Polaris repo:
-
-```shell
-cd ~
-git clone https://github.com/apache/polaris.git
-```
-
-## Docker
-
-It is recommended to deploy Polaris inside [Docker](https://www.docker.com/) for the Quickstart workflow. Instructions for deploying the Quickstart workflow on the supported Cloud Providers (AWS, Azure, GCP) will be provided only with Docker. However, non-Docker deployment instructions for local deployments can also be followed on Cloud Providers.
-
-Instructions to install Docker can be found on the [Docker website](https://docs.docker.com/engine/install/). Ensure that Docker and the Docker Compose plugin are both installed.
-
-### Docker on MacOS
-Docker can be installed using [homebrew](https://brew.sh/):
-
-```shell
-brew install --cask docker
-```
-
-There can be [Docker permission issues](https://github.com/apache/polaris/pull/971) related to seccomp configuration. To resolve these issues, set the `seccomp` profile to "unconfined" when running a container. For example:
-
-```shell
-docker run --security-opt seccomp=unconfined apache/polaris:latest
-```
-
-Note: Setting the seccomp profile to "unconfined" disables the default system call filtering, which may pose security risks. Use this configuration with caution, especially in production environments.
-
-### Docker on Amazon Linux
-Docker can be installed using a modification to the CentOS instructions. For example:
-
-```shell
-sudo dnf update -y
-# Remove old version
-sudo dnf remove -y docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-engine
-# Install dnf plugin
-sudo dnf -y install dnf-plugins-core
-# Add CentOS repository
-sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
-# Adjust release server version in the path as it will not match with Amazon Linux 2023
-sudo sed -i 's/$releasever/9/g' /etc/yum.repos.d/docker-ce.repo
-# Install as usual
-sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
-```
-
-### Confirm Docker Installation
-
-Once installed, confirm that both Docker and the Docker Compose plugin are available:
-
-```shell
-docker version
-docker compose version
-```
-
-Also make sure Docker is running and is able to run a sample Docker container:
-
-```shell
-docker run hello-world
-```
-
-## Java
-
-If you plan to build Polaris from source yourself or using this tutorial's instructions on a Cloud Provider, you will need to satisfy a few prerequisites first.
-
-Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. 
For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv:
-
-```shell
-cd ~/polaris
-brew install openjdk@21 jenv
-jenv add $(brew --prefix openjdk@21)
-jenv local 21
-```
-
-Ensure that `java --version` and `javac` both run successfully.
-
-## jq
-
-Most Polaris Quickstart scripts require `jq`. Follow the instructions from the [jq](https://jqlang.org/download/) website to download this tool.
\ No newline at end of file
diff --git a/1.0.0/getting-started/quickstart.md b/1.0.0/getting-started/quickstart.md
deleted file mode 100644
index a9fd43f906..0000000000
--- a/1.0.0/getting-started/quickstart.md
+++ /dev/null
@@ -1,116 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Quickstart
-type: docs
-weight: 200
----
-
-Polaris can be deployed via a Docker image or as a standalone process. Before starting, be sure that you've satisfied the relevant prerequisites detailed in the previous page.
-
-## Common Setup
-Before running Polaris, ensure you have completed the following setup steps:
-
-1. **Build Polaris**
-```shell
-cd ~/polaris
-./gradlew \
-  :polaris-server:assemble \
-  :polaris-server:quarkusAppPartsBuild \
-  :polaris-admin:assemble --rerun \
-  -Dquarkus.container-image.tag=postgres-latest \
-  -Dquarkus.container-image.build=true
-```
-- **For standalone**: Omit the `-Dquarkus.container-image.tag` and `-Dquarkus.container-image.build` options if you do not need to build a Docker image.
-
-## Running Polaris with Docker
-
-To start using Polaris in Docker, launch Polaris together with the containers it is packaged with: a Postgres instance, Apache Spark, and Trino.
-
-```shell
-export ASSETS_PATH=$(pwd)/getting-started/assets/
-export QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://postgres:5432/POLARIS
-export QUARKUS_DATASOURCE_USERNAME=postgres
-export QUARKUS_DATASOURCE_PASSWORD=postgres
-export CLIENT_ID=root
-export CLIENT_SECRET=s3cr3t
-docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml \
-  -f getting-started/jdbc/docker-compose-bootstrap-db.yml \
-  -f getting-started/jdbc/docker-compose.yml up -d
-```
-
-You should see output for some time as Polaris, Spark, and Trino build and start up. Eventually, the output will settle into a few logs relating to Spark, resembling the following:
-
-```
-spark-sql-1  | Spark Web UI available at http://8bc4de8ed854:4040
-spark-sql-1  | Spark master: local[*], Application Id: local-1743745174604
-spark-sql-1  | 25/04/04 05:39:38 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session.
-spark-sql-1  | 25/04/04 05:39:39 WARN RESTSessionCatalog: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://polaris:8181/api/catalog/v1/oauth/tokens. This automatic fallback will be removed in a future Iceberg release. It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537
-```
-
-The Docker image pre-configures a sample catalog called `quickstart_catalog` that uses a local file system.
-
-## Running Polaris as a Standalone Process
-
-You can also start Polaris through Gradle (packaged within the Polaris repository):
-
-1. **Start the Server**
-
-Run the following command to start Polaris:
-
-```shell
-./gradlew run
-```
-
-You should see output for some time as Polaris builds and starts up. Eventually, the build output will stop, and you should see messages that resemble the following:
-
-```
-INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) polaris-runtime-service on JVM (powered by Quarkus ) started in 2.656s. Listening on: http://localhost:8181. Management interface listening on http://0.0.0.0:8182.
-INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) Profile prod activated. Live Coding activated.
-INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) Installed features: [...]
-```
-
-At this point, Polaris is running.
-
-When using a Gradle-launched Polaris instance in this tutorial, we'll launch an instance of Polaris that stores entities only in memory. This means that any entities that you define will be destroyed when Polaris is shut down.
-For more information on how to configure Polaris for production usage, see the [docs]({{% relref "../configuring-polaris-for-production" %}}).
-
-When Polaris is run using the `./gradlew run` command, the root principal credentials are `root` and `secret` for the `CLIENT_ID` and `CLIENT_SECRET`, respectively.
-
-### Installing Apache Spark and Trino Locally for Testing
-
-#### Apache Spark
-
-If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. As in the [prerequisites]({{% ref "install-dependencies#git" %}}), make sure [git](https://git-scm.com/) is installed first.
-
-Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html).
-
-```shell
-git clone --branch branch-3.5 https://github.com/apache/spark.git ~/spark
-```
-
-#### Trino
-If you want to connect to Polaris with [Trino](https://trino.io/), it is recommended to set up a test instance of Trino using Docker. As in the [prerequisites]({{% ref "install-dependencies#docker" %}}), make sure [Docker](https://www.docker.com/) is installed first.
-
-```shell
-docker run --name trino -d -p 8080:8080 trinodb/trino
-```
-
-## Next Steps
-Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% ref "using-polaris" %}}) page.
\ No newline at end of file
diff --git a/1.0.0/getting-started/using-polaris.md b/1.0.0/getting-started/using-polaris.md
deleted file mode 100644
index 35f0bae336..0000000000
--- a/1.0.0/getting-started/using-polaris.md
+++ /dev/null
@@ -1,315 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. 
See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Using Polaris -type: docs -weight: 400 ---- - -## Setup - -Ensure your `CLIENT_ID` & `CLIENT_SECRET` variables are already defined, as they were required for starting the Polaris server earlier. - -```shell -export CLIENT_ID=YOUR_CLIENT_ID -export CLIENT_SECRET=YOUR_CLIENT_SECRET -``` - -## Defining a Catalog - -In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under. With a Polaris service running, you can create a catalog like so: - -```shell -cd ~/polaris - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalogs \ - create \ - --storage-type s3 \ - --default-base-location ${DEFAULT_BASE_LOCATION} \ - --role-arn ${ROLE_ARN} \ - quickstart_catalog -``` - -This will create a new catalog called **quickstart_catalog**. If you are using one of the Getting Started locally-built Docker images, we have already created a catalog named `quickstart_catalog` for you. - -The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources. - -If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs]({{% relref "../entities#storage-type" %}}). - -Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% relref "../command-line-interface" %}}). - - -### Creating a Principal and Assigning it Privileges - -With a catalog created, we can create a [principal]({{% relref "../entities#principal" %}}) that has access to manage that catalog. For details on how to configure the Polaris CLI, see [the section above](#defining-a-catalog) or refer to the [docs]({{% relref "../command-line-interface" %}}). 
- -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principals \ - create \ - quickstart_user - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principal-roles \ - create \ - quickstart_user_role - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalog-roles \ - create \ - --catalog quickstart_catalog \ - quickstart_catalog_role -``` - -Be sure to provide the necessary credentials, hostname, and port as before. - -When the `principals create` command completes successfully, it will return the credentials for this new principal. Export them for future use. For example: - -```shell -./polaris ... principals create example -{"clientId": "XXXX", "clientSecret": "YYYY"} -export USER_CLIENT_ID=XXXX -export USER_CLIENT_SECRET=YYYY -``` - -Now, we grant the principal the [principal role]({{% relref "../entities#principal-role" %}}) we created, and grant the [catalog role]({{% relref "../entities#catalog-role" %}}) the principal role we created. For more information on these entities, please refer to the linked documentation. - -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principal-roles \ - grant \ - --principal quickstart_user \ - quickstart_user_role - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalog-roles \ - grant \ - --catalog quickstart_catalog \ - --principal-role quickstart_user_role \ - quickstart_catalog_role -``` - -Now, we’ve linked our principal to the catalog via roles like so: - -![Principal to Catalog](/img/quickstart/privilege-illustration-1.png "Principal to Catalog") - -In order to give this principal the ability to interact with the catalog, we must assign some [privileges]({{% relref "../entities#privilege" %}}). For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so: - -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - privileges \ - catalog \ - grant \ - --catalog quickstart_catalog \ - --catalog-role quickstart_catalog_role \ - CATALOG_MANAGE_CONTENT -``` - -This grants the [catalog privileges]({{% relref "../entities#privilege" %}}) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so: - -![Principal to Catalog with Catalog Role](/img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role") - -`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace. - -## Using Iceberg & Polaris - -At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the respective examples below. - -### Connecting with Spark - -#### Using a Local Build of Spark - -To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API. 
-
-This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). From a local Spark clone on the `branch-3.5` branch, we can run the following:
-
-_Note: the credentials provided here are those for our principal, not the root credentials._
-
-```shell
-bin/spark-sql \
---packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
---conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
---conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
---conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \
---conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \
---conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
---conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
---conf spark.sql.catalog.quickstart_catalog.credential='${USER_CLIENT_ID}:${USER_CLIENT_SECRET}' \
---conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \
---conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true \
---conf spark.sql.catalog.quickstart_catalog.client.region=us-west-2
-```
-
-Similar to the CLI commands above, this configures Spark to use the Polaris server running at `localhost:8181`. If your Polaris server is running elsewhere, be sure to update the configuration appropriately.
-
-Finally, note that we include the `iceberg-aws-bundle` package here. If your table is using a different filesystem, be sure to include the appropriate dependency.
-
-#### Using Spark SQL from a Docker container
-
-Refresh the Docker container with the user's credentials:
-```shell
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop spark-sql
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f spark-sql
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps spark-sql
-```
-
-Attach to the running spark-sql container:
-
-```shell
-docker attach $(docker ps -q --filter name=spark-sql)
-```
-
-#### Sample Commands
-
-Once the Spark session starts, we can create a namespace and table within the catalog:
-
-```sql
-USE quickstart_catalog;
-CREATE NAMESPACE IF NOT EXISTS quickstart_namespace;
-CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema;
-USE NAMESPACE quickstart_namespace.schema;
-CREATE TABLE IF NOT EXISTS quickstart_table (id BIGINT, data STRING) USING ICEBERG;
-```
-
-We can now use this table like any other:
-
-```
-INSERT INTO quickstart_table VALUES (1, 'some data');
-SELECT * FROM quickstart_table;
-. . .
-+---+---------+
-|id |data     |
-+---+---------+
-|1  |some data|
-+---+---------+
-```
-
-If at any time access is revoked... 
-
-```shell
-./polaris \
-  --client-id ${CLIENT_ID} \
-  --client-secret ${CLIENT_SECRET} \
-  privileges \
-  catalog \
-  revoke \
-  --catalog quickstart_catalog \
-  --catalog-role quickstart_catalog_role \
-  CATALOG_MANAGE_CONTENT
-```
-
-Spark will lose access to the table:
-
-```
-INSERT INTO quickstart_table VALUES (1, 'some data');
-
-org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
-```
-
-### Connecting with Trino
-
-Refresh the Docker container with the user's credentials:
-
-```shell
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop trino
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f trino
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps trino
-```
-
-Attach to the running Trino container:
-
-```shell
-docker exec -it $(docker ps -q --filter name=trino) trino
-```
-
-You may not see Trino's prompt immediately; type ENTER to see it. A few commands that you can try:
-
-```sql
-SHOW CATALOGS;
-SHOW SCHEMAS FROM iceberg;
-CREATE SCHEMA iceberg.quickstart_schema;
-CREATE TABLE iceberg.quickstart_schema.quickstart_table AS SELECT 1 x;
-SELECT * FROM iceberg.quickstart_schema.quickstart_table;
-```
-
-If at any time access is revoked...
-
-```shell
-./polaris \
-  --client-id ${CLIENT_ID} \
-  --client-secret ${CLIENT_SECRET} \
-  privileges \
-  catalog \
-  revoke \
-  --catalog quickstart_catalog \
-  --catalog-role quickstart_catalog_role \
-  CATALOG_MANAGE_CONTENT
-```
-
-Trino will lose access to the table:
-
-```sql
-SELECT * FROM iceberg.quickstart_schema.quickstart_table;
-
-org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
-```
-
-### Connecting Using REST APIs
-
-To access Polaris from the host machine, first request an access token:
-
-```shell
-export POLARIS_TOKEN=$(curl -s http://polaris:8181/api/catalog/v1/oauth/tokens \
-   --resolve polaris:8181:127.0.0.1 \
-   --user ${CLIENT_ID}:${CLIENT_SECRET} \
-   -d 'grant_type=client_credentials' \
-   -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
-```
-
-Then, use the access token in the Authorization header when accessing Polaris:
-
-```shell
-curl -v http://127.0.0.1:8181/api/management/v1/principal-roles -H "Authorization: Bearer $POLARIS_TOKEN"
-curl -v http://127.0.0.1:8181/api/management/v1/catalogs/quickstart_catalog -H "Authorization: Bearer $POLARIS_TOKEN"
-```
-
-## Next Steps
-* Visit [Configuring Polaris for Production]({{% relref "../configuring-polaris-for-production" %}}).
-* A Getting Started experience for using Spark with Jupyter Notebooks is documented [here](https://github.com/apache/polaris/blob/main/getting-started/spark/README.md).
-* To shut down a locally-deployed Polaris server and clean up all related Docker containers, run the command listed below. Cloud Deployments have their respective termination commands on their Deployment page, while Polaris running on Gradle will terminate when the Gradle process terminates. 
-
-```shell
-docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml -f getting-started/jdbc/docker-compose-bootstrap-db.yml -f getting-started/jdbc/docker-compose.yml down
-```
-
-
diff --git a/1.0.0/metastores.md b/1.0.0/metastores.md
deleted file mode 100644
index 4810b124a0..0000000000
--- a/1.0.0/metastores.md
+++ /dev/null
@@ -1,151 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Metastores
-type: docs
-weight: 700
----
-
-This page explains how to configure and use Polaris metastores with either the recommended Relational JDBC or the
-deprecated EclipseLink persistence backends.
-
-## Relational JDBC
-This implementation leverages Quarkus for datasource management and supports configuration through
-environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file).
-
-```
-POLARIS_PERSISTENCE_TYPE=relational-jdbc
-
-QUARKUS_DATASOURCE_DB_KIND=postgresql
-QUARKUS_DATASOURCE_USERNAME=<username>
-QUARKUS_DATASOURCE_PASSWORD=<password>
-QUARKUS_DATASOURCE_JDBC_URL=<jdbc-url>
-```
-
-The Relational JDBC metastore currently relies on a Quarkus-managed datasource and supports only PostgreSQL and H2 databases. This limitation is similar to that of EclipseLink, primarily due to underlying schema differences. At this time, official documentation is provided exclusively for usage with PostgreSQL.
-Please refer to the documentation here:
-[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)
-
-Additionally, retries can be configured via the `polaris.persistence.relational.jdbc.*` properties; please refer to the [configuration](./configuration.md) page.
-
-## EclipseLink (Deprecated)
-> [!IMPORTANT] EclipseLink is deprecated; it is recommended to use Relational JDBC for persistence instead.
-
-Polaris includes the EclipseLink plugin by default, along with the PostgreSQL driver.
-
-Configure the `polaris.persistence` section in your Polaris configuration file
-(`application.properties`) as follows:
-
-```
-polaris.persistence.type=eclipse-link
-polaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml
-polaris.persistence.eclipselink.persistence-unit=polaris
-```
-
-Alternatively, configuration can also be done with environment variables or system properties. Refer
-to the [Quarkus Configuration Reference] for more information.
-
-The `configuration-file` option must point to an [EclipseLink configuration file]. This file, named
-`persistence.xml`, is used to set up the database connection properties, which can differ depending
-on the type of database and its configuration.
-
-> Note: You have to locate the `persistence.xml` at least two folders down from the root folder, e.g. 
`/deployments/config/persistence.xml` is OK, whereas `/deployments/persistence.xml` will cause an infinite loop.
-
-[Quarkus Configuration Reference]: https://quarkus.io/guides/config-reference
-[EclipseLink configuration file]: https://eclipse.dev/eclipselink/documentation/4.0/solutions/solutions.html#TESTINGJPA002
-
-Polaris creates and connects to a separate database for each realm. Specifically, the `{realm}` placeholder in `jakarta.persistence.jdbc.url` is substituted with the actual realm name, allowing the Polaris server to connect to different databases based on the realm.
-
-> Note: some database systems such as Postgres don't create databases automatically. Database admins need to create them manually before running the Polaris server.
-
-A single `persistence.xml` can describe multiple [persistence units](https://eclipse.dev/eclipselink/documentation/4.0/concepts/concepts.html#APPDEV001). For example, with both a `polaris-dev` and `polaris` persistence unit defined, you could use a single `persistence.xml` to easily switch between development and production databases. Use the `persistence-unit` option in the Polaris server configuration to easily switch between persistence units.
-
-### Using H2
-
-> [!IMPORTANT] H2 is an in-memory database and is not suitable for production!
-
-The default [persistence.xml] in Polaris is already configured for H2, but you can easily customize
-your H2 configuration using the persistence unit template below:
-
-[persistence.xml]: https://github.com/apache/polaris/blob/main/extension/persistence/eclipselink/src/main/resources/META-INF/persistence.xml
-
-```xml
-<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
-  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
-  <shared-cache-mode>NONE</shared-cache-mode>
-  <properties>
-    <!-- The property values below were lost in this page's rendering; these are
-         representative H2 settings (a file-backed database per realm). -->
-    <property name="jakarta.persistence.jdbc.url" value="jdbc:h2:file:tmp/polaris_test/filedb_{realm}"/>
-    <property name="jakarta.persistence.jdbc.user" value="sa"/>
-    <property name="jakarta.persistence.jdbc.password" value=""/>
-    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
-  </properties>
-</persistence-unit>
-```
-
-To build Polaris with the necessary H2 dependency and start the Polaris service, run the following:
-
-```shell
-./gradlew \
-  :polaris-server:assemble \
-  :polaris-server:quarkusAppPartsBuild --rerun \
-  -PeclipseLinkDeps=com.h2database:h2:2.3.232
-java -Dpolaris.persistence.type=eclipse-link \
-     -Dpolaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml \
-     -Dpolaris.persistence.eclipselink.persistence-unit=polaris \
-     -jar runtime/server/build/quarkus-app/quarkus-run.jar
-```
-
-### Using Postgres
-
-PostgreSQL is included by default in the Polaris server distribution.
-
-The following shows a sample configuration for integrating Polaris with Postgres. 
-
-```xml
-<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
-  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
-  <shared-cache-mode>NONE</shared-cache-mode>
-  <properties>
-    <!-- The property values below were lost in this page's rendering; these are
-         representative Postgres settings. Note the {realm} placeholder in the URL. -->
-    <property name="jakarta.persistence.jdbc.url" value="jdbc:postgresql://localhost:5432/{realm}"/>
-    <property name="jakarta.persistence.jdbc.user" value="postgres"/>
-    <property name="jakarta.persistence.jdbc.password" value="postgres"/>
-    <property name="eclipselink.logging.level.sql" value="FINE"/>
-    <property name="eclipselink.logging.parameters" value="true"/>
-    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
-  </properties>
-</persistence-unit>
-```
-
diff --git a/1.0.0/polaris-management-service.md b/1.0.0/polaris-management-service.md
deleted file mode 100644
index 0b66b9daa4..0000000000
--- a/1.0.0/polaris-management-service.md
+++ /dev/null
@@ -1,27 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: 'Apache Polaris Management Service OpenAPI'
-linkTitle: 'Management OpenAPI'
-weight: 800
-params:
-  show_page_toc: false
----
-
-{{< redoc-polaris "polaris-management-service.yml" >}}
diff --git a/1.0.0/polaris-spark-client.md b/1.0.0/polaris-spark-client.md
deleted file mode 100644
index a34bceeced..0000000000
--- a/1.0.0/polaris-spark-client.md
+++ /dev/null
@@ -1,141 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Polaris Spark Client
-type: docs
-weight: 650
----
-
-Apache Polaris now provides Catalog support for Generic Tables (non-Iceberg tables); please check out
-the [Catalog API Spec]({{% ref "polaris-catalog-service" %}}) for the Generic Table API specs.
-
-Along with the Generic Table Catalog support, Polaris is also releasing a Spark client, which helps to
-provide an end-to-end solution for Apache Spark to manage Delta tables using Polaris.
-
-Note the Polaris Spark client is able to handle both Iceberg and Delta tables, not just Delta.
-
-This page documents how to connect Spark with a Polaris service using the Polaris Spark client. 
-
-## Quick Start with Local Polaris service
-If you want to quickly try out the functionality with a local Polaris service, simply check out the Polaris repo
-and follow the instructions in the Spark plugin getting-started
-[README](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/getting-started/README.md).
-
-Check out the Polaris repo:
-```shell
-cd ~
-git clone https://github.com/apache/polaris.git
-```
-
-## Start Spark against a deployed Polaris service
-Before starting, ensure that the deployed Polaris service supports Generic Tables, and that Spark 3.5
-(version 3.5.3 or later) is installed. Spark 3.5.5 is recommended, and you can follow the instructions
-below to get a Spark 3.5.5 distribution.
-```shell
-cd ~
-wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz
-mkdir spark-3.5
-tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1
-cd spark-3.5
-```
-
-### Connecting with Spark using the Polaris Spark client
-The following CLI command can be used to start Spark with a connection to the deployed Polaris service using
-a released Polaris Spark client.
-
-```shell
-bin/spark-shell \
---packages <polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
---conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
---conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
---conf spark.sql.catalog.<spark-catalog-name>.warehouse=<polaris-catalog-name> \
---conf spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
---conf spark.sql.catalog.<spark-catalog-name>=org.apache.polaris.spark.SparkCatalog \
---conf spark.sql.catalog.<spark-catalog-name>.uri=<polaris-service-uri> \
---conf spark.sql.catalog.<spark-catalog-name>.credential='<client-id>:<client-secret>' \
---conf spark.sql.catalog.<spark-catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
---conf spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled=true
-```
-Assuming the released Polaris Spark client you want to use is `org.apache.polaris:polaris-spark-3.5_2.12:1.0.0`,
-replace the `<polaris-spark-client-package>` field with that package.
-
-The `<spark-catalog-name>` is the catalog name you will use with Spark, and `<polaris-catalog-name>` is the catalog name used
-by the Polaris service; for simplicity, you can use the same name.
-
-Replace the `<polaris-service-uri>` with the URI of the deployed Polaris service. For example, with a locally deployed
-Polaris service, the URI would be `http://localhost:8181/api/catalog`.
-
-For the `<client-id>` and `<client-secret>` values, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}})
-for more details. 
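-
-For example, with purely illustrative values (assuming the `1.0.0` client release, Spark and Polaris catalogs both
-named `polaris` on a locally deployed Polaris service, and hypothetical credentials `XXXX:YYYY`), the fully
-substituted command might look like the following:
-
-```shell
-bin/spark-shell \
---packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
---conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
---conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
---conf spark.sql.catalog.polaris.warehouse=polaris \
---conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
---conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
---conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
---conf spark.sql.catalog.polaris.credential='XXXX:YYYY' \
---conf spark.sql.catalog.polaris.scope='PRINCIPAL_ROLE:ALL' \
---conf spark.sql.catalog.polaris.token-refresh-enabled=true
-```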
-
-You can also start the connection by programmatically initializing a SparkSession; the following is an example with PySpark:
-```python
-from pyspark.sql import SparkSession
-
-spark = (
-    SparkSession.builder
-    .config("spark.jars.packages", "<polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1")
-    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
-    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension")
-    .config("spark.sql.catalog.<spark-catalog-name>", "org.apache.polaris.spark.SparkCatalog")
-    .config("spark.sql.catalog.<spark-catalog-name>.uri", "<polaris-service-uri>")
-    .config("spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled", "true")
-    .config("spark.sql.catalog.<spark-catalog-name>.credential", "<client-id>:<client-secret>")
-    .config("spark.sql.catalog.<spark-catalog-name>.warehouse", "<polaris-catalog-name>")
-    .config("spark.sql.catalog.<spark-catalog-name>.scope", "PRINCIPAL_ROLE:ALL")
-    .config("spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation", "vended-credentials")
-    .getOrCreate()
-)
-```
-As with the CLI command, make sure the corresponding fields are replaced correctly.
-
-### Create tables with Spark
-After Spark is started, you can use it to create and access Iceberg and Delta tables, for example:
-```python
-spark.sql("USE polaris")
-spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS")
-spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC")
-spark.sql("USE NAMESPACE DELTA_NS.PUBLIC")
-spark.sql("""CREATE TABLE IF NOT EXISTS PEOPLE (
-    id int, name string)
-USING delta LOCATION 'file:///tmp/var/delta_tables/people';
-""")
-```
-
-## Connecting with Spark using local Polaris Spark client jar
-If you would like to use a version of the Spark client that is not yet released, you can
-build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin
-[README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions.
-
-## Limitations
-The Polaris Spark client has the following functionality limitations:
-1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe`
-   is also not supported, since it relies on the CTAS support.
-2) Creating a Delta table without an explicit location is not supported.
-3) Renaming a Delta table is not supported.
-4) ALTER TABLE ... SET LOCATION is not supported for Delta tables.
-5) Other non-Iceberg table formats, such as CSV, are not supported.
-
-## Iceberg Spark Client compatibility with Polaris Spark Client
-The Polaris Spark client depends on a specific Iceberg client version; the version dependency is described
-in the following table:
-
-| Spark Client Version | Iceberg Spark Client Version |
-|----------------------|------------------------------|
-| 1.0.0                | 1.9.0                        |
-
-The Iceberg dependency is automatically downloaded when the Polaris package is downloaded, so there is no need to
-add the Iceberg Spark client in the `packages` configuration.
diff --git a/1.0.0/policy.md b/1.0.0/policy.md
deleted file mode 100644
index 3f49353884..0000000000
--- a/1.0.0/policy.md
+++ /dev/null
@@ -1,197 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. 
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Policy -type: docs -weight: 425 ---- - -The Polaris Policy framework empowers organizations to centrally define, manage, and enforce fine-grained governance, lifecycle, and operational rules across all data resources in the catalog. - -With the policy API, you can: -- Create and manage policies -- Attach policies to specific resources (catalogs, namespaces, tables, or views) -- Check applicable policies for any given resource - -## What is a Policy? - -A policy in Apache Polaris is a structured entity that defines rules governing actions on specified resources under -predefined conditions. Each policy contains: - -- **Name**: A unique identifier within a namespace -- **Type**: Determines the semantics and expected format of the policy content -- **Description**: Explains the purpose of the policy -- **Content**: Contains the actual rules defining the policy behavior -- **Version**: An automatically tracked revision number -- **Inheritable**: Whether the policy can be inherited by child resources, decided by its type - -### Policy Types - -Polaris supports several predefined system policy types (prefixed with `system.`): - -| Policy Type | Purpose | JSON-Schema | Applies To | -|-------------|-------------------------------------------------------|-------------|------------| -| **`system.data-compaction`** | Defines rules for data file compaction operations | [`data-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/data-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.metadata-compaction`** | Defines rules for metadata file compaction operations | [`metadata-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/metadata-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.orphan-file-removal`** | Defines rules for removing orphaned files | [`orphan-file-removal/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/orphan-file-removal/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.snapshot-expiry`** | Defines rules for snapshot expiration | [`snapshot-expiry/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/snapshot-expiry/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | - -Support for additional predefined system policy types and custom policy type definitions is in progress. -For more details, please refer to the [roadmap](https://github.com/apache/polaris/discussions/1028). - -### Policy Inheritance - -The entity hierarchy in Polaris is structured as follows: - -``` - Catalog - | - Namespace - | - +-----------+----------+ - | | | -Iceberg Iceberg Generic - Table View Table -``` - -Policies can be attached at any level, and inheritance flows from catalog down to namespace, then to tables and views. 
- -Policies can be inheritable or non-inheritable: - -- **Inheritable policies**: Apply to the target resource and all its applicable child resources -- **Non-inheritable policies**: Apply only to the specific target resource - -The inheritance follows an override mechanism: -1. Table-level policies override namespace and catalog policies -2. Namespace-level policies override parent namespace and catalog policies - -> **Important:** Because an override completely replaces the same policy type at higher levels, -> **only one instance of a given policy type can be attached to (and therefore affect) a resource**. - -## Working with Policies - -### Creating a Policy - -To create a policy, you need to provide a name, type, and optionally a description and content: - -```json -POST /polaris/v1/{prefix}/namespaces/{namespace}/policies -{ - "name": "compaction-policy", - "type": "system.data-compaction", - "description": "Policy for optimizing table storage", - "content": "{\"version\": \"2025-02-03\", \"enable\": true, \"config\": {\"target_file_size_bytes\": 134217728}}" -} -``` - -The policy content is validated against a schema specific to its type. Here are a few policy content examples: -- Data Compaction Policy -```json -{ - "version": "2025-02-03", - "enable": true, - "config": { - "target_file_size_bytes": 134217728, - "compaction_strategy": "bin-pack", - "max-concurrent-file-group-rewrites": 5 - } -} -``` -- Orphan File Removal Policy -```json -{ - "version": "2025-02-03", - "enable": true, - "max_orphan_file_age_in_days": 30, - "locations": ["s3://my-bucket/my-table-location"], - "config": { - "prefix_mismatch_mode": "ignore" - } -} -``` - -### Attaching Policies to Resources - -Policies can be attached to different resource levels: - -1. **Catalog level**: Applies to the entire catalog -2. **Namespace level**: Applies to a specific namespace -3. **Table-like level**: Applies to individual tables or views - -Example of attaching a policy to a table: - -```json -PUT /polaris/v1/{prefix}/namespaces/{namespace}/policies/{policy-name}/mappings -{ - "target": { - "type": "table-like", - "path": ["NS1", "NS2", "test_table_1"] - } -} -``` - -For inheritable policies, only one policy of a given type can be attached to a resource. For non-inheritable policies, -multiple policies of the same type can be attached. - -### Retrieving Applicable Policies -A user can view applicable policies on a resource (e.g., table, namespace, or catalog) as long as they have -read permission on that resource. - -Here is an example to find all policies that apply to a specific resource (including inherited policies): -``` -GET /polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions -``` - -**Sample response:** -```json -{ - "policies": [ - { - "name": "snapshot-expiry-policy", - "type": "system.snapshot-expiry", - "appliedAt": "namespace", - "content": { - "version": "2025-02-03", - "enable": true, - "config": { - "min_snapshot_to_keep": 1, - "max_snapshot_age_days": 2, - "max_ref_age_days": 3 - } - } - }, - { - "name": "compaction-policy", - "type": "system.data-compaction", - "appliedAt": "catalog", - "content": { - "version": "2025-02-03", - "enable": true, - "config": { - "target_file_size_bytes": 134217728 - } - } - } - ] -} -``` - -### API Reference - -For the complete and up-to-date API specification, see the [policy-api.yaml](https://github.com/apache/polaris/blob/main/spec/polaris-catalog-apis/policy-apis.yaml). 
\ No newline at end of file
diff --git a/1.0.0/realm.md b/1.0.0/realm.md
deleted file mode 100644
index 9da5e7e25b..0000000000
--- a/1.0.0/realm.md
+++ /dev/null
@@ -1,53 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Realm
type: docs
weight: 350
---

This page explains what a realm is and what it is used for in Polaris.

### What is it?

A realm in Polaris serves as a logical partitioning mechanism within the catalog system. This isolation allows for multitenancy, enabling different teams, environments, or organizations to operate independently within the same Polaris deployment.

### Key Characteristics

**Isolation:** Each realm encapsulates its own set of resources, ensuring that operations and policies in one realm do not affect others.

**Authentication Context:** When configuring Polaris, principal credentials are associated with a specific realm. This allows for the separation of security concerns across different realms.

**Configuration Scope:** Realm identifiers are used in various configurations, such as connection strings and feature configurations. An example of this is:

`jdbc:postgresql://localhost:5432/{realm}`

This ensures that each realm's data is stored separately.

### How is it used in the system?

**RealmContext:** A key concept used to identify and resolve the context in which operations are performed. For example, in `DefaultRealmContextResolver`, a realm is resolved from request headers, and operations are performed based on the resolved realm identifier.

**Authentication and Authorization:** For example, in `BasePolarisAuthenticator`, `RealmContext` is used to provide context about the current security domain; this context is used to retrieve the correct `PolarisMetastoreManager`, which manages all Polaris entities and associated grant records metadata for authorization.

**Isolation:** In methods like `createEntityManagerFactory(@Nonnull RealmContext realmContext)` from the `PolarisEclipseLinkPersistenceUnit` interface, the realm context influences how resources are created or managed, based on the security policies of that realm. An example of this is the way a realm name can be used to build a database connection URL so that you have one database instance per realm, when applicable. Alternatively, isolation can be applied at a more granular level, such as the primary key level within the same database instance.
\ No newline at end of file
diff --git a/1.0.0/telemetry.md b/1.0.0/telemetry.md
deleted file mode 100644
index 8df97f505d..0000000000
--- a/1.0.0/telemetry.md
+++ /dev/null
@@ -1,192 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.
See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Telemetry
type: docs
weight: 450
---

## Metrics

Metrics are published using [Micrometer]; they are available from Polaris's management interface
(port 8282 by default) under the path `/q/metrics`. For example, if the server is running on
localhost, the metrics can be accessed via http://localhost:8282/q/metrics.

[Micrometer]: https://quarkus.io/guides/telemetry-micrometer

Metrics can be scraped by Prometheus or any compatible metrics scraping server. See
[Prometheus](https://prometheus.io) for more information.

Additional tags can be added to the metrics by setting the `polaris.metrics.tags.*` property. Each
tag is a key-value pair, where the key is the tag name and the value is the tag value. For example,
to add a tag `environment=prod` to all metrics, set `polaris.metrics.tags.environment=prod`. Multiple
tags can be added, as shown below:

```properties
polaris.metrics.tags.service=polaris
polaris.metrics.tags.environment=prod
polaris.metrics.tags.region=us-west-2
```

Note that by default Polaris adds one tag: `application=Polaris`. You can override this tag by
setting the `polaris.metrics.tags.application=<application-name>` property.

### Realm ID Tag

Polaris can add the realm ID as a tag to all API and HTTP request metrics. This is disabled by
default to prevent high cardinality issues, but can be enabled by setting the following properties:

```properties
polaris.metrics.realm-id-tag.enable-in-api-metrics=true
polaris.metrics.realm-id-tag.enable-in-http-metrics=true
```

You should be particularly careful when enabling the realm ID tag in HTTP request metrics, as these
metrics typically have a much higher cardinality than API request metrics.

In order to prevent the number of tags from growing indefinitely and causing performance issues or
crashing the server, the number of unique realm IDs in HTTP request metrics is limited to 100 by
default. If the number of unique realm IDs exceeds this value, a warning will be logged and no more
HTTP request metrics will be recorded. This threshold can be changed by setting the
`polaris.metrics.realm-id-tag.http-metrics-max-cardinality` property.

## Traces

Traces are published using [OpenTelemetry].

[OpenTelemetry]: https://quarkus.io/guides/opentelemetry-tracing

By default, OpenTelemetry is disabled in Polaris, because there is no reasonable default
for the collector endpoint that fits all cases.

To enable OpenTelemetry and publish traces for Polaris, set `quarkus.otel.sdk.disabled=false`
and configure a valid collector endpoint URL, with `http://` or `https://`, as the server property
`quarkus.otel.exporter.otlp.traces.endpoint`.
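For example, a minimal configuration that enables the SDK and points the exporter at a collector running locally; the endpoint below is an example value, using the default OTLP gRPC port (4317) mentioned below:

```properties
# Enable the OpenTelemetry SDK and export traces over OTLP/gRPC.
# The endpoint is an example value; replace it with your collector's address.
quarkus.otel.sdk.disabled=false
quarkus.otel.exporter.otlp.traces.endpoint=http://localhost:4317
```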
- -_If these properties are not set, the server will not publish traces._ - -The collector must talk the OpenTelemetry protocol (OTLP) and the port must be its gRPC port -(by default 4317), e.g. "http://otlp-collector:4317". - -By default, Polaris adds a few attributes to the [OpenTelemetry Resource] to identify the server, -and notably: - -- `service.name`: set to `Apache Polaris Server (incubating)`; -- `service.version`: set to the Polaris version. - -[OpenTelemetry Resource]: https://opentelemetry.io/docs/languages/js/resources/ - -You can override the default resource attributes or add additional ones by setting the -`quarkus.otel.resource.attributes` property. - -This property expects a comma-separated list of key-value pairs, where the key is the attribute name -and the value is the attribute value. For example, to change the service name to `Polaris` and add -an attribute `deployment.environment=dev`, set the following property: - -```properties -quarkus.otel.resource.attributes=service.name=Polaris,deployment.environment=dev -``` - -The alternative syntax below can also be used: - -```properties -quarkus.otel.resource.attributes[0]=service.name=Polaris -quarkus.otel.resource.attributes[1]=deployment.environment=dev -``` - -Finally, two additional span attributes are added to all request parent spans: - -- `polaris.request.id`: The unique identifier of the request, if set by the caller through the - `Polaris-Request-Id` header. -- `polaris.realm`: The unique identifier of the realm. Always set (unless the request failed because - of a realm resolution error). - -### Troubleshooting Traces - -If the server is unable to publish traces, check first for a log warning message like the following: - -``` -SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans. -The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317 -``` - -This means that the server is unable to connect to the collector. Check that the collector is -running and that the URL is correct. - -## Logging - -Polaris relies on [Quarkus](https://quarkus.io/guides/logging) for logging. - -By default, logs are written to the console and to a file located in the `./logs` directory. The log -file is rotated daily and compressed. The maximum size of the log file is 10MB, and the maximum -number of backup files is 14. - -JSON logging can be enabled by setting the `quarkus.log.console.json` and `quarkus.log.file.json` -properties to `true`. By default, JSON logging is disabled. - -The log level can be set for the entire application or for specific packages. The default log level -is `INFO`. To set the log level for the entire application, use the `quarkus.log.level` property. - -To set the log level for a specific package, use the `quarkus.log.category."package-name".level`, -where `package-name` is the name of the package. For example, the package `io.smallrye.config` has a -useful logger to help debugging configuration issues; but it needs to be set to the `DEBUG` level. -This can be done by setting the following property: - -```properties -quarkus.log.category."io.smallrye.config".level=DEBUG -``` - -The log message format for both console and file output is highly configurable. 
The default format -is: - -``` -%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n -``` - -Refer to the [Logging format](https://quarkus.io/guides/logging#logging-format) guide for more -information on placeholders and how to customize the log message format. - -### MDC Logging - -Polaris uses Mapped Diagnostic Context (MDC) to enrich log messages with additional context. The -following MDC keys are available: - -- `requestId`: The unique identifier of the request, if set by the caller through the - `Polaris-Request-Id` header. -- `realmId`: The unique identifier of the realm. Always set. -- `traceId`: The unique identifier of the trace. Present if tracing is enabled and the message is - originating from a traced context. -- `parentId`: The unique identifier of the parent span. Present if tracing is enabled and the - message is originating from a traced context. -- `spanId`: The unique identifier of the span. Present if tracing is enabled and the message is - originating from a traced context. -- `sampled`: Whether the trace has been sampled. Present if tracing is enabled and the message is - originating from a traced context. - -Other MDC keys can be added by setting the `polaris.log.mdc.*` property. Each property is a -key-value pair, where the key is the MDC key name and the value is the MDC key value. For example, -to add the MDC keys `environment=prod` and `region=us-west-2` to all log messages, set the following -properties: - -```properties -polaris.log.mdc.environment=prod -polaris.log.mdc.region=us-west-2 -``` - -MDC context is propagated across threads, including in `TaskExecutor` threads. \ No newline at end of file diff --git a/1.0.1/_index.md b/1.0.1/_index.md deleted file mode 100644 index c721a8c180..0000000000 --- a/1.0.1/_index.md +++ /dev/null @@ -1,180 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -linkTitle: '1.0.1' -title: 'Overview' -type: docs -weight: 200 -params: - top_hidden: true - show_page_toc: false -cascade: - type: docs - params: - show_page_toc: true -# This file will NOT be copied into a new release's versioned docs folder. ---- - -Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol. - -With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines. - -![Conceptual diagram of Apache Polaris (Incubating).](/img/overview.svg "Apache Polaris (Incubating) overview") - -## Key concepts - -This section introduces key concepts associated with using Apache Polaris (Incubating). 
- -In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables -or namespaces have been created yet for Catalog2 or Catalog3. - -![Diagram that shows an example Apache Polaris (Incubating) structure.](/img/sample-catalog-structure.svg "Sample Apache Polaris (Incubating) structure") - -### Catalog - -In Polaris, you can create one or more catalog resources to organize Iceberg tables. - -Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a -query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks: - -- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's - current metadata file. - -- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of - the table. - -To learn more about Iceberg catalogs, see the [Apache Iceberg™ documentation](https://iceberg.apache.org/concepts/catalog/). - -#### Catalog types - -A catalog can be one of the following two types: - -- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris. - -- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from - this catalog are synced to Polaris. These tables are read-only in Polaris. - -A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS. - -### Namespace - -You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create -nested namespaces. Iceberg tables belong to namespaces. - -> **Important** -> -> For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met: -> -> - The directory only contains the data files that belong to a single table. -> - The directory hierarchy matches the namespace hierarchy for the catalog. -> -> For example, if a catalog includes the following items: -> -> - Top-level namespace namespace1 -> - Nested namespace namespace1a -> - A customers table, which is grouped under nested namespace namespace1a -> - An orders table, which is grouped under nested namespace namespace1a -> -> The directory hierarchy for the catalog must follow this structure: -> -> - /namespace1/namespace1a/customers/ -> - /namespace1/namespace1a/orders/ - -### Storage configuration - -A storage configuration stores a generated identity and access management (IAM) entity for your cloud storage and is created -when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the -catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris -Catalog. - -When you create a catalog, you supply the following information about your cloud storage: - -| Cloud storage provider | Information | -| -----------------------| ----------- | -| Amazon S3 |
• Default base location for your Amazon S3 bucket<br/>• Locations for your Amazon S3 bucket<br/>• S3 role ARN<br/>• External ID (optional) |
| Google Cloud Storage (GCS) | • Default base location for your GCS bucket<br/>• Locations for your GCS bucket |
| Azure | • Default base location for your Microsoft Azure container<br/>• Locations for your Microsoft Azure container<br/>• Azure tenant ID
| - -## Example workflow - -In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1. - -1. Bob uses Apache Spark™ to create the Table1 table under the - Namespace1 namespace in the Catalog1 catalog and insert values into - Table1. - - Bob can create Table1 and insert data into it because he is using a - service connection with a service principal that has - the privileges to perform these actions. - -2. Alice uses Snowflake to read data from Table1. - - Alice can read data from Table1 because she is using a service - connection with a service principal with a catalog integration that - has the privileges to perform this action. Alice - creates an unmanaged table in Snowflake to read data from Table1. - -![Diagram that shows an example workflow for Apache Polaris (Incubating)](/img/example-workflow.svg "Example workflow for Apache Polaris (Incubating)") - -## Security and access control - -### Credential vending - -To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query -execution. These credentials allow the query engine to run the query without requiring access to your cloud storage for -Iceberg tables. This process is called credential vending. - -As of now, the following limitation is known regarding Apache Iceberg support: - -- **remove_orphan_files:** Apache Spark can't use credential vending - for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details. - -### Identity and access management (IAM) - -Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg -metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your -storage location. - -### Access control - -Polaris enforces the access control that you configure across all tables registered with the service and governs security for all -queries from query engines in a consistent manner. - -Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs, -namespaces, and tables. - -Polaris RBAC uses two different role types to delegate privileges: - -- **Principal roles:** Granted to Polaris service principals and - analogous to roles in other access control systems that you grant to - service principals. - -- **Catalog roles:** Configured with certain privileges on Polaris - catalog resources and granted to principal roles. - -For more information, see [Access control]({{% ref "access-control" %}}). - -## Legal Notices - -Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. - - - diff --git a/1.0.1/access-control.md b/1.0.1/access-control.md deleted file mode 100644 index f8c21ab781..0000000000 --- a/1.0.1/access-control.md +++ /dev/null @@ -1,212 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. 
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Access Control -type: docs -weight: 500 ---- - -This section provides information about how access control works for Apache Polaris (Incubating). - -Polaris uses a role-based access control (RBAC) model in which the Polaris administrator assigns access privileges to catalog roles -and then grants access to resources to service principals by assigning catalog roles to principal roles. - -These are the key concepts to understanding access control in Polaris: - -- **Securable object** -- **Principal role** -- **Catalog role** -- **Privilege** - -## Securable object - -A securable object is an object to which access can be granted. Polaris -has the following securable objects: - -- Catalog -- Namespace -- Iceberg table -- View - -## Principal role - -A principal role is a resource in Polaris that you can use to logically group Polaris service principals together and grant privileges on -securable objects. - -Polaris supports a many-to-one relationship between service principals and principal roles. For example, to grant the same privileges to -multiple service principals, you can grant a single principal role to those service principals. A service principal can be granted one -principal role. When registering a service connection, the Polaris administrator specifies the principal role that is granted to the -service principal. - -You don't grant privileges directly to a principal role. Instead, you configure object permissions at the catalog role level, and then grant -catalog roles to a principal role. - -The following table shows examples of principal roles that you might configure in Polaris: - -| Principal role name | Description | -| -----------------------| ----------- | -| Data_engineer | A role that is granted to multiple service principals for running data engineering jobs. | -| Data_scientist | A role that is granted to multiple service principals for running data science or AI jobs. | - -## Catalog role - -A catalog role belongs to a particular catalog resource in Polaris and specifies a set of permissions for actions on the catalog or objects -in the catalog, such as catalog namespaces or tables. You can create one or more catalog roles for a catalog. - -You grant privileges to a catalog role and then grant the catalog role to a principal role to bestow the privileges to one or more service -principals. - -> **Note** -> -> If you update the privileges bestowed to a service principal, the updates won't take effect for up to one hour. This means that if you -> revoke or grant some privileges for a catalog, the updated privileges won't take effect on any service principal with access to that catalog -> for up to one hour. - -Polaris also supports a many-to-many relationship between catalog roles and principal roles. You can grant the same catalog role to one or more -principal roles. Likewise, a principal role can be granted to one or more catalog roles. 
The following table displays examples of catalog roles that you might configure in Polaris:

| Example Catalog role | Description |
| -----------------------| ----------- |
| Catalog administrators | A role that has been granted multiple privileges to emulate full access to the catalog. Principal roles that have been granted this role are permitted to create, alter, read, write, and drop tables in the catalog. |
| Catalog readers | A role that has been granted read-only privileges to tables in the catalog. Principal roles that have been granted this role are allowed to read from tables in the catalog. |
| Catalog contributor | A role that has been granted read and write access privileges to all tables that belong to the catalog.
Principal roles that have been granted this role are allowed to perform read and write operations on tables in the catalog. | - -## RBAC model - -The following diagram illustrates the RBAC model used by Polaris. For each catalog, the Polaris administrator assigns access -privileges to catalog roles and then grants service principals access to resources by assigning catalog roles to principal roles. Polaris -supports a many-to-one relationship between service principals and principal roles. - -![Diagram that shows the RBAC model for Apache Polaris.](/img/rbac-model.svg "Apache Polaris RBAC model") - -## Access control privileges - -This section describes the privileges that are available in the Polaris access control model. Privileges are granted to catalog roles, catalog -roles are granted to principal roles, and principal roles are granted to service principals to specify the operations that service principals can -perform on objects in Polaris. - -> **Important** -> -> You can only grant privileges at the catalog level. Fine-grained access controls are not available. For example, you can grant read -> privileges to all tables in a catalog but not to an individual table in the catalog. - -To grant the full set of privileges (drop, list, read, write, etc.) on an object, you can use the *full privilege* option. - -### Table privileges - -| Privilege | Description | -| --------- | ----------- | -| TABLE_CREATE | Enables registering a table with the catalog. | -| TABLE_DROP | Enables dropping a table from the catalog. | -| TABLE_LIST | Enables listing any table in the catalog. | -| TABLE_READ_PROPERTIES | Enables reading properties of the table. | -| TABLE_WRITE_PROPERTIES | Enables configuring properties for the table. | -| TABLE_READ_DATA | Enables reading data from the table by receiving short-lived read-only storage credentials from the catalog. | -| TABLE_WRITE_DATA | Enables writing data to the table by receiving short-lived read+write storage credentials from the catalog. | -| TABLE_FULL_METADATA | Grants all table privileges, except TABLE_READ_DATA and TABLE_WRITE_DATA, which need to be granted individually. | -| TABLE_ATTACH_POLICY | Enables attaching policy to a table. | -| TABLE_DETACH_POLICY | Enables detaching policy from a table. | - -### View privileges - -| Privilege | Description | -| --------- | ----------- | -| VIEW_CREATE | Enables registering a view with the catalog. | -| VIEW_DROP | Enables dropping a view from the catalog. | -| VIEW_LIST | Enables listing any views in the catalog. | -| VIEW_READ_PROPERTIES | Enables reading all the view properties. | -| VIEW_WRITE_PROPERTIES | Enables configuring view properties. | -| VIEW_FULL_METADATA | Grants all view privileges. | - -### Namespace privileges - -| Privilege | Description | -| --------- | ----------- | -| NAMESPACE_CREATE | Enables creating a namespace in a catalog. | -| NAMESPACE_DROP | Enables dropping the namespace from the catalog. | -| NAMESPACE_LIST | Enables listing any object in the namespace, including nested namespaces and tables. | -| NAMESPACE_READ_PROPERTIES | Enables reading all the namespace properties. | -| NAMESPACE_WRITE_PROPERTIES | Enables configuring namespace properties. | -| NAMESPACE_FULL_METADATA | Grants all namespace privileges. | -| NAMESPACE_ATTACH_POLICY | Enables attaching policy to a namespace. | -| NAMESPACE_DETACH_POLICY | Enables detaching policy from a namespace. 
|

### Catalog privileges

| Privilege | Description |
| -----------------------| ----------- |
| CATALOG_MANAGE_ACCESS | Includes the ability to grant or revoke privileges on objects in a catalog to catalog roles, and the ability to grant or revoke catalog roles to or from principal roles. |
| CATALOG_MANAGE_CONTENT | Enables full management of content for the catalog. This privilege encompasses the following privileges:<br/>• CATALOG_MANAGE_METADATA<br/>• TABLE_FULL_METADATA<br/>• NAMESPACE_FULL_METADATA<br/>• VIEW_FULL_METADATA<br/>• TABLE_WRITE_DATA<br/>• TABLE_READ_DATA<br/>• CATALOG_READ_PROPERTIES<br/>• CATALOG_WRITE_PROPERTIES |
| CATALOG_MANAGE_METADATA | Enables full management of the catalog, catalog roles, namespaces, and tables. |
| CATALOG_READ_PROPERTIES | Enables listing catalogs and reading properties of the catalog. |
| CATALOG_WRITE_PROPERTIES | Enables configuring catalog properties. |
| CATALOG_ATTACH_POLICY | Enables attaching policy to a catalog. |
| CATALOG_DETACH_POLICY | Enables detaching policy from a catalog. |

### Policy privileges

| Privilege | Description |
| -----------------------| ----------- |
| POLICY_CREATE | Enables creating a policy under a specified namespace. |
| POLICY_READ | Enables reading policy content and metadata. |
| POLICY_WRITE | Enables updating policy details such as its content or description. |
| POLICY_LIST | Enables listing any policy from the catalog. |
| POLICY_DROP | Enables dropping a policy if it is not attached to any resource entity. |
| POLICY_FULL_METADATA | Grants all policy privileges. |
| POLICY_ATTACH | Enables a policy to be attached to entities. |
| POLICY_DETACH | Enables a policy to be detached from entities. |

## RBAC example

The following diagram illustrates how RBAC works in Polaris and
includes the following users:

- **Alice:** A service admin who signs up for Polaris. Alice can
  create service principals. She can also create catalogs and
  namespaces and configure access control for Polaris resources.

- **Bob:** A data engineer who uses Apache Spark™ to
  interact with Polaris.

  - Alice has created a service principal for Bob. It has been
    granted the Data_engineer principal role, which in turn has been
    granted the following catalog roles: Catalog contributor and
    Data administrator (for both the Silver and Gold zone catalogs
    in the following diagram).

  - The Catalog contributor role grants permission to create
    namespaces and tables in the Bronze zone catalog.

  - The Data administrator roles grant full administrative rights to
    the Silver zone catalog and Gold zone catalog.

- **Mark:** A data scientist who trains models with data managed
  by Polaris.

  - Alice has created a service principal for Mark. It has been
    granted the Data_scientist principal role, which in turn has
    been granted the catalog role named Catalog reader.

  - The Catalog reader role grants read-only access for a catalog
    named Gold zone catalog.

![Diagram that shows an example of how RBAC works in Apache Polaris.](/img/rbac-example.svg "Apache Polaris RBAC example")
diff --git a/1.0.1/admin-tool.md b/1.0.1/admin-tool.md
deleted file mode 100644
index 14f37b6f0f..0000000000
--- a/1.0.1/admin-tool.md
+++ /dev/null
@@ -1,142 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Admin Tool
type: docs
weight: 300
---

Polaris includes a tool for administrators to manage the metastore.

The tool must be built with the necessary JDBC drivers to access the metastore database. For
example, to build the tool with support for Postgres, run the following:

```shell
./gradlew \
  :polaris-admin:assemble \
  :polaris-admin:quarkusAppPartsBuild --rerun \
  -Dquarkus.container-image.build=true
```

The above command will generate:

- One standalone JAR in `runtime/admin/build/polaris-admin-*-runner.jar`
- Two distribution archives in `runtime/admin/build/distributions`
- Two Docker images named `apache/polaris-admin-tool:latest` and `apache/polaris-admin-tool:<version>`

## Usage

Please make sure the admin tool and the Polaris server are on the same version before using the tool.
To run the standalone JAR, use the following command:

```shell
java -jar runtime/admin/build/polaris-admin-*-runner.jar --help
```

To run the Docker image, use the following command:

```shell
docker run apache/polaris-admin-tool:latest --help
```

The basic usage of the Polaris Admin Tool is outlined below:

```
Usage: polaris-admin-runner.jar [-hV] [COMMAND]
Polaris Admin Tool
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
Commands:
  help       Display help information about the specified command.
  bootstrap  Bootstraps realms and principal credentials.
  purge      Purge principal credentials.
```

## Configuration

The Polaris Admin Tool must be executed with the same configuration as the Polaris server. The
configuration can be done via environment variables or system properties.

At a minimum, it is necessary to configure the Polaris Admin Tool to connect to the same database
used by the Polaris server.

See the [metastore documentation]({{% ref "metastores" %}}) for more information on configuring the
database connection.

Note: during bootstrap, Polaris always creates the schema `polaris_schema` in the configured database.

## Bootstrapping Realms and Principal Credentials

The `bootstrap` command is used to bootstrap realms and create the necessary principal credentials
for the Polaris server. This command is idempotent and can be run multiple times without causing any
issues. If a realm is already bootstrapped, running the `bootstrap` command again will have no
effect on that realm.

```shell
java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap --help
```

The basic usage of the `bootstrap` command is outlined below:

```
Usage: polaris-admin-runner.jar bootstrap [-hV] [-c=<realm,clientId,clientSecret>]... -r=<realm> [-r=<realm>]...
Bootstraps realms and root principal credentials.
  -c, --credential=<realm,clientId,clientSecret>
                  Root principal credentials to bootstrap. Must be of the form
                    'realm,clientId,clientSecret'.
  -h, --help      Show this help message and exit.
  -r, --realm=<realm>
                  The name of a realm to bootstrap.
  -V, --version   Print version information and exit.
```

For example, to bootstrap the `realm1` realm and create its root principal credential with the
client ID `admin` and client secret `admin`, you can run the following command:

```shell
java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap -r realm1 -c realm1,admin,admin
```

## Purging Realms and Principal Credentials

The `purge` command is used to remove realms and principal credentials from the Polaris server.

> Warning: Running the `purge` command will remove all data associated with the specified realms!
> This includes all entities (catalogs, namespaces, tables, views, roles), all principal
> credentials, grants, and any other data associated with the realms.

```shell
java -jar runtime/admin/build/polaris-admin-*-runner.jar purge --help
```

The basic usage of the `purge` command is outlined below:

```
Usage: polaris-admin-runner.jar purge [-hV] -r=<realm> [-r=<realm>]...
Purge realms and all associated entities.
  -h, --help      Show this help message and exit.
  -r, --realm=<realm>
                  The name of a realm to purge.
  -V, --version   Print version information and exit.
```

For example, to purge the `realm1` realm, you can run the following command:

```shell
java -jar runtime/admin/build/polaris-admin-*-runner.jar purge -r realm1
```
\ No newline at end of file
diff --git a/1.0.1/command-line-interface.md b/1.0.1/command-line-interface.md
deleted file mode 100644
index f20210e2c6..0000000000
--- a/1.0.1/command-line-interface.md
+++ /dev/null
@@ -1,1224 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Command Line Interface
type: docs
weight: 300
---

In order to help administrators quickly set up and manage their Polaris server, Polaris provides a simple command-line interface (CLI) for common tasks.

The basic syntax of the Polaris CLI is outlined below:

```
polaris [options] COMMAND ...

options:
--host
--port
--base-url
--client-id
--client-secret
--access-token
--profile
```

`COMMAND` must be one of the following:
1. catalogs
2. principals
3. principal-roles
4. catalog-roles
5. namespaces
6. privileges
7. profiles

Each _command_ supports several _subcommands_, and some _subcommands_ have _actions_ that come after the subcommand in turn. Finally, _arguments_ follow to form a full invocation. Within a set of named arguments at the end of an invocation, ordering is generally not important. Many invocations also have a required positional argument of the type that the _command_ refers to. Again, the ordering of this positional argument relative to named arguments is not important.

Some example full invocations:

```
polaris principals list
polaris catalogs delete some_catalog_name
polaris catalogs update --property foo=bar some_other_catalog
polaris catalogs update another_catalog --property k=v
polaris privileges namespace grant --namespace some.schema --catalog fourth_catalog --catalog-role some_catalog_role TABLE_READ_DATA
polaris profiles list
```

### Authentication

As outlined above, the Polaris CLI may take credentials using the `--client-id` and `--client-secret` options. For example:

```
polaris --client-id 4b5ed1ca908c3cc2 --client-secret 07ea8e4edefb9a9e57c247e8d1a4f51c principals ...
```

If `--client-id` and `--client-secret` are not provided, the Polaris CLI will try to read the client ID and the client secret from environment variables called `CLIENT_ID` and `CLIENT_SECRET`, respectively. If neither the flags nor the environment variables are set, the CLI will fail.

Alternatively, the `--access-token` option can be used instead of `--client-id` and `--client-secret`, but the two authentication methods cannot be used simultaneously.

Additionally, the `--profile` option can be used to specify a saved profile instead of providing authentication details directly. If `--profile` is not provided, the CLI will check the `CLIENT_PROFILE` environment variable. Profiles store authentication details and connection settings, simplifying repeated CLI usage.

If the `--host` and `--port` options are not provided, the CLI will default to communicating with `localhost:8181`.

Alternatively, the `--base-url` option can be used instead of `--host` and `--port`, but the two options cannot be used simultaneously. This allows specifying arbitrary Polaris URLs, including HTTPS ones, that have additional base prefixes before the `/api/*/v1` subpaths.

### PATH

These examples assume the Polaris CLI is on the PATH and so can be invoked just by the command `polaris`. You can add the CLI to your PATH environment variable with a command like the following:

```
export PATH="$HOME/polaris:$PATH"
```

Alternatively, you can run the CLI by providing a path to it, such as with the following invocation:

```
~/polaris principals list
```

## Commands

Each of the commands `catalogs`, `principals`, `principal-roles`, `catalog-roles`, and `privileges` is used to manage a different type of entity within Polaris.

In addition to these, the `profiles` command is available for managing stored authentication profiles, allowing login credentials to be configured for reuse. This provides an alternative to passing authentication details with every command.

To find details on the options that can be provided to a particular command or subcommand ad hoc, you may wish to use the `--help` flag. For example:

```
polaris catalogs --help
polaris principals create --help
polaris profiles --help
```

### catalogs

The `catalogs` command is used to create, discover, and otherwise manage catalogs within Polaris.

`catalogs` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. update

#### create

The `create` subcommand is used to create a catalog.

```
input: polaris catalogs create --help
options:
  create
    Named arguments:
      --type  The type of catalog to create in [INTERNAL, EXTERNAL]. INTERNAL by default.
      --storage-type  (Required) The type of storage to use for the catalog
      --default-base-location  (Required) Default base location of the catalog
      --allowed-location  An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once.
- --role-arn (Required for S3) A role ARN to use when connecting to S3 - --external-id (Only for S3) The external ID to use when connecting to S3 - --tenant-id (Required for Azure) A tenant ID to use when connecting to Azure Storage - --multi-tenant-app-name (Only for Azure) The app name to use when connecting to Azure Storage - --consent-url (Only for Azure) A consent URL granting permissions for the Azure Storage location - --service-account (Only for GCS) The service account to use when connecting to GCS - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs create \ - --storage-type s3 \ - --default-base-location s3://example-bucket/my_data \ - --role-arn ${ROLE_ARN} \ - my_catalog - -polaris catalogs create \ - --storage-type s3 \ - --default-base-location s3://example-bucket/my_other_data \ - --allowed-location s3://example-bucket/second_location \ - --allowed-location s3://other-bucket/third_location \ - --role-arn ${ROLE_ARN} \ - my_other_catalog - -polaris catalogs create \ - --storage-type file \ - --default-base-location file:///example/tmp \ - quickstart_catalog -``` - -#### delete - -The `delete` subcommand is used to delete a catalog. - -``` -input: polaris catalogs delete --help -options: - delete - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs delete some_catalog -``` - -#### get - -The `get` subcommand is used to retrieve details about a catalog. - -``` -input: polaris catalogs get --help -options: - get - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs get some_catalog - -polaris catalogs get another_catalog -``` - -#### list - -The `list` subcommand is used to show details about all catalogs, or those that a certain principal role has access to. The principal used to perform this operation must have the `CATALOG_LIST` privilege. - -``` -input: polaris catalogs list --help -options: - list - Named arguments: - --principal-role The name of a principal role -``` - -##### Examples - -``` -polaris catalogs list - -polaris catalogs list --principal-role some_user -``` - -#### update - -The `update` subcommand is used to update a catalog. Currently, this command supports changing the properties of a catalog or updating its storage configuration. - -``` -input: polaris catalogs update --help -options: - update - Named arguments: - --default-base-location (Required) Default base location of the catalog - --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once. - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs update --property tag=new_value my_catalog - -polaris catalogs update --default-base-location s3://new-bucket/my_data my_catalog -``` - -### Principals - -The `principals` command is used to manage principals within Polaris. - -`principals` supports the following subcommands: - -1. create -2. delete -3. get -4. list -5. rotate-credentials -6. update -7. access - -#### create - -The `create` subcommand is used to create a new principal. - -``` -input: polaris principals create --help -options: - create - Named arguments: - --type The type of principal to create in [SERVICE] - --property A key/value pair such as: tag=value. 
Multiple can be provided by specifying this option more than once
    Positional arguments:
      principal
```

##### Examples

```
polaris principals create some_user

polaris principals create --client-id ${CLIENT_ID} --property admin=true some_admin_user
```

#### delete

The `delete` subcommand is used to delete a principal.

```
input: polaris principals delete --help
options:
  delete
    Positional arguments:
      principal
```

##### Examples

```
polaris principals delete some_user

polaris principals delete some_admin_user
```

#### get

The `get` subcommand retrieves details about a principal.

```
input: polaris principals get --help
options:
  get
    Positional arguments:
      principal
```

##### Examples

```
polaris principals get some_user

polaris principals get some_admin_user
```

#### list

The `list` subcommand shows details about all principals.

##### Examples

```
polaris principals list
```

#### rotate-credentials

The `rotate-credentials` subcommand is used to update the credentials used by a principal. After this command runs successfully, the new credentials will be printed to stdout.

```
input: polaris principals rotate-credentials --help
options:
  rotate-credentials
    Positional arguments:
      principal
```

##### Examples

```
polaris principals rotate-credentials some_user

polaris principals rotate-credentials some_admin_user
```

#### update

The `update` subcommand is used to update a principal. Currently, this supports rewriting the properties associated with a principal.

```
input: polaris principals update --help
options:
  update
    Named arguments:
      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
    Positional arguments:
      principal
```

##### Examples

```
polaris principals update --property key=value --property other_key=other_value some_user

polaris principals update --property are_other_keys_removed=yes some_user
```

#### access

The `access` subcommand retrieves the entity relations for a principal.

```
input: polaris principals access --help
options:
  access
    Positional arguments:
      principal
```

##### Examples

```
polaris principals access quickstart_user
```

### Principal Roles

The `principal-roles` command is used to create, discover, and manage principal roles within Polaris. Additionally, this command can identify principals or catalog roles associated with a principal role, and can be used to grant a principal role to a principal.

`principal-roles` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. update
6. grant
7. revoke

#### create

The `create` subcommand is used to create a new principal role.

```
input: polaris principal-roles create --help
options:
  create
    Named arguments:
      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles create data_engineer

polaris principal-roles create --property key=value data_analyst
```

#### delete

The `delete` subcommand is used to delete a principal role.
```
input: polaris principal-roles delete --help
options:
  delete
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles delete data_engineer

polaris principal-roles delete data_analyst
```

#### get

The `get` subcommand retrieves details about a principal role.

```
input: polaris principal-roles get --help
options:
  get
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles get data_engineer

polaris principal-roles get data_analyst
```

#### list

The `list` subcommand is used to print out all principal roles or, alternatively, to list all principal roles associated with a given principal or with a given catalog role.

```
input: polaris principal-roles list --help
options:
  list
    Named arguments:
      --catalog-role  The name of a catalog role. If provided, show only principal roles assigned to this catalog role.
      --principal  The name of a principal. If provided, show only principal roles assigned to this principal.
```

##### Examples

```
polaris principal-roles list

polaris principal-roles list --principal d.knuth

polaris principal-roles list --catalog-role super_secret_data
```

#### update

The `update` subcommand is used to update a principal role. Currently, this supports updating the properties tied to a principal role.

```
input: polaris principal-roles update --help
options:
  update
    Named arguments:
      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles update --property key=value2 data_engineer

polaris principal-roles update data_analyst --property key=value3
```

#### grant

The `grant` subcommand is used to grant a principal role to a principal.

```
input: polaris principal-roles grant --help
options:
  grant
    Named arguments:
      --principal  A principal to grant this principal role to
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles grant --principal d.knuth data_engineer

polaris principal-roles grant data_scientist --principal a.ng
```

#### revoke

The `revoke` subcommand is used to revoke a principal role from a principal.

```
input: polaris principal-roles revoke --help
options:
  revoke
    Named arguments:
      --principal  A principal to revoke this principal role from
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles revoke --principal former.employee data_engineer

polaris principal-roles revoke data_scientist --principal changed.role
```

### Catalog Roles

The `catalog-roles` command is used to create, discover, and manage catalog roles within Polaris. Additionally, this command can be used to grant a catalog role to a principal role.

`catalog-roles` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. update
6. grant
7. revoke

#### create

The `create` subcommand is used to create a new catalog role.

```
input: polaris catalog-roles create --help
options:
  create
    Named arguments:
      --catalog  The name of an existing catalog
      --property  A key/value pair such as: tag=value.
Multiple can be provided by specifying this option more than once - Positional arguments: - catalog_role -``` - -##### Examples - -``` -polaris catalog-roles create --property key=value --catalog some_catalog sales_data - -polaris catalog-roles create --catalog other_catalog sales_data -``` - -#### delete - -The `delete` subcommand is used to delete a catalog role. - -``` -input: polaris catalog-roles delete --help -options: - delete - Named arguments: - --catalog The name of an existing catalog - Positional arguments: - catalog_role -``` - -##### Examples - -``` -polaris catalog-roles delete --catalog some_catalog sales_data - -polaris catalog-roles delete --catalog other_catalog sales_data -``` - -#### get - -The `get` subcommand retrieves details about a catalog role. - -``` -input: polaris catalog-roles get --help -options: - get - Named arguments: - --catalog The name of an existing catalog - Positional arguments: - catalog_role -``` - -##### Examples - -``` -polaris catalog-roles get --catalog some_catalog inventory_data - -polaris catalog-roles get --catalog other_catalog inventory_data -``` - -#### list - -The `list` subcommand is used to print all catalog roles. Alternatively, if a principal role is provided, only catalog roles associated with that principal are shown. - -``` -input: polaris catalog-roles list --help -options: - list - Named arguments: - --principal-role The name of a principal role - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalog-roles list - -polaris catalog-roles list --principal-role data_engineer -``` - -#### update - -The `update` subcommand is used to update a catalog role. Currently, only updating properties associated with the catalog role is supported. - -``` -input: polaris catalog-roles update --help -options: - update - Named arguments: - --catalog The name of an existing catalog - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog_role -``` - -##### Examples - -``` -polaris catalog-roles update --property contains_pii=true --catalog some_catalog sales_data - -polaris catalog-roles update sales_data --catalog some_catalog --property key=value -``` - -#### grant - -The `grant` subcommand is used to grant a catalog role to a principal role. - -``` -input: polaris catalog-roles grant --help -options: - grant - Named arguments: - --catalog The name of an existing catalog - --principal-role The name of a catalog role - Positional arguments: - catalog_role -``` - -##### Examples - -``` -polaris catalog-roles grant sensitive_data --catalog some_catalog --principal-role power_user - -polaris catalog-roles grant --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role -``` - -#### revoke - -The `revoke` subcommand is used to revoke a catalog role from a principal role. - -``` -input: polaris catalog-roles revoke --help -options: - revoke - Named arguments: - --catalog The name of an existing catalog - --principal-role The name of a catalog role - Positional arguments: - catalog_role -``` - -##### Examples - -``` -polaris catalog-roles revoke sensitive_data --catalog some_catalog --principal-role power_user - -polaris catalog-roles revoke --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role -``` - -### Namespaces - -The `namespaces` command is used to manage namespaces within Polaris. - -`namespaces` supports the following subcommands: - -1. create -2. delete -3. 
get -4. list - -#### create - -The `create` subcommand is used to create a new namespace. - -When creating a namespace with an explicit location, that location must reside within the parent catalog or namespace. - -``` -input: polaris namespaces create --help -options: - create - Named arguments: - --catalog The name of an existing catalog - --location If specified, the location at which to store the namespace and entities inside it - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - namespace -``` - -##### Examples - -``` -polaris namespaces create --catalog my_catalog outer - -polaris namespaces create --catalog my_catalog --location 's3://bucket/outer/inner_SUFFIX' outer.inner -``` - -#### delete - -The `delete` subcommand is used to delete a namespace. - -``` -input: polaris namespaces delete --help -options: - delete - Named arguments: - --catalog The name of an existing catalog - Positional arguments: - namespace -``` - -##### Examples - -``` -polaris namespaces delete outer_namespace.inner_namespace --catalog my_catalog - -polaris namespaces delete --catalog my_catalog outer_namespace -``` - -#### get - -The `get` subcommand retrieves details about a namespace. - -``` -input: polaris namespaces get --help -options: - get - Named arguments: - --catalog The name of an existing catalog - Positional arguments: - namespace -``` - -##### Examples - -``` -polaris namespaces get --catalog some_catalog a.b - -polaris namespaces get a.b.c --catalog some_catalog -``` - -#### list - -The `list` subcommand shows details about all namespaces directly within a catalog or, optionally, within some parent prefix in that catalog. - -``` -input: polaris namespaces list --help -options: - list - Named arguments: - --catalog The name of an existing catalog - --parent If specified, list namespaces inside this parent namespace -``` - -##### Examples - -``` -polaris namespaces list --catalog my_catalog - -polaris namespaces list --catalog my_catalog --parent a - -polaris namespaces list --catalog my_catalog --parent a.b -``` - -### Privileges - -The `privileges` command is used to grant various privileges to a catalog role, or to revoke those privileges. Privileges can be on the level of a catalog, a namespace, a table, or a view. For more information on privileges, please refer to the [docs]({{% ref "entities#privilege" %}}). - -Note that when using the `privileges` command, the user specifies the relevant catalog and catalog role before selecting a subcommand. - -`privileges` supports the following subcommands: - -1. list -2. catalog -3. namespace -4. table -5. view - -Each of these subcommands, except `list`, supports the `grant` and `revoke` actions and requires an action to be specified. - -Note that each subcommand's `revoke` action always accepts the same options that the corresponding `grant` action does, but with the addition of the `cascade` option. `cascade` is used to revoke all other privileges that depend on the specified privilege. - -#### list - -The `list` subcommand shows details about all privileges for a catalog role. 
-
-```
-input: polaris privileges list --help
-options:
-  list
-    Named arguments:
-      --catalog  The name of an existing catalog
-      --catalog-role  The name of a catalog role
-```
-
-##### Examples
-
-```
-polaris privileges list --catalog my_catalog --catalog-role my_role
-
-polaris privileges list --catalog-role my_other_role --catalog my_catalog
-```
-
-#### catalog
-
-The `catalog` subcommand manages privileges at the catalog level. `grant` is used to grant catalog privileges to the specified catalog role, and `revoke` is used to revoke them.
-
-```
-input: polaris privileges catalog --help
-options:
-  catalog
-    grant
-      Named arguments:
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-    revoke
-      Named arguments:
-        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
-  catalog \
-  grant \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  TABLE_CREATE
-
-polaris privileges \
-  catalog \
-  revoke \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --cascade \
-  TABLE_CREATE
-```
-
-#### namespace
-
-The `namespace` subcommand manages privileges at the namespace level.
-
-```
-input: polaris privileges namespace --help
-options:
-  namespace
-    grant
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-    revoke
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
-  namespace \
-  grant \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b \
-  TABLE_LIST
-
-polaris privileges \
-  namespace \
-  revoke \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b \
-  TABLE_LIST
-```
-
-#### table
-
-The `table` subcommand manages privileges at the table level.
-
-```
-input: polaris privileges table --help
-options:
-  table
-    grant
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --table  The name of a table
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-    revoke
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --table  The name of a table
-        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
-  table \
-  grant \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b \
-  --table t \
-  TABLE_DROP
-
-polaris privileges \
-  table \
-  revoke \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b \
-  --table t \
-  --cascade \
-  TABLE_DROP
-```
-
-#### view
-
-The `view` subcommand manages privileges at the view level.
-
-```
-input: polaris privileges view --help
-options:
-  view
-    grant
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --view  The name of a view
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-    revoke
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --view  The name of a view
-        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
-  view \
-  grant \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b.c \
-  --view v \
-  VIEW_FULL_METADATA
-
-polaris privileges \
-  view \
-  revoke \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b.c \
-  --view v \
-  --cascade \
-  VIEW_FULL_METADATA
-```
-
-### Profiles
-
-The `profiles` command is used to manage stored authentication profiles in Polaris. Profiles allow authentication credentials to be saved and reused, eliminating the need to pass credentials with every command.
-
-`profiles` supports the following subcommands:
-
-1. create
-2. delete
-3. get
-4. list
-5. update
-
-#### create
-
-The `create` subcommand is used to create a new authentication profile.
-
-```
-input: polaris profiles create --help
-options:
-  create
-    Positional arguments:
-      profile
-```
-
-##### Examples
-
-```
-polaris profiles create dev
-```
-
-#### delete
-
-The `delete` subcommand removes a stored profile.
-
-```
-input: polaris profiles delete --help
-options:
-  delete
-    Positional arguments:
-      profile
-```
-
-##### Examples
-
-```
-polaris profiles delete dev
-```
-
-#### get
-
-The `get` subcommand retrieves details about a stored profile.
-
-```
-input: polaris profiles get --help
-options:
-  get
-    Positional arguments:
-      profile
-```
-
-##### Examples
-
-```
-polaris profiles get dev
-```
-
-#### list
-
-The `list` subcommand displays all stored profiles.
-
-```
-input: polaris profiles list --help
-options:
-  list
-```
-
-##### Examples
-
-```
-polaris profiles list
-```
-
-#### update
-
-The `update` subcommand modifies an existing profile.
-
-```
-input: polaris profiles update --help
-options:
-  update
-    Positional arguments:
-      profile
-```
-
-##### Examples
-
-```
-polaris profiles update dev
-```
-
-## Examples
-
-This section outlines example code for a few common operations as well as for some more complex ones.
-
-For especially complex operations, you may wish to instead directly use the Python API.
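-
-As a rough illustration of that route, the snippet below drives the Polaris REST API directly with the `requests` library rather than the generated Python client. Treat it as a sketch only: the endpoint paths, credentials, and response field names are assumptions based on the examples in these docs, not authoritative client usage.
-
-```python
-import requests
-
-POLARIS = "http://localhost:8181"  # illustrative server address
-
-# Exchange client credentials for an access token (assumed flow, mirroring the
-# curl-based OAuth examples used elsewhere in the Polaris docs).
-resp = requests.post(
-    f"{POLARIS}/api/catalog/v1/oauth/tokens",
-    data={
-        "grant_type": "client_credentials",
-        "client_id": "my-client-id",          # placeholder credentials
-        "client_secret": "my-client-secret",  # placeholder credentials
-        "scope": "PRINCIPAL_ROLE:ALL",
-    },
-)
-resp.raise_for_status()
-token = resp.json()["access_token"]
-
-# Call the management API directly, e.g. to enumerate principals. The response
-# field names are assumptions, hence the defensive .get() lookups.
-resp = requests.get(
-    f"{POLARIS}/api/management/v1/principals",
-    headers={"Authorization": f"Bearer {token}"},
-)
-resp.raise_for_status()
-for principal in resp.json().get("principals", []):
-    print(principal.get("name"))
-```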
-
-### Creating a principal and a catalog
-
-```
-polaris principals create my_user
-
-polaris catalogs create \
-  --type internal \
-  --storage-type s3 \
-  --default-base-location s3://iceberg-bucket/polaris-base \
-  --role-arn arn:aws:iam::111122223333:role/ExampleCorpRole \
-  --allowed-location s3://iceberg-bucket/polaris-alt-location-1 \
-  --allowed-location s3://iceberg-bucket/polaris-alt-location-2 \
-  my_catalog
-```
-
-### Granting a principal the ability to manage the content of a catalog
-
-```
-polaris principal-roles create power_user
-polaris principal-roles grant --principal my_user power_user
-
-polaris catalog-roles create --catalog my_catalog my_catalog_role
-polaris catalog-roles grant \
-  --catalog my_catalog \
-  --principal-role power_user \
-  my_catalog_role
-
-polaris privileges \
-  catalog \
-  --catalog my_catalog \
-  --catalog-role my_catalog_role \
-  grant \
-  CATALOG_MANAGE_CONTENT
-```
-
-### Identifying the tables a given principal has been granted explicit access to read
-
-_Note that some other privileges, such as `CATALOG_MANAGE_CONTENT`, subsume `TABLE_READ_DATA` and would not be discovered here._
-
-```
-catalog=my_catalog  # the catalog whose grants are inspected
-principal_roles=$(polaris principal-roles list --principal my_principal)
-for principal_role in ${principal_roles}; do
-  catalog_roles=$(polaris catalog-roles list --principal-role "${principal_role}")
-  for catalog_role in ${catalog_roles}; do
-    grants=$(polaris privileges list --catalog-role "${catalog_role}" --catalog "${catalog}")
-    for grant in $(echo "${grants}" | jq -c '.[] | select(.privilege == "TABLE_READ_DATA")'); do
-      echo "${grant}"
-    done
-  done
-done
-```
-
-
diff --git a/1.0.1/configuration.md b/1.0.1/configuration.md
deleted file mode 100644
index 7ba1a97c1d..0000000000
--- a/1.0.1/configuration.md
+++ /dev/null
@@ -1,188 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Configuring Polaris
-type: docs
-weight: 550
----
-
-## Overview
-
-This page provides information on how to configure Apache Polaris (Incubating). Unless stated
-otherwise, this information is valid both for Polaris Docker images (and Kubernetes deployments) as
-well as for Polaris binary distributions.
-
-> Note: for production tips and best practices, refer to [Configuring Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}).
-
-First off, the Polaris server runs on Quarkus and uses its configuration mechanisms. Read the Quarkus
-[configuration guide](https://quarkus.io/guides/config) to get familiar with the basics.
-
-Quarkus aggregates configuration properties from multiple sources, applying them in a specific order
-of precedence. When a property is defined in multiple sources, the value from the source with the
-higher priority overrides those from lower-priority sources.
-
-The sources are listed below, from highest to lowest priority:
-
-1. System properties: properties set via the Java command line using `-Dproperty.name=value`.
-2. Environment variables (see below for important details).
-3. Settings in the `$PWD/config/application.properties` file.
-4. The `application.properties` files packaged in Polaris.
-5. Default values: hardcoded defaults within the application.
-
-When using environment variables, there are two naming conventions:
-
-1. If possible, just use the property name as the environment variable name. This works fine in most
-   cases, e.g. in Kubernetes deployments. For example, `polaris.realm-context.realms` can be
-   included as is in a container YAML definition:
-   ```yaml
-   env:
-   - name: "polaris.realm-context.realms"
-     value: "realm1,realm2"
-   ```
-
-2. If running from a script or shell prompt, however, stricter naming rules apply: variable names
-   can consist solely of uppercase letters, digits, and the `_` (underscore) sign. In such
-   situations, the environment variable name must be derived from the property name, by using
-   uppercase letters, and replacing all dots, dashes and quotes by underscores. For example,
-   `polaris.realm-context.realms` becomes `POLARIS_REALM_CONTEXT_REALMS`. See
-   [here](https://smallrye.io/smallrye-config/Main/config/environment-variables/) for more details.
-
-> [!IMPORTANT]
-> While convenient, uppercase-only environment variables can be problematic for complex property
-> names. In these situations, it's preferable to use system properties or a configuration file.
-
-As stated above, a configuration file can also be provided at runtime; it should be available
-(mounted) at `$PWD/config/application.properties` for the Polaris server to recognize it. In Polaris
-official Docker images, this location is `/deployment/config/application.properties`.
-
-For Kubernetes deployments, the configuration file is typically defined as a `ConfigMap`, then
-mounted in the container at `/deployment/config/application.properties`. It can be mounted in
-read-only mode, as Polaris only reads the configuration file once, at startup.
-
-## Polaris Configuration Options Reference
-
-| Configuration Property | Default Value | Description |
-|------------------------|---------------|-------------|
-| `polaris.persistence.type` | `relational-jdbc` | Define the persistence backend used by Polaris (`in-memory`, `relational-jdbc`, `eclipse-link` (deprecated)). See [Configuring Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}). |
-| `polaris.persistence.relational.jdbc.max-retries` | `1` | Total number of retries JDBC persistence will attempt on connection resets or serialization failures before giving up. |
-| `polaris.persistence.relational.jdbc.max_duration_in_ms` | `5000 ms` | Max time interval (ms) since the start of a transaction when retries can be attempted. |
-| `polaris.persistence.relational.jdbc.initial_delay_in_ms` | `100 ms` | Initial delay before retrying. The delay is doubled after each retry. |
-| `polaris.persistence.eclipselink.configurationFile` | | Define the location of the `persistence.xml`. By default, it's the built-in `persistence.xml` that is in use. |
-| `polaris.persistence.eclipselink.persistenceUnit` | `polaris` | Define the name of the persistence unit to use, as defined in the `persistence.xml`. |
-| `polaris.realm-context.type` | `default` | Define the type of the Polaris realm to use. |
-| `polaris.realm-context.realms` | `POLARIS` | Define the list of realms to use. |
-| `polaris.realm-context.header-name` | `Polaris-Realm` | Define the header name defining the realm context. |
-| `polaris.features."ENFORCE_PRINCIPAL_CREDENTIAL_ROTATION_REQUIRED_CHECKING"` | `false` | Flag to enforce checking whether principal credential rotation is required. |
-| `polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"` | `FILE` | Define the supported catalog storage types. Supported values are `S3`, `GCS`, `AZURE`, `FILE`. |
-| `polaris.features.realm-overrides."my-realm"."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"` | `true` | "Override" features on a per-realm basis, here the skip credential subscoping indirection flag. |
-| `polaris.authentication.authenticator.type` | `default` | Define the Polaris authenticator type. |
-| `polaris.authentication.token-service.type` | `default` | Define the Polaris token service type. |
-| `polaris.authentication.token-broker.type` | `rsa-key-pair` | Define the Polaris token broker type. Also configure the location of the key files. For RSA: if the locations of the key files are not configured, an ephemeral key-pair will be created on each Polaris server instance startup, which breaks existing tokens after server restarts and is also incompatible with running multiple Polaris server instances. |
-| `polaris.authentication.token-broker.max-token-generation` | `PT1H` | Define the max token generation policy on the token broker. |
-| `polaris.authentication.token-broker.rsa-key-pair.private-key-file` | | Define the location of the RSA-256 private key file; if present, the `public-key` file must be specified, too. |
-| `polaris.authentication.token-broker.rsa-key-pair.public-key-file` | | Define the location of the RSA-256 public key file; if present, the `private-key` file must be specified, too. |
-| `polaris.authentication.token-broker.symmetric-key.secret` | `secret` | Define the secret of the symmetric key. |
-| `polaris.authentication.token-broker.symmetric-key.file` | `/tmp/symmetric.key` | Define the location of the symmetric key file. |
-| `polaris.storage.aws.access-key` | `accessKey` | Define the AWS S3 access key. If unset, the default credential provider chain will be used. |
-| `polaris.storage.aws.secret-key` | `secretKey` | Define the AWS S3 secret key. If unset, the default credential provider chain will be used. |
-| `polaris.storage.gcp.token` | `token` | Define the Google Cloud Storage token. If unset, the default credential provider chain will be used. |
-| `polaris.storage.gcp.lifespan` | `PT1H` | Define the Google Cloud Storage lifespan type. If unset, the default credential provider chain will be used. |
-| `polaris.log.request-id-header-name` | `Polaris-Request-Id` | Define the header name to match the request ID in the log. |
-| `polaris.log.mdc.aid` | `polaris` | Define the log context (e.g. MDC) AID. |
-| `polaris.log.mdc.sid` | `polaris-service` | Define the log context (e.g. MDC) SID. |
-| `polaris.rate-limiter.filter.type` | `no-op` | Define the Polaris rate limiter. Supported values are `no-op`, `token-bucket`. |
-| `polaris.rate-limiter.token-bucket.type` | `default` | Define the token bucket rate limiter. |
-| `polaris.rate-limiter.token-bucket.requests-per-second` | `9999` | Define the number of requests per second for the token bucket rate limiter. |
-| `polaris.rate-limiter.token-bucket.window` | `PT10S` | Define the window size for the token bucket rate limiter. |
-| `polaris.metrics.tags.<tag-name>=<tag-value>` | `application=Polaris` | Define arbitrary metric tags to include in every request. |
-| `polaris.metrics.realm-id-tag.api-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in API metrics. |
-| `polaris.metrics.realm-id-tag.http-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in HTTP request metrics. |
-| `polaris.metrics.realm-id-tag.http-metrics-max-cardinality` | `100` | The maximum cardinality for the `realm_id` tag in HTTP request metrics. |
-| `polaris.tasks.max-concurrent-tasks` | `100` | Define the max number of concurrent tasks. |
-| `polaris.tasks.max-queued-tasks` | `1000` | Define the max number of tasks in the queue. |
-| `polaris.config.rollback.compaction.on-conflicts.enabled` | `false` | When set to `true`, Polaris resolves conflicts at the server end by rolling back those REPLACE-operation snapshots whose snapshot summary has the property `polaris.internal.rollback.compaction.on-conflict` set to `rollback`. |
-
-There are also non-Polaris configuration properties that can be useful:
-
-| Configuration Property | Default Value | Description |
-|------------------------|---------------|-------------|
-| `quarkus.log.level` | `INFO` | Define the root log level. |
-| `quarkus.log.category."org.apache.polaris".level` | | Define the log level for a specific category. |
-| `quarkus.default-locale` | System locale | Force the use of a specific locale, for instance `en_US`. |
-| `quarkus.http.port` | `8181` | Define the HTTP port number. |
-| `quarkus.http.auth.basic` | `false` | Enable HTTP basic authentication. |
-| `quarkus.http.limits.max-body-size` | `10240K` | Define the HTTP max body size limit. |
-| `quarkus.http.cors.origins` | | Define the HTTP CORS origins. |
-| `quarkus.http.cors.methods` | `PATCH, POST, DELETE, GET, PUT` | Define the HTTP CORS covered methods. |
-| `quarkus.http.cors.headers` | `*` | Define the HTTP CORS covered headers. |
-| `quarkus.http.cors.exposed-headers` | `*` | Define the HTTP CORS covered exposed headers. |
-| `quarkus.http.cors.access-control-max-age` | `PT10M` | Define the HTTP CORS access control max age. |
-| `quarkus.http.cors.access-control-allow-credentials` | `true` | Define the HTTP CORS access control allow credentials flag. |
-| `quarkus.management.enabled` | `true` | Enable the management server. |
-| `quarkus.management.port` | `8182` | Define the port number of the Polaris management server. |
-| `quarkus.management.root-path` | | Define the root path on which the `/metrics` and `/health` endpoints are based. |
-| `quarkus.otel.sdk.disabled` | `true` | Whether the OpenTelemetry SDK is disabled; set to `false` to enable the OpenTelemetry layer. |
-
-> Note: This section is only relevant for Polaris Docker images and Kubernetes deployments.
-
-There are many other actionable environment variables available in the official Polaris Docker
-image; they come from the base image used by Polaris, [ubi9/openjdk-21-runtime]. They should be used
-to fine-tune the Java runtime directly, e.g. to enable debugging or to set the heap size. These
-variables are not specific to Polaris, but are inherited from the base image. If in doubt, leave
-everything at its default!
-
-[ubi9/openjdk-21-runtime]: https://catalog.redhat.com/software/containers/ubi9/openjdk-21-runtime/6501ce769a0d86945c422d5f
-
-| Environment variable | Description |
-|----------------------|-------------|
-| `JAVA_OPTS` or `JAVA_OPTIONS` | **NOT RECOMMENDED**. JVM options passed to the `java` command (example: "-verbose:class"). Setting this variable will override all options set by any of the other variables in this table. To pass extra settings, use `JAVA_OPTS_APPEND` instead. |
-| `JAVA_OPTS_APPEND` | User-specified Java options to be appended to the generated options in `JAVA_OPTS` (example: "-Dsome.property=foo"). |
-| `JAVA_TOOL_OPTIONS` | This variable is defined and honored by all OpenJDK distros, see [here](https://bugs.openjdk.org/browse/JDK-4971166). Options defined here take precedence over all else; using this variable is generally not necessary, but can be useful e.g. to enforce JVM startup parameters, to set up remote debug, or to define JVM agents. |
-| `JAVA_MAX_MEM_RATIO` | Used to calculate a default maximal heap memory based on the container's memory restriction. If used in a container without any memory constraint, this option has no effect. If there is a memory constraint, `-XX:MaxRAMPercentage` is set to the ratio of the container's available memory specified here. The default is `80`, which means 80% of the available memory is used as an upper boundary. You can skip this mechanism by setting this value to `0`, in which case no `-XX:MaxRAMPercentage` option is added. |
-| `JAVA_DEBUG` | If set, remote debugging will be switched on. Disabled by default (example: "true"). |
-| `JAVA_DEBUG_PORT` | Port used for remote debugging. Defaults to "5005" (tip: use "*:5005" to enable debugging on all network interfaces). |
-| `GC_MIN_HEAP_FREE_RATIO` | Minimum percentage of heap free after GC to avoid expansion. Default is 10. |
-| `GC_MAX_HEAP_FREE_RATIO` | Maximum percentage of heap free after GC to avoid shrinking. Default is 20. |
-| `GC_TIME_RATIO` | Specifies the ratio of the time spent outside of garbage collection. Default is 4. |
-| `GC_ADAPTIVE_SIZE_POLICY_WEIGHT` | The weighting given to the current GC time versus previous GC times. Default is 90. |
-| `GC_METASPACE_SIZE` | The initial metaspace size. There is no default (example: "20"). |
-| `GC_MAX_METASPACE_SIZE` | The maximum metaspace size. There is no default (example: "100"). |
-| `GC_CONTAINER_OPTIONS` | Specify the Java GC to use. The value of this variable should contain the necessary JRE command-line options to specify the required GC, which will override the default of `-XX:+UseParallelGC` (example: `-XX:+UseG1GC`). |
-
-Here are some examples:
-
-| Example | `docker run` option |
-|---------|---------------------|
-| Using another GC | `-e GC_CONTAINER_OPTIONS="-XX:+UseShenandoahGC"` lets Polaris use Shenandoah GC instead of the default parallel GC. |
-| Set the Java heap size to a _fixed_ amount | `-e JAVA_OPTS_APPEND="-Xms8g -Xmx8g"` lets Polaris use a Java heap of 8g. |
-| Set the maximum heap percentage | `-e JAVA_MAX_MEM_RATIO="70"` lets Polaris use 70% of the available memory. |
-
-
-## Troubleshooting Configuration Issues
-
-If you encounter issues with the configuration, you can ask Polaris to print out the configuration it
-is using. To do this, set the log level for the `io.smallrye.config` category to `DEBUG`, and also
-set the console appender level to `DEBUG`:
-
-```properties
-quarkus.log.console.level=DEBUG
-quarkus.log.category."io.smallrye.config".level=DEBUG
-```
-
-> [!IMPORTANT]
-> This will print out all configuration values, including sensitive ones like
-> passwords. Don't do this in production, and don't share this output with anyone you don't trust!
diff --git a/1.0.1/configuring-polaris-for-production.md b/1.0.1/configuring-polaris-for-production.md
deleted file mode 100644
index fac51b40f9..0000000000
--- a/1.0.1/configuring-polaris-for-production.md
+++ /dev/null
@@ -1,222 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Configuring Polaris for Production
-linkTitle: Production Configuration
-type: docs
-weight: 600
----
-
-The default server configuration is intended for development and testing. When you deploy Polaris in production,
-review and apply the following checklist:
-- [ ] Configure OAuth2 keys
-- [ ] Enforce realm header validation (`require-header=true`)
-- [ ] Use a durable metastore (JDBC + PostgreSQL)
-- [ ] Bootstrap valid realms in the metastore
-- [ ] Disable local FILE storage
-
-### Configure OAuth2
-
-Polaris authentication requires specifying a token broker factory type. Two implementations are
-supported out of the box:
-
-- [rsa-key-pair] uses a pair of public and private keys;
-- [symmetric-key] uses a shared secret.
-
-[rsa-key-pair]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java
-[symmetric-key]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTSymmetricKeyFactory.java
-
-By default, Polaris uses `rsa-key-pair`, with randomly generated keys.
- -> [!IMPORTANT] -> The default `rsa-key-pair` configuration is not suitable when deploying many replicas of Polaris, -> as each replica will have its own set of keys. This will cause token validation to fail when a -> request is routed to a different replica than the one that issued the token. - -It is highly recommended to configure Polaris with previously-generated RSA keys. This can be done -by setting the following properties: - -```properties -polaris.authentication.token-broker.type=rsa-key-pair -polaris.authentication.token-broker.rsa-key-pair.public-key-file=/tmp/public.key -polaris.authentication.token-broker.rsa-key-pair.private-key-file=/tmp/private.key -``` - -To generate an RSA key pair, you can use the following commands: - -```shell -openssl genrsa -out private.key 2048 -openssl rsa -in private.key -pubout -out public.key -``` - -Alternatively, you can use a symmetric key by setting the following properties: - -```properties -polaris.authentication.token-broker.type=symmetric-key -polaris.authentication.token-broker.symmetric-key.file=/tmp/symmetric.key -``` - -Note: it is also possible to set the symmetric key secret directly in the configuration file. If -possible, pass the secret as an environment variable to avoid storing sensitive information in the -configuration file: - -```properties -polaris.authentication.token-broker.symmetric-key.secret=${POLARIS_SYMMETRIC_KEY_SECRET} -``` - -Finally, you can also configure the token broker to use a maximum lifespan by setting the following -property: - -```properties -polaris.authentication.token-broker.max-token-generation=PT1H -``` - -Typically, in Kubernetes, you would define the keys as a `Secret` and mount them as files in the -container. - -### Realm Context Resolver - -By default, Polaris resolves realms based on incoming request headers. You can configure the realm -context resolver by setting the following properties in `application.properties`: - -```properties -polaris.realm-context.realms=POLARIS,MY-REALM -polaris.realm-context.header-name=Polaris-Realm -``` - -Where: - -- `realms` is a comma-separated list of allowed realms. This setting _must_ be correctly configured. - At least one realm must be specified. -- `header-name` is the name of the header used to resolve the realm; by default, it is - `Polaris-Realm`. - -If a request contains the specified header, Polaris will use the realm specified in the header. If -the realm is not in the list of allowed realms, Polaris will return a `404 Not Found` response. - -If a request _does not_ contain the specified header, however, by default Polaris will use the first -realm in the list as the default realm. In the above example, `POLARIS` is the default realm and -would be used if the `Polaris-Realm` header is not present in the request. - -This is not recommended for production use, as it may lead to security vulnerabilities. To avoid -this, set the following property to `true`: - -```properties -polaris.realm-context.require-header=true -``` - -This will cause Polaris to also return a `404 Not Found` response if the realm header is not present -in the request. - -### Metastore Configuration - -A metastore should be configured with an implementation that durably persists Polaris entities. By -default, Polaris uses an in-memory metastore. - -> [!IMPORTANT] -> The default in-memory metastore is not suitable for production use, as it will lose all data -> when the server is restarted; it is also unusable when multiple Polaris replicas are used. 
-
-To enable a durable metastore, configure your system to use the Relational JDBC-backed metastore.
-This implementation leverages Quarkus for datasource management and supports configuration through
-environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file).
-
-Configure the metastore by setting the following environment variables (the values are placeholders to fill in):
-
-```
-POLARIS_PERSISTENCE_TYPE=relational-jdbc
-
-QUARKUS_DATASOURCE_DB_KIND=postgresql
-QUARKUS_DATASOURCE_USERNAME=<username>
-QUARKUS_DATASOURCE_PASSWORD=<password>
-QUARKUS_DATASOURCE_JDBC_URL=<jdbc-url>
-```
-
-The relational JDBC metastore is a Quarkus-managed datasource and currently supports only PostgreSQL and H2.
-Please refer to the documentation here:
-[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)
-
-> [!IMPORTANT]
-> Be sure to secure your metastore backend since it will be storing sensitive data and catalog
-> metadata.
-
-Note: Polaris will always create the schema `polaris_schema` under the configured database during bootstrap.
-
-### Bootstrapping
-
-Before using Polaris, you must **bootstrap** the metastore. This is a manual operation that must be
-performed **only once** for each realm in order to prepare the metastore to integrate with Polaris.
-
-By default, when bootstrapping a new realm, Polaris will create randomised `CLIENT_ID` and
-`CLIENT_SECRET` for the `root` principal and store their hashes in the metastore backend.
-
-This may not be convenient, as the generated credentials are not stored in clear text in the
-database and therefore cannot be retrieved from it later.
-
-In order to provide your own credentials for the `root` principal (so you can request tokens via
-`api/catalog/v1/oauth/tokens`), use the [Polaris Admin Tool]({{% ref "admin-tool" %}}).
-
-You can verify the setup by attempting to issue a token for the `root` principal:
-
-```bash
-curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
-  -d "grant_type=client_credentials" \
-  -d "client_id=my-client-id" \
-  -d "client_secret=my-client-secret" \
-  -d "scope=PRINCIPAL_ROLE:ALL"
-```
-
-This should return an access token:
-
-```json
-{
-  "access_token": "...",
-  "token_type": "bearer",
-  "issued_token_type": "urn:ietf:params:oauth:token-type:access_token",
-  "expires_in": 3600
-}
-```
-
-If you used a non-default realm name, add the appropriate request header to the `curl` command,
-otherwise Polaris will resolve the realm to the first one in the configuration
-`polaris.realm-context.realms`. Here is an example that sets the realm header:
-
-```bash
-curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
-  -H "Polaris-Realm: my-realm" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=my-client-id" \
-  -d "client_secret=my-client-secret" \
-  -d "scope=PRINCIPAL_ROLE:ALL"
-```
-
-### Disable FILE Storage Type
-
-By default, Polaris allows using the local file system (`FILE`) for catalog storage. This is fine for testing,
-but **not recommended for production**. To disable it, set the supported storage types like this:
-
-```hocon
-polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES" = [ "S3", "Azure" ]
-```
-
-Leave out `FILE` to prevent its use. Only include the storage types your setup needs.
-
-### Upgrade Considerations
-
-The [Polaris Evolution](../evolution) page discusses backward compatibility and
-upgrade concerns.
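-
-### Example: Putting the Checklist Together
-
-As a consolidated reference, the checklist at the top of this page might translate into an `application.properties` along these lines. This is a minimal sketch only: the key file locations, realm name, storage types, and datasource settings are placeholders to adapt to your environment.
-
-```properties
-# Durable metastore (see "Metastore Configuration"); datasource values are placeholders
-polaris.persistence.type=relational-jdbc
-quarkus.datasource.db-kind=postgresql
-quarkus.datasource.username=polaris
-quarkus.datasource.password=${DB_PASSWORD}
-quarkus.datasource.jdbc.url=jdbc:postgresql://db.example.com:5432/polaris
-
-# Pre-generated RSA keys shared by all replicas (see "Configure OAuth2")
-polaris.authentication.token-broker.type=rsa-key-pair
-polaris.authentication.token-broker.rsa-key-pair.public-key-file=/deployment/keys/public.key
-polaris.authentication.token-broker.rsa-key-pair.private-key-file=/deployment/keys/private.key
-
-# Explicit realms; reject requests without a realm header (see "Realm Context Resolver")
-polaris.realm-context.realms=POLARIS
-polaris.realm-context.require-header=true
-
-# No local FILE storage (see "Disable FILE Storage Type")
-polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"=["S3","Azure"]
-```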
-
-
diff --git a/1.0.1/entities.md b/1.0.1/entities.md
deleted file mode 100644
index 04d625bb94..0000000000
--- a/1.0.1/entities.md
+++ /dev/null
@@ -1,95 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Entities
-type: docs
-weight: 400
----
-
-This page documents various entities that can be managed in Apache Polaris (Incubating).
-
-## Catalog
-
-A catalog is a top-level entity in Polaris that may contain other entities like [namespaces](#namespace) and [tables](#table). These map directly to [Apache Iceberg catalogs](https://iceberg.apache.org/concepts/catalog/).
-
-For information on managing catalogs with the REST API or for more information on what data can be associated with a catalog, see [the API docs]({{% github-polaris "client/python/docs/CreateCatalogRequest.md" %}}).
-
-### Storage Type
-
-All catalogs in Polaris are associated with a _storage type_. Valid storage types are `S3`, `Azure`, and `GCS`. The `FILE` type is additionally available for testing. Each of these types relates to a different storage provider where data within the catalog may reside. Depending on the storage type, various other configurations may be set for a catalog, including credentials to be used when accessing data inside the catalog.
-
-For details on how to use storage types in the REST API, see [the API docs]({{% github-polaris "client/python/docs/StorageConfigInfo.md" %}}).
-
-For usage examples of storage types, see the [CLI docs]({{% ref "command-line-interface" %}}).
-
-## Namespace
-
-A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_.
-
-In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on.
-
-For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the API docs]({{% github-polaris "client/python/docs/CreateNamespaceRequest.md" %}}).
-
-## Table
-
-Polaris tables are entities that map to [Apache Iceberg tables](https://iceberg.apache.org/docs/nightly/configuration/), [Delta tables](https://docs.databricks.com/aws/en/delta/table-properties), or [Hudi tables](https://hudi.apache.org/docs/next/configurations#TABLE_CONFIG).
-
-For information on managing tables with the REST API or for more information on what data can be associated with a table, see [the API docs]({{% github-polaris "client/python/docs/CreateTableRequest.md" %}}).
-
-## View
-
-Polaris views are entities that map to [Apache Iceberg views](https://iceberg.apache.org/view-spec/).
-
-For information on managing views with the REST API or for more information on what data can be associated with a view, see [the API docs]({{% github-polaris "client/python/docs/CreateViewRequest.md" %}}).
-
-## Principal
-
-Polaris principals are unique identities that can be used to represent users or services. Each principal may have one or more [principal roles](#principal-role) assigned to it for the purpose of accessing catalogs and the entities within them.
-
-For information on managing principals with the REST API or for more information on what data can be associated with a principal, see [the API docs]({{% github-polaris "client/python/docs/CreatePrincipalRequest.md" %}}).
-
-## Principal Role
-
-Polaris principal roles are labels that may be granted to [principals](#principal). Each principal may have one or more principal roles, and the same principal role may be granted to multiple principals. Principal roles may be assigned based on the persona or responsibilities of a given principal, or on how that principal will need to access different entities within Polaris.
-
-For information on managing principal roles with the REST API or for more information on what data can be associated with a principal role, see [the API docs]({{% github-polaris "client/python/docs/CreatePrincipalRoleRequest.md" %}}).
-
-## Catalog Role
-
-Polaris catalog roles are labels that may be granted to [catalogs](#catalog). Each catalog may have one or more catalog roles, and the same catalog role may be granted to multiple catalogs. Catalog roles may be assigned based on the nature of data that will reside in a catalog, or by the groups of users and services that might need to access that data.
-
-Each catalog role may have multiple [privileges](#privilege) granted to it, and each catalog role can be granted to one or more [principal roles](#principal-role). This is the mechanism by which principals are granted access to entities inside a catalog such as namespaces and tables.
-
-## Policy
-
-A Polaris policy is a set of rules governing actions on specified resources under predefined conditions. Polaris supports policies for Iceberg table compaction, snapshot expiry, row-level access control, and custom policy definitions.
-
-A policy can be applied at the catalog level, namespace level, or table level. Policy inheritance can be achieved by attaching a policy to a higher-level scope, such as a namespace or catalog. As a result, tables registered under those entities do not need to have the same policy declared individually. If a table or a namespace requires a different policy, users can assign it a different policy, thereby overriding the policy of the same type declared at the higher-level entities.
-
-## Privilege
-
-Polaris privileges are granted to [catalog roles](#catalog-role) in order to grant principals with a given principal role some degree of access to catalogs with a given catalog role. When a privilege is granted to a catalog role, any principal roles granted that catalog role receive the privilege. In turn, any principals who are granted that principal role receive it.
-
-A privilege can be scoped to any entity inside a catalog, including the catalog itself.
- -For a list of supported privileges for each privilege class, see the API docs: -* [Table Privileges]({{% github-polaris "client/python/docs/TablePrivilege.md" %}}) -* [View Privileges]({{% github-polaris "client/python/docs/ViewPrivilege.md" %}}) -* [Namespace Privileges]({{% github-polaris "client/python/docs/NamespacePrivilege.md" %}}) -* [Catalog Privileges]({{% github-polaris "client/python/docs/CatalogPrivilege.md" %}}) diff --git a/1.0.1/evolution.md b/1.0.1/evolution.md deleted file mode 100644 index ea29badc84..0000000000 --- a/1.0.1/evolution.md +++ /dev/null @@ -1,115 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Polaris Evolution -type: docs -weight: 1000 ---- - -This page discusses what can be expected from Apache Polaris as the project evolves. - -## Using Polaris as a Catalog - -Polaris is primarily intended to be used as a Catalog of Tables and Views. As such, -it implements the Iceberg REST Catalog API and its own REST APIs. - -Revisions of the Iceberg REST Catalog API are controlled by the [Apache Iceberg](https://iceberg.apache.org/) -community. Polaris attempts to accurately implement this specification. Nonetheless, -optional REST Catalog features may or may not be supported immediately. In general, -there is no guarantee that Polaris releases always implement the latest version of -the Iceberg REST Catalog API. - -Any API under Polaris control that is not in an "experimental" or "beta" state -(e.g. the Management API) is maintained as a versioned REST API. New releases of Polaris -may include changes to the current version of the API. When that happens those changes -are intended to be compatible with prior versions of Polaris clients. Certain endpoints -and parameters may be deprecated. - -In case a major change is required to an API that cannot be implemented in a -backward-compatible way, new endpoints (URI paths) may be introduced. New URI "roots" may -be introduced too (e.g. `api/catalog/v2`). - -Note that those "v1", "v2", etc. URI path segments are not meant to be 1:1 with Polaris -releases or Polaris project version numbers (e.g. a "v2" path segment does not mean that -it is added in Polaris 2.0). - -Polaris servers will support deprecated API endpoints / parameters / versions / etc. -for some transition period to allow clients to migrate. - -### Managing Polaris Database - -Polaris stores its data in a database, which is sometimes referred to as "Metastore" or -"Persistence" in other docs. - -Each Polaris release may support multiple Persistence [implementations](../metastores), -for example, "EclipseLink" (deprecated) and "JDBC" (current). - -Each type of Persistence evolves individually. 
Within each Persistence type, Polaris
-attempts to support rolling upgrades (both version X and X + 1 servers running at the
-same time).
-
-However, migrating between different Persistence types is not supported in a rolling
-upgrade manner (for example, migrating from "EclipseLink" to "JDBC"). Polaris provides
-[tools](https://github.com/apache/polaris-tools/) for migrating between different
-catalogs, and those tools may be used to migrate between different Persistence types
-as well. Service interruption (downtime) should be expected in those cases.
-
-## Using Polaris as a Build-Time Dependency
-
-Polaris produces several jars. These jars or custom builds of Polaris code may be used in
-downstream projects according to the terms of the license included in Polaris distributions.
-
-The minimal version of the JRE required by Polaris code (compilation target) may be updated in
-any release. Different Polaris jars may have different minimal JRE version requirements.
-
-Changes in Java classes should be expected at any time, regardless of the module name or
-whether the class / method is `public` or not.
-
-This approach is not meant to discourage the use of Polaris code in downstream projects, but
-to allow more flexibility in evolving the codebase to support new catalog-level features
-and improve code efficiency. Maintainers of downstream projects are encouraged to join Polaris
-mailing lists to monitor project changes, suggest improvements, and engage with the Polaris
-community in case of specific compatibility concerns.
-
-## Semantic Versioning
-
-Polaris strives to follow [Semantic Versioning](https://semver.org/) conventions with respect
-to REST APIs (beta and experimental APIs excepted), [Polaris Policies](../policy/),
-and user-facing [configuration](../configuration/).
-
-The following are some examples of the Polaris approach to SemVer in REST APIs / configuration.
-These examples are for illustration purposes and should not be considered to be
-exhaustive.
-
-* Polaris implementing an optional Iceberg REST Catalog feature that was unimplemented
-in the previous release is not considered a major change.
-
-* Supporting a new revision of the Iceberg REST Catalog spec in a backward-compatible way
-is not considered a major change. Specifically, supporting new REST API prefixes (e.g. `v2`)
-is not a major change because it does not affect older clients.
-
-* Changing the implementation of an Iceberg REST Catalog feature / endpoint in a
-non-backward-compatible way (e.g. removing or renaming a request parameter) is a major change.
-
-* Dropping support for a configuration property with the `polaris.` name prefix is a major change.
-
-* Dropping support for any previously defined [Policy](../policy/) type or property is a major change.
-
-* Upgrading the Quarkus runtime to its next major version is a major change (because
-Quarkus-managed configuration may change).
diff --git a/1.0.1/generic-table.md b/1.0.1/generic-table.md
deleted file mode 100644
index 2e0e3fe8e6..0000000000
--- a/1.0.1/generic-table.md
+++ /dev/null
@@ -1,169 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Generic Table (Beta)
-type: docs
-weight: 435
----
-
-The Generic Table in Apache Polaris is designed to provide support for non-Iceberg tables across different table formats, including Delta, CSV, etc. It currently provides the following capabilities:
-- Create a generic table under a namespace
-- Load a generic table
-- Drop a generic table
-- List all generic tables under a namespace
-
-**NOTE** The current generic table support is in beta release. Please use it with caution and report any issues you encounter.
-
-## What is a Generic Table?
-
-A generic table in Polaris is an entity that defines the following fields:
-
-- **name** (required): A unique identifier for the table within a namespace
-- **format** (required): The format for the generic table, i.e. "delta", "csv"
-- **base-location** (optional): Table base location in URI format, for example `s3://<bucket>/path/to/table`
-  - The table base location is a location that includes all files for the table
-  - A table with multiple disjoint locations (i.e. containing files that are outside the configured base location) is not compliant with the current generic table support in Polaris.
-  - If no location is provided, clients or users are responsible for managing the location.
-- **properties** (optional): Properties for the generic table passed on creation.
-  - Currently, there is no reserved property key defined.
-  - The property definition and interpretation is delegated to client or engine implementations.
-- **doc** (optional): Comment or description for the table
-
-## Generic Table API vs. Iceberg Table API
-
-The Generic Table API provides a different set of endpoints for operating on generic table entities, while the Iceberg API operates on Iceberg table entities.
-
-| Operations | **Iceberg Table API** | **Generic Table API** |
-|------------|-----------------------|-----------------------|
-| Create Table | Create an Iceberg table | Create a generic table |
-| Load Table | Load an Iceberg table. If the table to load is a generic table, you need to call the Generic Table load API instead; otherwise, a TableNotFoundException will be thrown. | Load a generic table. Similarly, trying to load an Iceberg table through the Generic Table API will throw a TableNotFoundException. |
-| Drop Table | Drop an Iceberg table. As with loading, if the table to drop is a generic table, a TableNotFoundException will be thrown. | Drop a generic table. Dropping an Iceberg table through the Generic Table endpoint will throw a TableNotFoundException. |
-| List Table | List all Iceberg tables | List all generic tables |
-
-Note that generic tables share the same namespaces as Iceberg tables, so a table name has to be unique under a given namespace. Furthermore, since
-there is currently no support for updating a generic table, any update to an existing table requires a drop and re-create.
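-
-For instance, changing the `base-location` of an existing generic table amounts to a delete followed by a create. Here is a sketch using the endpoints described in the next section (the catalog, namespace, and table names are the illustrative ones used below):
-
-```shell
-# Drop the existing generic table...
-curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
-
-# ...then re-create it with the updated definition.
-curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \
-  -H "Content-Type: application/json" \
-  -d '{"name": "delta_table", "format": "delta", "base-location": "s3://<bucket>/new/path"}'
-```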
## Working with Generic Tables

There are two ways to work with Polaris generic tables today:
1) Communicate directly with Polaris through REST API calls, using tools such as `curl`. Details are described in the sections below.
2) Use the provided Spark client if you are working with Spark. Please refer to [Polaris Spark Client]({{% ref "polaris-spark-client" %}}) for detailed instructions.

### Create a Generic Table

To create a generic table, you need to provide the corresponding fields as described in [What is a Generic Table](#what-is-a-generic-table).

The REST API for creating a generic table is `POST /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables`, and the request body looks like the following:

```json
{
  "name": "<table-name>",
  "format": "<format>",
  "base-location": "<base-location>",
  "doc": "<doc>",
  "properties": {
    "<key>": "<value>"
  }
}
```

Here is an example that creates a generic table named `delta_table` with format `delta` under the namespace `delta_ns` in the catalog `delta_catalog`, using curl:

```shell
curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \
  -H "Content-Type: application/json" \
  -d '{
     "name": "delta_table",
     "format": "delta",
     "base-location": "s3:///path/to/table",
     "doc": "delta table example",
     "properties": {
       "key1": "value1"
     }
   }'
```

### Load a Generic Table
The REST endpoint for loading a generic table is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.

Here is an example that loads the table `delta_table` using curl:
```shell
curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
```
And the response looks like the following:
```json
{
  "table": {
    "name": "delta_table",
    "format": "delta",
    "base-location": "s3:///path/to/table",
    "doc": "delta table example",
    "properties": {
      "key1": "value1"
    }
  }
}
```

### List Generic Tables
The REST endpoint for listing the generic tables under a given namespace is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/`.

The following curl command lists all generic tables under the namespace `delta_ns`:
```shell
curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/
```
Example response:
```json
{
  "identifiers": [
    {
      "namespace": ["delta_ns"],
      "name": "delta_table"
    }
  ],
  "next-page-token": null
}
```

### Drop a Generic Table
The REST endpoint for dropping a generic table is `DELETE /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.

The following curl call drops the table `delta_table`:
```shell
curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
```

### API Reference

For the complete and up-to-date API specification, see the [Catalog API Spec](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/polaris/refs/heads/main/spec/generated/bundled-polaris-catalog-service.yaml).

## Limitations

Current limitations of Generic Table support:
1) Limited spec information. Currently, there is no spec for information such as schema, partitions, etc.
2) No commit coordination or update capability is provided at the catalog service level.

Therefore, the catalog itself is unaware of anything about the underlying table except some loosely defined metadata.
It is the responsibility of the engine (and any plugins used by the engine) to determine exactly how loading or committing data should look based on the metadata. For example, with the Delta support, the Delta log serialization, deserialization, and updates all happen on the client side.

diff --git a/1.0.1/getting-started/_index.md b/1.0.1/getting-started/_index.md
deleted file mode 100644
index 515d211538..0000000000
--- a/1.0.1/getting-started/_index.md
+++ /dev/null
@@ -1,23 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: 'Getting Started'
type: docs
weight: 101
---
\ No newline at end of file
diff --git a/1.0.1/getting-started/deploying-polaris/_index.md b/1.0.1/getting-started/deploying-polaris/_index.md
deleted file mode 100644
index 32fd5dafd6..0000000000
--- a/1.0.1/getting-started/deploying-polaris/_index.md
+++ /dev/null
@@ -1,27 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Deploying Polaris on Cloud Providers
type: docs
weight: 300
---

We will now demonstrate how to deploy Polaris locally, as well as on all supported cloud providers: Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).

Locally, Polaris can be deployed using either Docker or a local build. On the cloud, this tutorial deploys Polaris using Docker only, but local builds can also be used.
\ No newline at end of file
diff --git a/1.0.1/getting-started/deploying-polaris/quickstart-deploy-aws.md b/1.0.1/getting-started/deploying-polaris/quickstart-deploy-aws.md
deleted file mode 100644
index d66ea2784c..0000000000
--- a/1.0.1/getting-started/deploying-polaris/quickstart-deploy-aws.md
+++ /dev/null
@@ -1,57 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Deploying Polaris on Amazon Web Services (AWS)
type: docs
weight: 310
---

Build and launch Polaris using the AWS startup script at the location given in the command below. This script will start an [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) instance, which will be used as the backend Postgres instance holding all Polaris data. Additionally, Polaris will be bootstrapped to use this database, and Docker containers will be spun up for Spark SQL and Trino.

The requirements to run the script below are:
* There must be at least two subnets created in the VPC and region in which your EC2 instance resides. The span of subnets MUST include at least 2 availability zones (AZs) within the same region.
* Your EC2 instance must be enabled with [IMDSv1 or IMDSv2 with a 2+ hop limit](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-IMDS-new-instances.html#configure-IMDS-new-instances-instance-settings); see the sketch after this list.
* The AWS identity that you will use to run this script must have the following AWS permissions:
  * "ec2:DescribeInstances"
  * "rds:CreateDBInstance"
  * "rds:DescribeDBInstances"
  * "rds:CreateDBSubnetGroup"
  * "sts:AssumeRole" on the same role as the instance profile role of the EC2 instance on which you are running this script. Additionally, you should ensure that the instance profile contains a trust policy that allows the role to trust itself to be assumed.
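For the IMDS requirement above, an existing instance can be reconfigured with the AWS CLI. A minimal sketch (the instance ID is a placeholder; a hop limit of 2 allows containers running on the instance to reach the metadata service):

```shell
# Enable IMDSv2 with a hop limit of 2 on an existing EC2 instance
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 2
```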
```shell
chmod +x getting-started/assets/cloud_providers/deploy-aws.sh
export ASSETS_PATH=$(pwd)/getting-started/assets/
export CLIENT_ID=root
export CLIENT_SECRET=s3cr3t
./getting-started/assets/cloud_providers/deploy-aws.sh
```

## Next Steps
Congrats, you now have a running instance of Polaris! For details on how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris.md" %}}) page.

## Cleanup Instructions
To shut down the Polaris server, run the following commands:

```shell
export ASSETS_PATH=$(pwd)/getting-started/assets/
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
```

To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.

diff --git a/1.0.1/getting-started/deploying-polaris/quickstart-deploy-azure.md b/1.0.1/getting-started/deploying-polaris/quickstart-deploy-azure.md
deleted file mode 100644
index a90cfd9cbd..0000000000
--- a/1.0.1/getting-started/deploying-polaris/quickstart-deploy-azure.md
+++ /dev/null
@@ -1,52 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Deploying Polaris on Azure
type: docs
weight: 320
---

Build and launch Polaris using the Azure startup script at the location given in the command below. This script will start an [Azure Database for PostgreSQL - Flexible Server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview) instance, which will be used as the backend Postgres instance holding all Polaris data. Additionally, Polaris will be bootstrapped to use this database, and Docker containers will be spun up for Spark SQL and Trino.

The requirements to run the script below are:
* Install the AZ CLI, if it is not already installed on the Azure VM. Instructions to download the AZ CLI can be found [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli).
* You must be logged into the AZ CLI. Please run `az account show` to ensure that you are logged in prior to running this script.
* Assign a system-assigned managed identity to the Azure VM.

```shell
chmod +x getting-started/assets/cloud_providers/deploy-azure.sh
export ASSETS_PATH=$(pwd)/getting-started/assets/
export CLIENT_ID=root
export CLIENT_SECRET=s3cr3t
./getting-started/assets/cloud_providers/deploy-azure.sh
```

## Next Steps
Congrats, you now have a running instance of Polaris! For further information on how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris.md" %}}) page.

## Cleanup Instructions
To shut down the Polaris server, run the following commands:

```shell
export ASSETS_PATH=$(pwd)/getting-started/assets/
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
```

To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.

diff --git a/1.0.1/getting-started/deploying-polaris/quickstart-deploy-gcp.md b/1.0.1/getting-started/deploying-polaris/quickstart-deploy-gcp.md
deleted file mode 100644
index fe7fc0c1d5..0000000000
--- a/1.0.1/getting-started/deploying-polaris/quickstart-deploy-gcp.md
+++ /dev/null
@@ -1,52 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Deploying Polaris on Google Cloud Platform (GCP)
type: docs
weight: 330
---

Build and launch Polaris using the GCP startup script at the location given in the command below. This script will start a [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) instance, which will be used as the backend Postgres instance holding all Polaris data. Additionally, Polaris will be bootstrapped to use this database, and Docker containers will be spun up for Spark SQL and Trino.

The requirements to run the script below are:
* Install the `gcloud` CLI, if it is not already installed on the GCP VM. Instructions to download the `gcloud` CLI can be found [here](https://cloud.google.com/sdk/docs/install).
* Ensure the `Cloud SQL Admin API` has been enabled in your project and that your VM's principal has access to the correct role: `roles/cloudsql.admin`.
* Ensure the VM's principal has at least read-only scope on Compute Engine: `compute.readonly`.

```shell
chmod +x getting-started/assets/cloud_providers/deploy-gcp.sh
export ASSETS_PATH=$(pwd)/getting-started/assets/
export CLIENT_ID=root
export CLIENT_SECRET=s3cr3t
./getting-started/assets/cloud_providers/deploy-gcp.sh
```

## Next Steps
Congrats, you now have a running instance of Polaris! For further information on how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris.md" %}}) page.

## Cleanup Instructions
To shut down the Polaris server, run the following commands:

```shell
export ASSETS_PATH=$(pwd)/getting-started/assets/
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
```

To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.

diff --git a/1.0.1/getting-started/install-dependencies.md b/1.0.1/getting-started/install-dependencies.md
deleted file mode 100644
index 7341118868..0000000000
--- a/1.0.1/getting-started/install-dependencies.md
+++ /dev/null
@@ -1,118 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Installing Dependencies
type: docs
weight: 100
---

This guide serves as an introduction to several key entities that can be managed with Apache Polaris (Incubating), describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark™.

# Prerequisites

This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here.
## Git

To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/) on macOS:

```shell
brew install git
```

For other platforms, follow the installation instructions in the [Git Documentation](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).

Then, use git to clone the Polaris repo:

```shell
cd ~
git clone https://github.com/apache/polaris.git
```

## Docker

It is recommended to deploy Polaris inside [Docker](https://www.docker.com/) for the Quickstart workflow. Instructions for deploying the Quickstart workflow on the supported cloud providers (AWS, Azure, GCP) are provided only for Docker. However, the non-Docker instructions for local deployments can also be followed on cloud providers.

Instructions to install Docker can be found on the [Docker website](https://docs.docker.com/engine/install/). Ensure that Docker and the Docker Compose plugin are both installed.

### Docker on macOS
Docker can be installed using [homebrew](https://brew.sh/):

```shell
brew install --cask docker
```

There can be [Docker permission issues](https://github.com/apache/polaris/pull/971) related to the seccomp configuration. To resolve them, set the `seccomp` profile to "unconfined" when running a container. For example:

```shell
docker run --security-opt seccomp=unconfined apache/polaris:latest
```

Note: Setting the seccomp profile to "unconfined" disables the default system call filtering, which may pose security risks. Use this configuration with caution, especially in production environments.

### Docker on Amazon Linux
Docker can be installed using a modification of the CentOS instructions. For example:

```shell
sudo dnf update -y
# Remove old versions
sudo dnf remove -y docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-engine
# Install the dnf plugin
sudo dnf -y install dnf-plugins-core
# Add the CentOS repository
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Adjust the release server version in the path, as it will not match Amazon Linux 2023
sudo sed -i 's/$releasever/9/g' /etc/yum.repos.d/docker-ce.repo
# Install as usual
sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

### Confirm Docker Installation

Once installed, make sure that both Docker and the Docker Compose plugin are available:

```shell
docker version
docker compose version
```

Also make sure Docker is running and able to run a sample Docker container:

```shell
docker run hello-world
```

## Java

If you plan to build Polaris from source yourself, or to follow this tutorial's instructions on a cloud provider, you will need to satisfy a few prerequisites first.

Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv:

```shell
cd ~/polaris
brew install openjdk@21 jenv
jenv add $(brew --prefix openjdk@21)
jenv local 21
```

Ensure that both `java --version` and `javac` respond successfully.

## jq

Most Polaris Quickstart scripts require `jq`.
Follow the instructions on the [jq](https://jqlang.org/download/) website to install it.
\ No newline at end of file
diff --git a/1.0.1/getting-started/quickstart.md b/1.0.1/getting-started/quickstart.md
deleted file mode 100644
index a9fd43f906..0000000000
--- a/1.0.1/getting-started/quickstart.md
+++ /dev/null
@@ -1,116 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Quickstart
type: docs
weight: 200
---

Polaris can be deployed via a Docker image or as a standalone process. Before starting, be sure that you've satisfied the relevant prerequisites detailed in the previous page.

## Common Setup
Before running Polaris, ensure you have completed the following setup steps:

1. **Build Polaris**
```shell
cd ~/polaris
./gradlew \
  :polaris-server:assemble \
  :polaris-server:quarkusAppPartsBuild \
  :polaris-admin:assemble --rerun \
  -Dquarkus.container-image.tag=postgres-latest \
  -Dquarkus.container-image.build=true
```
- **For standalone**: Omit the `-Dquarkus.container-image.tag` and `-Dquarkus.container-image.build` options if you do not need to build a Docker image.

## Running Polaris with Docker

To start using Polaris in Docker, launch Polaris together with the Postgres instance, Apache Spark, and Trino that it is packaged with:

```shell
export ASSETS_PATH=$(pwd)/getting-started/assets/
export QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://postgres:5432/POLARIS
export QUARKUS_DATASOURCE_USERNAME=postgres
export QUARKUS_DATASOURCE_PASSWORD=postgres
export CLIENT_ID=root
export CLIENT_SECRET=s3cr3t
docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml \
  -f getting-started/jdbc/docker-compose-bootstrap-db.yml \
  -f getting-started/jdbc/docker-compose.yml up -d
```

You should see output for some time as Polaris, Spark, and Trino build and start up. Eventually the logs will quiet down to a few Spark-related lines resembling the following:

```
spark-sql-1  | Spark Web UI available at http://8bc4de8ed854:4040
spark-sql-1  | Spark master: local[*], Application Id: local-1743745174604
spark-sql-1  | 25/04/04 05:39:38 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session.
spark-sql-1  | 25/04/04 05:39:39 WARN RESTSessionCatalog: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://polaris:8181/api/catalogv1/oauth/tokens. This automatic fallback will be removed in a future Iceberg release. It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537
```

The Docker image pre-configures a sample catalog called `quickstart_catalog` that uses a local file system.
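Before moving on, you can optionally verify that the containers are up and that Polaris is listening. A quick check, assuming the compose project name `polaris` used above and the default port mapping:

```shell
# List the services in the compose project
docker compose -p polaris ps

# Any HTTP status code (even 401 without credentials) means the server is up
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8181/api/catalog/v1/config
```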
## Running Polaris as a Standalone Process

You can also start Polaris through Gradle (packaged within the Polaris repository):

1. **Start the Server**

Run the following command to start Polaris:

```shell
./gradlew run
```

You should see output for some time as Polaris builds and starts up. Eventually the logs will settle, and you should see messages that resemble the following:

```
INFO  [io.quarkus] [,] [,,,] (Quarkus Main Thread) polaris-runtime-service on JVM (powered by Quarkus ) started in 2.656s. Listening on: http://localhost:8181. Management interface listening on http://0.0.0.0:8182.
INFO  [io.quarkus] [,] [,,,] (Quarkus Main Thread) Profile prod activated. Live Coding activated.
INFO  [io.quarkus] [,] [,,,] (Quarkus Main Thread) Installed features: [...]
```

At this point, Polaris is running.

The Gradle-launched Polaris instance in this tutorial stores entities only in memory. This means that any entities you define will be destroyed when Polaris is shut down. For more information on how to configure Polaris for production usage, see the [docs]({{% relref "../configuring-polaris-for-production" %}}).

When Polaris is run using the `./gradlew run` command, the root principal credentials are `root` and `secret` for the `CLIENT_ID` and `CLIENT_SECRET`, respectively.

### Installing Apache Spark and Trino Locally for Testing

#### Apache Spark

If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. As in the [prerequisites]({{% ref "install-dependencies#git" %}}), make sure [git](https://git-scm.com/) is installed first.

Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html).

```shell
git clone --branch branch-3.5 https://github.com/apache/spark.git ~/spark
```

#### Trino
If you want to connect to Polaris with [Trino](https://trino.io/), it is recommended to set up a test instance of Trino using Docker. As in the [prerequisites]({{% ref "install-dependencies#docker" %}}), make sure [Docker](https://www.docker.com/) is installed first.

```shell
docker run --name trino -d -p 8080:8080 trinodb/trino
```

## Next Steps
Congrats, you now have a running instance of Polaris! For further information on how to use Polaris, check out the [Using Polaris]({{% ref "using-polaris" %}}) page.
\ No newline at end of file
diff --git a/1.0.1/getting-started/using-polaris.md b/1.0.1/getting-started/using-polaris.md
deleted file mode 100644
index 35f0bae336..0000000000
--- a/1.0.1/getting-started/using-polaris.md
+++ /dev/null
@@ -1,315 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Using Polaris -type: docs -weight: 400 ---- - -## Setup - -Ensure your `CLIENT_ID` & `CLIENT_SECRET` variables are already defined, as they were required for starting the Polaris server earlier. - -```shell -export CLIENT_ID=YOUR_CLIENT_ID -export CLIENT_SECRET=YOUR_CLIENT_SECRET -``` - -## Defining a Catalog - -In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under. With a Polaris service running, you can create a catalog like so: - -```shell -cd ~/polaris - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalogs \ - create \ - --storage-type s3 \ - --default-base-location ${DEFAULT_BASE_LOCATION} \ - --role-arn ${ROLE_ARN} \ - quickstart_catalog -``` - -This will create a new catalog called **quickstart_catalog**. If you are using one of the Getting Started locally-built Docker images, we have already created a catalog named `quickstart_catalog` for you. - -The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources. - -If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs]({{% relref "../entities#storage-type" %}}). - -Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% relref "../command-line-interface" %}}). - - -### Creating a Principal and Assigning it Privileges - -With a catalog created, we can create a [principal]({{% relref "../entities#principal" %}}) that has access to manage that catalog. For details on how to configure the Polaris CLI, see [the section above](#defining-a-catalog) or refer to the [docs]({{% relref "../command-line-interface" %}}). - -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principals \ - create \ - quickstart_user - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principal-roles \ - create \ - quickstart_user_role - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalog-roles \ - create \ - --catalog quickstart_catalog \ - quickstart_catalog_role -``` - -Be sure to provide the necessary credentials, hostname, and port as before. - -When the `principals create` command completes successfully, it will return the credentials for this new principal. Export them for future use. For example: - -```shell -./polaris ... 
principals create example
{"clientId": "XXXX", "clientSecret": "YYYY"}
export USER_CLIENT_ID=XXXX
export USER_CLIENT_SECRET=YYYY
```

Now, we grant the principal the [principal role]({{% relref "../entities#principal-role" %}}) we created, and grant the [catalog role]({{% relref "../entities#catalog-role" %}}) we created to that principal role. For more information on these entities, please refer to the linked documentation.

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principal-roles \
  grant \
  --principal quickstart_user \
  quickstart_user_role

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  catalog-roles \
  grant \
  --catalog quickstart_catalog \
  --principal-role quickstart_user_role \
  quickstart_catalog_role
```

Now, we’ve linked our principal to the catalog via roles like so:

![Principal to Catalog](/img/quickstart/privilege-illustration-1.png "Principal to Catalog")

In order to give this principal the ability to interact with the catalog, we must assign some [privileges]({{% relref "../entities#privilege" %}}). For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so:

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  privileges \
  catalog \
  grant \
  --catalog quickstart_catalog \
  --catalog-role quickstart_catalog_role \
  CATALOG_MANAGE_CONTENT
```

This grants the [catalog privilege]({{% relref "../entities#privilege" %}}) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so:

![Principal to Catalog with Catalog Role](/img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role")

`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace.

## Using Iceberg & Polaris

At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the respective examples below.

### Connecting with Spark

#### Using a Local Build of Spark

To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API.

This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark).
From a local Spark clone on the `branch-3.5` branch, we can run the following:

_Note: the credentials provided here are those for our principal, not the root credentials._

```shell
bin/spark-sql \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
--conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
--conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.quickstart_catalog.credential='${USER_CLIENT_ID}:${USER_CLIENT_SECRET}' \
--conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true \
--conf spark.sql.catalog.quickstart_catalog.client.region=us-west-2
```

Similar to the CLI commands above, this configures Spark to use the Polaris instance running at `localhost:8181`. If your Polaris server is running elsewhere, be sure to update the configuration appropriately.

Finally, note that we include the `iceberg-aws-bundle` package here. If your table is using a different filesystem, be sure to include the appropriate dependency.

#### Using Spark SQL from a Docker container

Refresh the Docker container with the user's credentials:
```shell
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop spark-sql
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f spark-sql
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps spark-sql
```

Attach to the running spark-sql container:

```shell
docker attach $(docker ps -q --filter name=spark-sql)
```

#### Sample Commands

Once the Spark session starts, we can create a namespace and table within the catalog:

```sql
USE quickstart_catalog;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema;
USE NAMESPACE quickstart_namespace.schema;
CREATE TABLE IF NOT EXISTS quickstart_table (id BIGINT, data STRING) USING ICEBERG;
```

We can now use this table like any other:

```
INSERT INTO quickstart_table VALUES (1, 'some data');
SELECT * FROM quickstart_table;
. . .
+---+---------+
|id |data     |
+---+---------+
|1  |some data|
+---+---------+
```

If at any time access is revoked...
```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  privileges \
  catalog \
  revoke \
  --catalog quickstart_catalog \
  --catalog-role quickstart_catalog_role \
  CATALOG_MANAGE_CONTENT
```

Spark will lose access to the table:

```
INSERT INTO quickstart_table VALUES (1, 'some data');

org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
```

### Connecting with Trino

Refresh the Docker container with the user's credentials:

```shell
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop trino
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f trino
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps trino
```

Attach to the running Trino container:

```shell
docker exec -it $(docker ps -q --filter name=trino) trino
```

You may not see Trino's prompt immediately; press ENTER to see it. A few commands that you can try:

```sql
SHOW CATALOGS;
SHOW SCHEMAS FROM iceberg;
CREATE SCHEMA iceberg.quickstart_schema;
CREATE TABLE iceberg.quickstart_schema.quickstart_table AS SELECT 1 x;
SELECT * FROM iceberg.quickstart_schema.quickstart_table;
```

If at any time access is revoked...

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  privileges \
  catalog \
  revoke \
  --catalog quickstart_catalog \
  --catalog-role quickstart_catalog_role \
  CATALOG_MANAGE_CONTENT
```

Trino will lose access to the table:

```sql
SELECT * FROM iceberg.quickstart_schema.quickstart_table;

org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
```

### Connecting Using REST APIs

To access Polaris from the host machine, first request an access token:

```shell
export POLARIS_TOKEN=$(curl -s http://polaris:8181/api/catalog/v1/oauth/tokens \
   --resolve polaris:8181:127.0.0.1 \
   --user ${CLIENT_ID}:${CLIENT_SECRET} \
   -d 'grant_type=client_credentials' \
   -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
```

Then, use the access token in the Authorization header when accessing Polaris:

```shell
curl -v http://127.0.0.1:8181/api/management/v1/principal-roles -H "Authorization: Bearer $POLARIS_TOKEN"
curl -v http://127.0.0.1:8181/api/management/v1/catalogs/quickstart_catalog -H "Authorization: Bearer $POLARIS_TOKEN"
```

## Next Steps
* Visit [Configuring Polaris for Production]({{% relref "../configuring-polaris-for-production" %}}).
* A Getting Started experience for using Spark with Jupyter Notebooks is documented [here](https://github.com/apache/polaris/blob/main/getting-started/spark/README.md).
* To shut down a locally deployed Polaris server and clean up all related Docker containers, run the command listed below. Cloud deployments have their respective termination commands on their deployment pages, while Polaris running on Gradle will terminate when the Gradle process terminates.
```shell
docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml -f getting-started/jdbc/docker-compose-bootstrap-db.yml -f getting-started/jdbc/docker-compose.yml down
```

diff --git a/1.0.1/metastores.md b/1.0.1/metastores.md
deleted file mode 100644
index 4810b124a0..0000000000
--- a/1.0.1/metastores.md
+++ /dev/null
@@ -1,151 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Metastores
type: docs
weight: 700
---

This page explains how to configure and use Polaris metastores with either the recommended Relational JDBC persistence backend or the deprecated EclipseLink backend.

## Relational JDBC
This implementation leverages Quarkus for datasource management and supports configuration through environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file).

```
POLARIS_PERSISTENCE_TYPE=relational-jdbc

QUARKUS_DATASOURCE_DB_KIND=postgresql
QUARKUS_DATASOURCE_USERNAME=<your-username>
QUARKUS_DATASOURCE_PASSWORD=<your-password>
QUARKUS_DATASOURCE_JDBC_URL=<jdbc-url-of-postgres>
```

The Relational JDBC metastore currently relies on a Quarkus-managed datasource and supports only PostgreSQL and H2 databases. This limitation is similar to that of EclipseLink, primarily due to underlying schema differences. At this time, official documentation is provided exclusively for usage with PostgreSQL. Please refer to the documentation here: [Configure data sources in Quarkus](https://quarkus.io/guides/datasource)

Additionally, retries can be configured via the `polaris.persistence.relational.jdbc.*` properties; please refer to the [configuration](./configuration.md) page.

## EclipseLink (Deprecated)
> [!IMPORTANT] EclipseLink is deprecated; it is recommended to use Relational JDBC for persistence instead.

Polaris includes the EclipseLink plugin by default, with the PostgreSQL driver.

Configure the `polaris.persistence` section in your Polaris configuration file (`application.properties`) as follows:

```
polaris.persistence.type=eclipse-link
polaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml
polaris.persistence.eclipselink.persistence-unit=polaris
```

Alternatively, configuration can also be done with environment variables or system properties. Refer to the [Quarkus Configuration Reference] for more information.

The `configuration-file` option must point to an [EclipseLink configuration file]. This file, named `persistence.xml`, is used to set up the database connection properties, which can differ depending on the type of database and its configuration.

> Note: The `persistence.xml` must be located at least two directories below the root directory; e.g. `/deployments/config/persistence.xml` is OK, whereas `/deployments/persistence.xml` will cause an infinite loop.

[Quarkus Configuration Reference]: https://quarkus.io/guides/config-reference
[EclipseLink configuration file]: https://eclipse.dev/eclipselink/documentation/4.0/solutions/solutions.html#TESTINGJPA002

Polaris creates and connects to a separate database for each realm. Specifically, the `{realm}` placeholder in `jakarta.persistence.jdbc.url` is substituted with the actual realm name, allowing the Polaris server to connect to different databases based on the realm.

> Note: some database systems such as Postgres don't create databases automatically. Database admins need to create them manually before running the Polaris server.
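For illustration, with a realm named `POLARIS` and a JDBC URL template like the Postgres example later on this page, the substitution is literal (the realm name here is illustrative):

```
# Template in persistence.xml:
jdbc:postgresql://localhost:5432/{realm}

# Effective URL for realm "POLARIS":
jdbc:postgresql://localhost:5432/POLARIS
```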
A single `persistence.xml` can describe multiple [persistence units](https://eclipse.dev/eclipselink/documentation/4.0/concepts/concepts.html#APPDEV001). For example, with both a `polaris-dev` and a `polaris` persistence unit defined, you could use a single `persistence.xml` to easily switch between development and production databases. Use the `persistence-unit` option in the Polaris server configuration to switch between persistence units.

### Using H2

> [!IMPORTANT] H2 is an in-memory database and is not suitable for production!

The default [persistence.xml] in Polaris is already configured for H2, but you can easily customize your H2 configuration using the persistence unit template below:

[persistence.xml]: https://github.com/apache/polaris/blob/main/extension/persistence/eclipselink/src/main/resources/META-INF/persistence.xml

```xml
<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
  <shared-cache-mode>NONE</shared-cache-mode>
  <properties>
    <property name="jakarta.persistence.jdbc.url"
              value="jdbc:h2:file:tmp/polaris_test/filedb_{realm}"/>
    <property name="jakarta.persistence.jdbc.user" value="sa"/>
    <property name="jakarta.persistence.jdbc.password" value=""/>
    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
  </properties>
</persistence-unit>
```

To build Polaris with the necessary H2 dependency and start the Polaris service, run the following:

```shell
./gradlew \
  :polaris-server:assemble \
  :polaris-server:quarkusAppPartsBuild --rerun \
  -PeclipseLinkDeps=com.h2database:h2:2.3.232
java -Dpolaris.persistence.type=eclipse-link \
  -Dpolaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml \
  -Dpolaris.persistence.eclipselink.persistence-unit=polaris \
  -jar runtime/server/build/quarkus-app/quarkus-run.jar
```

### Using Postgres

PostgreSQL is included by default in the Polaris server distribution.

The following shows a sample configuration for integrating Polaris with Postgres.
```xml
<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
  <shared-cache-mode>NONE</shared-cache-mode>
  <properties>
    <property name="jakarta.persistence.jdbc.url"
              value="jdbc:postgresql://localhost:5432/{realm}"/>
    <property name="jakarta.persistence.jdbc.user" value="postgres"/>
    <property name="jakarta.persistence.jdbc.password" value="postgres"/>
    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
  </properties>
</persistence-unit>
```

diff --git a/1.0.1/polaris-catalog-service.md b/1.0.1/polaris-catalog-service.md
deleted file mode 100644
index 02fed63f46..0000000000
--- a/1.0.1/polaris-catalog-service.md
+++ /dev/null
@@ -1,26 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
linkTitle: 'Catalog API Spec'
weight: 900
params:
  show_page_toc: false
---

{{< redoc-polaris "generated/bundled-polaris-catalog-service.yaml" >}}
diff --git a/1.0.1/polaris-management-service.md b/1.0.1/polaris-management-service.md
deleted file mode 100644
index 0b66b9daa4..0000000000
--- a/1.0.1/polaris-management-service.md
+++ /dev/null
@@ -1,27 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: 'Apache Polaris Management Service OpenAPI'
linkTitle: 'Management OpenAPI'
weight: 800
params:
  show_page_toc: false
---

{{< redoc-polaris "polaris-management-service.yml" >}}
diff --git a/1.0.1/polaris-spark-client.md b/1.0.1/polaris-spark-client.md
deleted file mode 100644
index 1aa519de26..0000000000
--- a/1.0.1/polaris-spark-client.md
+++ /dev/null
@@ -1,130 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Polaris Spark Client
type: docs
weight: 650
---

Apache Polaris now provides catalog support for Generic Tables (non-Iceberg tables); please check out the [Catalog API Spec]({{% ref "polaris-catalog-service" %}}) for the Generic Table API specs.

Along with the Generic Table catalog support, Polaris is also releasing a Spark client, which helps provide an end-to-end solution for Apache Spark to manage Delta tables using Polaris.

Note that the Polaris Spark client is able to handle both Iceberg and Delta tables, not just Delta.

This page documents how to connect Spark with a Polaris service using the Polaris Spark client.

## Quick Start with Local Polaris Service
If you want to quickly try out the functionality with a local Polaris service, simply check out the Polaris repo and follow the instructions in the Spark plugin getting-started [README](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/getting-started/README.md).

Check out the Polaris repo:
```shell
cd ~
git clone https://github.com/apache/polaris.git
```

## Start Spark against a Deployed Polaris Service
Before starting, ensure that the deployed Polaris service supports Generic Tables, and that Spark 3.5 (version 3.5.3 or later) is installed. Spark 3.5.5 is recommended; you can follow the instructions below to get a Spark 3.5.5 distribution.
```shell
cd ~
wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz
mkdir spark-3.5
tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1
cd spark-3.5
```

### Connecting with Spark using the Polaris Spark client
The following CLI command can be used to start Spark with a connection to the deployed Polaris service using a released Polaris Spark client.

```shell
bin/spark-shell \
--packages <polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.<spark-catalog-name>.warehouse=<polaris-catalog-name> \
--conf spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.<spark-catalog-name>=org.apache.polaris.spark.SparkCatalog \
--conf spark.sql.catalog.<spark-catalog-name>.uri=<polaris-service-uri> \
--conf spark.sql.catalog.<spark-catalog-name>.credential='<client-id>:<client-secret>' \
--conf spark.sql.catalog.<spark-catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled=true
```
Assuming the released Polaris Spark client you want to use is `org.apache.polaris:polaris-spark-3.5_2.12:1.0.1`, replace the `<polaris-spark-client-package>` field with that release.

The `<spark-catalog-name>` is the catalog name you will use with Spark, and `<polaris-catalog-name>` is the catalog name used by the Polaris service; for simplicity, you can use the same name.

Replace the `<polaris-service-uri>` with the URI of the deployed Polaris service.
For example, with a locally deployed Polaris service, the URI would be `http://localhost:8181/api/catalog`.

For the `<client-id>` and `<client-secret>` values, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}}) for more details.

You can also start the connection by programmatically initializing a SparkSession; the following is an example with PySpark:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.jars.packages", "<polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.<spark-catalog-name>", "org.apache.polaris.spark.SparkCatalog") \
    .config("spark.sql.catalog.<spark-catalog-name>.uri", "<polaris-service-uri>") \
    .config("spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled", "true") \
    .config("spark.sql.catalog.<spark-catalog-name>.credential", "<client-id>:<client-secret>") \
    .config("spark.sql.catalog.<spark-catalog-name>.warehouse", "<polaris-catalog-name>") \
    .config("spark.sql.catalog.polaris.scope", 'PRINCIPAL_ROLE:ALL') \
    .config("spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation", 'vended-credentials') \
    .getOrCreate()
```
As with the CLI command, make sure the corresponding fields are replaced correctly.

### Create tables with Spark
After Spark is started, you can use it to create and access Iceberg and Delta tables, for example:
```python
spark.sql("USE polaris")
spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS")
spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC")
spark.sql("USE NAMESPACE DELTA_NS.PUBLIC")
spark.sql("""CREATE TABLE IF NOT EXISTS PEOPLE (
    id int, name string)
USING delta LOCATION 'file:///tmp/var/delta_tables/people';
""")
```

## Connecting with Spark using a local Polaris Spark client jar
If you would like to use a version of the Spark client that has not yet been released, you can build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin [README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions.

## Limitations
The Polaris Spark client has the following functional limitations:
1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `Dataframe` is also not supported, since it relies on CTAS support.
2) Creating a Delta table without an explicit location is not supported.
3) Renaming a Delta table is not supported.
4) `ALTER TABLE ... SET LOCATION` is not supported for Delta tables.
5) Other non-Iceberg table formats, such as CSV, are not supported.

diff --git a/1.0.1/policy.md b/1.0.1/policy.md
deleted file mode 100644
index 3f49353884..0000000000
--- a/1.0.1/policy.md
+++ /dev/null
@@ -1,197 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.
diff --git a/1.0.1/policy.md b/1.0.1/policy.md
deleted file mode 100644
index 3f49353884..0000000000
--- a/1.0.1/policy.md
+++ /dev/null
@@ -1,197 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Policy
type: docs
weight: 425
---

The Polaris Policy framework empowers organizations to centrally define, manage, and enforce fine-grained governance, lifecycle, and operational rules across all data resources in the catalog.

With the policy API, you can:
- Create and manage policies
- Attach policies to specific resources (catalogs, namespaces, tables, or views)
- Check applicable policies for any given resource

## What is a Policy?

A policy in Apache Polaris is a structured entity that defines rules governing actions on specified resources under
predefined conditions. Each policy contains:

- **Name**: A unique identifier within a namespace
- **Type**: Determines the semantics and expected format of the policy content
- **Description**: Explains the purpose of the policy
- **Content**: Contains the actual rules defining the policy behavior
- **Version**: An automatically tracked revision number
- **Inheritable**: Whether the policy can be inherited by child resources, determined by its type

### Policy Types

Polaris supports several predefined system policy types (prefixed with `system.`):

| Policy Type | Purpose | JSON-Schema | Applies To |
|-------------|-------------------------------------------------------|-------------|------------|
| **`system.data-compaction`** | Defines rules for data file compaction operations | [`data-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/data-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** |
| **`system.metadata-compaction`** | Defines rules for metadata file compaction operations | [`metadata-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/metadata-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** |
| **`system.orphan-file-removal`** | Defines rules for removing orphaned files | [`orphan-file-removal/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/orphan-file-removal/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** |
| **`system.snapshot-expiry`** | Defines rules for snapshot expiration | [`snapshot-expiry/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/snapshot-expiry/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** |

Support for additional predefined system policy types and custom policy type definitions is in progress.
For more details, please refer to the [roadmap](https://github.com/apache/polaris/discussions/1028).

### Policy Inheritance

The entity hierarchy in Polaris is structured as follows:

```
        Catalog
           |
       Namespace
           |
   +-----------+----------+
   |           |          |
Iceberg     Iceberg    Generic
 Table       View       Table
```

Policies can be attached at any level, and inheritance flows from catalog down to namespace, then to tables and views.
Policies can be inheritable or non-inheritable:

- **Inheritable policies**: Apply to the target resource and all its applicable child resources
- **Non-inheritable policies**: Apply only to the specific target resource

The inheritance follows an override mechanism:
1. Table-level policies override namespace and catalog policies
2. Namespace-level policies override parent namespace and catalog policies

> **Important:** Because an override completely replaces the same policy type at higher levels,
> **only one instance of a given policy type can be attached to (and therefore affect) a resource**.

## Working with Policies

### Creating a Policy

To create a policy, you need to provide a name, a type, and optionally a description and content:

```json
POST /polaris/v1/{prefix}/namespaces/{namespace}/policies
{
  "name": "compaction-policy",
  "type": "system.data-compaction",
  "description": "Policy for optimizing table storage",
  "content": "{\"version\": \"2025-02-03\", \"enable\": true, \"config\": {\"target_file_size_bytes\": 134217728}}"
}
```

The policy content is validated against a schema specific to its type. Here are a few policy content examples:

- Data Compaction Policy

```json
{
  "version": "2025-02-03",
  "enable": true,
  "config": {
    "target_file_size_bytes": 134217728,
    "compaction_strategy": "bin-pack",
    "max-concurrent-file-group-rewrites": 5
  }
}
```

- Orphan File Removal Policy

```json
{
  "version": "2025-02-03",
  "enable": true,
  "max_orphan_file_age_in_days": 30,
  "locations": ["s3://my-bucket/my-table-location"],
  "config": {
    "prefix_mismatch_mode": "ignore"
  }
}
```

### Attaching Policies to Resources

Policies can be attached to different resource levels:

1. **Catalog level**: Applies to the entire catalog
2. **Namespace level**: Applies to a specific namespace
3. **Table-like level**: Applies to individual tables or views

Example of attaching a policy to a table:

```json
PUT /polaris/v1/{prefix}/namespaces/{namespace}/policies/{policy-name}/mappings
{
  "target": {
    "type": "table-like",
    "path": ["NS1", "NS2", "test_table_1"]
  }
}
```

For inheritable policies, only one policy of a given type can be attached to a resource. For non-inheritable policies,
multiple policies of the same type can be attached.

### Retrieving Applicable Policies

A user can view the applicable policies on a resource (e.g., table, namespace, or catalog) as long as they have
read permission on that resource.

Here is an example that finds all policies applying to a specific resource (including inherited policies):

```
GET /polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions
```

**Sample response:**

```json
{
  "policies": [
    {
      "name": "snapshot-expiry-policy",
      "type": "system.snapshot-expiry",
      "appliedAt": "namespace",
      "content": {
        "version": "2025-02-03",
        "enable": true,
        "config": {
          "min_snapshot_to_keep": 1,
          "max_snapshot_age_days": 2,
          "max_ref_age_days": 3
        }
      }
    },
    {
      "name": "compaction-policy",
      "type": "system.data-compaction",
      "appliedAt": "catalog",
      "content": {
        "version": "2025-02-03",
        "enable": true,
        "config": {
          "target_file_size_bytes": 134217728
        }
      }
    }
  ]
}
```
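As a concrete invocation, the applicable-policies request above can be issued with `curl`. This is a minimal sketch: the `localhost` base URL, the `catalog` prefix, and the `$TOKEN` bearer token are assumptions to adapt to your deployment:

```shell
# Fetch all policies applicable to the "transactions" table in the
# multi-level namespace finance.quarterly (%1F separates namespace levels).
curl -s -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8181/api/catalog/polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions"
```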
### API Reference

For the complete and up-to-date API specification, see the [policy-api.yaml](https://github.com/apache/polaris/blob/main/spec/polaris-catalog-apis/policy-apis.yaml).
\ No newline at end of file
diff --git a/1.0.1/realm.md b/1.0.1/realm.md
deleted file mode 100644
index 9da5e7e25b..0000000000
--- a/1.0.1/realm.md
+++ /dev/null
@@ -1,53 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Realm
type: docs
weight: 350
---

This page explains what a realm is and what it is used for in Polaris.

### What is it?

A realm in Polaris serves as a logical partitioning mechanism within the catalog system. This isolation allows for multitenancy, enabling different teams, environments, or organizations to operate independently within the same Polaris deployment.

### Key Characteristics

**Isolation:** Each realm encapsulates its own set of resources, ensuring that operations and policies in one realm do not affect others.

**Authentication Context:** When configuring Polaris, principal credentials are associated with a specific realm. This allows for the separation of security concerns across different realms.

**Configuration Scope:** Realm identifiers are used in various configurations, such as connection strings and feature configurations. An example of this is:

`jdbc:postgresql://localhost:5432/{realm}`

This ensures that each realm's data is stored separately.

### How is it used in the system?

**RealmContext:** A key concept used to identify and resolve the context in which operations are performed. For example, in `DefaultRealmContextResolver`, a realm is resolved from request headers, and operations are performed based on the resolved realm identifier.

**Authentication and Authorization:** For example, in `BasePolarisAuthenticator`, `RealmContext` is used to provide context about the current security domain, which is used to retrieve the correct `PolarisMetastoreManager` that manages all Polaris entities and associated grant records metadata for
authorization.

**Isolation:** In methods like `createEntityManagerFactory(@Nonnull RealmContext realmContext)` from the `PolarisEclipseLinkPersistenceUnit` interface, the realm context influences how resources are created or managed based on the security policies of that realm.
For example, a realm name can be used to build a database connection URL so that you have one database instance per realm, when applicable; alternatively, isolation can be more granular and applied at the primary-key level within the same database instance.
\ No newline at end of file
diff --git a/1.0.1/telemetry.md b/1.0.1/telemetry.md
deleted file mode 100644
index 8df97f505d..0000000000
--- a/1.0.1/telemetry.md
+++ /dev/null
@@ -1,192 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.
See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Telemetry
type: docs
weight: 450
---

## Metrics

Metrics are published using [Micrometer]; they are available from Polaris's management interface
(port 8282 by default) under the path `/q/metrics`. For example, if the server is running on
localhost, the metrics can be accessed via http://localhost:8282/q/metrics.

[Micrometer]: https://quarkus.io/guides/telemetry-micrometer

Metrics can be scraped by Prometheus or any compatible metrics-scraping server; see
[Prometheus](https://prometheus.io) for more information.

Additional tags can be added to the metrics by setting the `polaris.metrics.tags.*` property. Each
tag is a key-value pair, where the key is the tag name and the value is the tag value. For example,
to add a tag `environment=prod` to all metrics, set `polaris.metrics.tags.environment=prod`. Multiple
tags can be added, as shown below:

```properties
polaris.metrics.tags.service=polaris
polaris.metrics.tags.environment=prod
polaris.metrics.tags.region=us-west-2
```

Note that by default Polaris adds one tag: `application=Polaris`. You can override this tag by
setting the `polaris.metrics.tags.application=<app-name>` property.

### Realm ID Tag

Polaris can add the realm ID as a tag to all API and HTTP request metrics. This is disabled by
default to prevent high-cardinality issues, but can be enabled by setting the following properties:

```properties
polaris.metrics.realm-id-tag.enable-in-api-metrics=true
polaris.metrics.realm-id-tag.enable-in-http-metrics=true
```

You should be particularly careful when enabling the realm ID tag in HTTP request metrics, as these
metrics typically have a much higher cardinality than API request metrics.

In order to prevent the number of tags from growing indefinitely and causing performance issues or
crashing the server, the number of unique realm IDs in HTTP request metrics is limited to 100 by
default. If the number of unique realm IDs exceeds this value, a warning will be logged and no more
HTTP request metrics will be recorded. This threshold can be changed by setting the
`polaris.metrics.realm-id-tag.http-metrics-max-cardinality` property.

## Traces

Traces are published using [OpenTelemetry].

[OpenTelemetry]: https://quarkus.io/guides/opentelemetry-tracing

By default, OpenTelemetry is disabled in Polaris because there is no reasonable default
for the collector endpoint in all cases.

To enable OpenTelemetry and publish traces for Polaris, set `quarkus.otel.sdk.disabled=false`
and configure a valid collector endpoint URL (with `http://` or `https://`) via the server property
`quarkus.otel.exporter.otlp.traces.endpoint`.
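For example, assuming a collector reachable at `otlp-collector` (the hostname is illustrative):

```properties
quarkus.otel.sdk.disabled=false
quarkus.otel.exporter.otlp.traces.endpoint=http://otlp-collector:4317
```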
_If these properties are not set, the server will not publish traces._

The collector must speak the OpenTelemetry protocol (OTLP), and the port must be its gRPC port
(4317 by default), e.g. `http://otlp-collector:4317`.

By default, Polaris adds a few attributes to the [OpenTelemetry Resource] to identify the server,
notably:

- `service.name`: set to `Apache Polaris Server (incubating)`;
- `service.version`: set to the Polaris version.

[OpenTelemetry Resource]: https://opentelemetry.io/docs/languages/js/resources/

You can override the default resource attributes or add additional ones by setting the
`quarkus.otel.resource.attributes` property.

This property expects a comma-separated list of key-value pairs, where the key is the attribute name
and the value is the attribute value. For example, to change the service name to `Polaris` and add
an attribute `deployment.environment=dev`, set the following property:

```properties
quarkus.otel.resource.attributes=service.name=Polaris,deployment.environment=dev
```

The alternative syntax below can also be used:

```properties
quarkus.otel.resource.attributes[0]=service.name=Polaris
quarkus.otel.resource.attributes[1]=deployment.environment=dev
```

Finally, two additional span attributes are added to all request parent spans:

- `polaris.request.id`: The unique identifier of the request, if set by the caller through the
  `Polaris-Request-Id` header.
- `polaris.realm`: The unique identifier of the realm. Always set (unless the request failed because
  of a realm resolution error).

### Troubleshooting Traces

If the server is unable to publish traces, first check the logs for an error message like the following:

```
SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans.
The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317
```

This means that the server is unable to connect to the collector. Check that the collector is
running and that the URL is correct.

## Logging

Polaris relies on [Quarkus](https://quarkus.io/guides/logging) for logging.

By default, logs are written to the console and to a file located in the `./logs` directory. The log
file is rotated daily and compressed. The maximum size of the log file is 10MB, and the maximum
number of backup files is 14.

JSON logging can be enabled by setting the `quarkus.log.console.json` and `quarkus.log.file.json`
properties to `true`. By default, JSON logging is disabled.
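For example, to enable JSON logging for both console and file output:

```properties
quarkus.log.console.json=true
quarkus.log.file.json=true
```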
The log level can be set for the entire application or for specific packages. The default log level
is `INFO`. To set the log level for the entire application, use the `quarkus.log.level` property.

To set the log level for a specific package, use the `quarkus.log.category."package-name".level` property,
where `package-name` is the name of the package. For example, the package `io.smallrye.config` has a
useful logger that helps debug configuration issues, but it needs to be set to the `DEBUG` level.
This can be done by setting the following property:

```properties
quarkus.log.category."io.smallrye.config".level=DEBUG
```

The log message format for both console and file output is highly configurable. The default format
is:

```
%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n
```

Refer to the [Logging format](https://quarkus.io/guides/logging#logging-format) guide for more
information on placeholders and how to customize the log message format.

### MDC Logging

Polaris uses Mapped Diagnostic Context (MDC) to enrich log messages with additional context. The
following MDC keys are available:

- `requestId`: The unique identifier of the request, if set by the caller through the
  `Polaris-Request-Id` header.
- `realmId`: The unique identifier of the realm. Always set.
- `traceId`: The unique identifier of the trace. Present if tracing is enabled and the message is
  originating from a traced context.
- `parentId`: The unique identifier of the parent span. Present if tracing is enabled and the
  message is originating from a traced context.
- `spanId`: The unique identifier of the span. Present if tracing is enabled and the message is
  originating from a traced context.
- `sampled`: Whether the trace has been sampled. Present if tracing is enabled and the message is
  originating from a traced context.

Other MDC keys can be added by setting the `polaris.log.mdc.*` property. Each property is a
key-value pair, where the key is the MDC key name and the value is the MDC key value. For example,
to add the MDC keys `environment=prod` and `region=us-west-2` to all log messages, set the following
properties:

```properties
polaris.log.mdc.environment=prod
polaris.log.mdc.region=us-west-2
```

MDC context is propagated across threads, including in `TaskExecutor` threads.
\ No newline at end of file
diff --git a/1.1.0/_index.md b/1.1.0/_index.md
deleted file mode 100644
index 02cad78409..0000000000
--- a/1.1.0/_index.md
+++ /dev/null
@@ -1,179 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
linkTitle: '1.1.0'
title: 'Overview'
type: docs
weight: 200
params:
  top_hidden: true
  show_page_toc: false
cascade:
  type: docs
  params:
    show_page_toc: true
# This file will NOT be copied into a new release's versioned docs folder.
---

Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol.

With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines.

![Conceptual diagram of Apache Polaris (Incubating).](/img/overview.svg "Apache Polaris (Incubating) overview")

## Key concepts

This section introduces key concepts associated with using Apache Polaris (Incubating).
In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables
or namespaces have been created yet for Catalog2 or Catalog3.

![Diagram that shows an example Apache Polaris (Incubating) structure.](/img/sample-catalog-structure.svg "Sample Apache Polaris (Incubating) structure")

### Catalog

In Polaris, you can create one or more catalog resources to organize Iceberg tables.

Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a
query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks:

- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's
  current metadata file.

- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of
  the table.

To learn more about Iceberg catalogs, see the [Apache Iceberg™ documentation](https://iceberg.apache.org/concepts/catalog/).

#### Catalog types

A catalog can be one of the following two types:

- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris.

- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from
  this catalog are synced to Polaris. These tables are read-only in Polaris.

A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS.

### Namespace

You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create
nested namespaces. Iceberg tables belong to namespaces.

> [!Important]
> For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met:
>
> - The directory only contains the data files that belong to a single table.
> - The directory hierarchy matches the namespace hierarchy for the catalog.
>
> For example, if a catalog includes the following items:
>
> - Top-level namespace namespace1
> - Nested namespace namespace1a
> - A customers table, which is grouped under nested namespace namespace1a
> - An orders table, which is grouped under nested namespace namespace1a
>
> The directory hierarchy for the catalog must follow this structure:
>
> - /namespace1/namespace1a/customers/
> - /namespace1/namespace1a/orders/

### Storage configuration

A storage configuration stores a generated identity and access management (IAM) entity for your cloud storage and is created
when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the
catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris
Catalog.

When you create a catalog, you supply the following information about your cloud storage:

| Cloud storage provider | Information |
| -----------------------| ----------- |
| Amazon S3 | • Default base location for your Amazon S3 bucket<br/>• Locations for your Amazon S3 bucket<br/>• S3 role ARN<br/>• External ID (optional) |
| Google Cloud Storage (GCS) | • Default base location for your GCS bucket<br/>• Locations for your GCS bucket |
| Azure | • Default base location for your Microsoft Azure container<br/>• Locations for your Microsoft Azure container<br/>• Azure tenant ID |

## Example workflow

In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1.

1. Bob uses Apache Spark™ to create the Table1 table under the
   Namespace1 namespace in the Catalog1 catalog and insert values into
   Table1.

   Bob can create Table1 and insert data into it because he is using a
   service connection with a service principal that has
   the privileges to perform these actions.

2. Alice uses Snowflake to read data from Table1.

   Alice can read data from Table1 because she is using a service
   connection with a service principal with a catalog integration that
   has the privileges to perform this action. Alice
   creates an unmanaged table in Snowflake to read data from Table1.

![Diagram that shows an example workflow for Apache Polaris (Incubating)](/img/example-workflow.svg "Example workflow for Apache Polaris (Incubating)")

## Security and access control

### Credential vending

To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query
execution. These credentials allow the query engine to run the query without requiring access to your cloud storage for
Iceberg tables. This process is called credential vending.

As of now, the following limitation is known regarding Apache Iceberg support:

- **remove_orphan_files:** Apache Spark can't use credential vending
  for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details.

### Identity and access management (IAM)

Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg
metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your
storage location.

### Access control

Polaris enforces the access control that you configure across all tables registered with the service and governs security for all
queries from query engines in a consistent manner.

Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs,
namespaces, and tables.

Polaris RBAC uses two different role types to delegate privileges:

- **Principal roles:** Granted to Polaris service principals and
  analogous to roles in other access control systems that you grant to
  service principals.

- **Catalog roles:** Configured with certain privileges on Polaris
  catalog resources and granted to principal roles.

For more information, see [Access control]({{% ref "access-control" %}}).

## Legal Notices

Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

diff --git a/1.1.0/access-control.md b/1.1.0/access-control.md
deleted file mode 100644
index 727b4e60f3..0000000000
--- a/1.1.0/access-control.md
+++ /dev/null
@@ -1,200 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.
You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Access Control
type: docs
weight: 500
---

This section provides information about how access control works for Apache Polaris (Incubating).

Polaris uses a role-based access control (RBAC) model in which the Polaris administrator assigns access privileges to catalog roles
and then grants access to resources to principals by assigning catalog roles to principal roles.

These are the key concepts for understanding access control in Polaris:

- **Securable object**
- **Principal role**
- **Catalog role**
- **Privilege**

## Securable object

A securable object is an object to which access can be granted. Polaris
has the following securable objects:

- Catalog
- Namespace
- Iceberg table
- View
- Policy

## Principal role

A principal role is a resource in Polaris that you can use to logically group Polaris principals together and grant privileges on
securable objects.

Polaris supports a many-to-many relationship between principals and principal roles. For example, to grant the same privileges to
multiple principals, you can assign a single principal role to those principals. Likewise, a principal can be granted
multiple principal roles.

You don't grant privileges directly to a principal role. Instead, you configure object permissions at the catalog role level, and then grant
catalog roles to a principal role.

The following table shows examples of principal roles that you might configure in Polaris:

| Principal role name | Description |
| -----------------------| ----------- |
| Data_engineer | A role that is granted to multiple principals for running data engineering jobs. |
| Data_scientist | A role that is granted to multiple principals for running data science or AI jobs. |

## Catalog role

A catalog role belongs to a particular catalog resource in Polaris and specifies a set of permissions for actions on the catalog or objects
in the catalog, such as catalog namespaces or tables. You can create one or more catalog roles for a catalog.

You grant privileges to a catalog role and then grant the catalog role to a principal role to bestow the privileges to one or more principals.

Polaris also supports a many-to-many relationship between catalog roles and principal roles. You can grant the same catalog role to one or more
principal roles. Likewise, a principal role can be granted to one or more catalog roles.

The following table displays examples of catalog roles that you might
configure in Polaris:

| Example Catalog role | Description |
| -----------------------| ----------- |
| Catalog administrators | A role that has been granted multiple privileges to emulate full access to the catalog.<br/>Principal roles that have been granted this role are permitted to create, alter, read, write, and drop tables in the catalog. |
| Catalog readers | A role that has been granted read-only privileges to tables in the catalog.<br/>Principal roles that have been granted this role are allowed to read from tables in the catalog. |
| Catalog contributor | A role that has been granted read and write access privileges to all tables that belong to the catalog.<br/>Principal roles that have been granted this role are allowed to perform read and write operations on tables in the catalog. |

## RBAC model

The following diagram illustrates the RBAC model used by Polaris. For each catalog, the Polaris administrator assigns access
privileges to catalog roles and then grants principals access to resources by assigning catalog roles to principal roles. Polaris
supports a many-to-many relationship between principals and principal roles.

![Diagram that shows the RBAC model for Apache Polaris.](/img/rbac-model.svg "Apache Polaris RBAC model")

## Access control privileges

This section describes the privileges that are available in the Polaris access control model. Privileges are granted to catalog roles, catalog
roles are granted to principal roles, and principal roles are granted to principals to specify the operations that principals can
perform on objects in Polaris.

To grant the full set of privileges (drop, list, read, write, etc.) on an object, you can use the *full privilege* option.

### Table privileges

| Privilege | Description |
| --------- | ----------- |
| TABLE_CREATE | Enables registering a table with the catalog. |
| TABLE_DROP | Enables dropping a table from the catalog. |
| TABLE_LIST | Enables listing any table in the catalog. |
| TABLE_READ_PROPERTIES | Enables reading properties of the table. |
| TABLE_WRITE_PROPERTIES | Enables configuring properties for the table. |
| TABLE_READ_DATA | Enables reading data from the table by receiving short-lived read-only storage credentials from the catalog. |
| TABLE_WRITE_DATA | Enables writing data to the table by receiving short-lived read+write storage credentials from the catalog. |
| TABLE_FULL_METADATA | Grants all table privileges, except TABLE_READ_DATA and TABLE_WRITE_DATA, which need to be granted individually. |
| TABLE_ATTACH_POLICY | Enables attaching policy to a table. |
| TABLE_DETACH_POLICY | Enables detaching policy from a table. |

### View privileges

| Privilege | Description |
| --------- | ----------- |
| VIEW_CREATE | Enables registering a view with the catalog. |
| VIEW_DROP | Enables dropping a view from the catalog. |
| VIEW_LIST | Enables listing any views in the catalog. |
| VIEW_READ_PROPERTIES | Enables reading all the view properties. |
| VIEW_WRITE_PROPERTIES | Enables configuring view properties. |
| VIEW_FULL_METADATA | Grants all view privileges. |

### Namespace privileges

| Privilege | Description |
| --------- | ----------- |
| NAMESPACE_CREATE | Enables creating a namespace in a catalog. |
| NAMESPACE_DROP | Enables dropping the namespace from the catalog. |
| NAMESPACE_LIST | Enables listing any object in the namespace, including nested namespaces and tables. |
| NAMESPACE_READ_PROPERTIES | Enables reading all the namespace properties. |
| NAMESPACE_WRITE_PROPERTIES | Enables configuring namespace properties. |
| NAMESPACE_FULL_METADATA | Grants all namespace privileges. |
| NAMESPACE_ATTACH_POLICY | Enables attaching policy to a namespace. |
| NAMESPACE_DETACH_POLICY | Enables detaching policy from a namespace. |

### Catalog privileges

| Privilege | Description |
| -----------------------| ----------- |
| CATALOG_MANAGE_ACCESS | Includes the ability to grant or revoke privileges on objects in a catalog to catalog roles, and the ability to grant or revoke catalog roles to or from principal roles. |
| CATALOG_MANAGE_CONTENT | Enables full management of content for the catalog. This privilege encompasses the following privileges:<br/>• CATALOG_MANAGE_METADATA<br/>• TABLE_FULL_METADATA<br/>• NAMESPACE_FULL_METADATA<br/>• VIEW_FULL_METADATA<br/>• TABLE_WRITE_DATA<br/>• TABLE_READ_DATA<br/>• CATALOG_READ_PROPERTIES<br/>• CATALOG_WRITE_PROPERTIES |
| CATALOG_MANAGE_METADATA | Enables full management of the catalog, catalog roles, namespaces, and tables. |
| CATALOG_READ_PROPERTIES | Enables listing catalogs and reading properties of the catalog. |
| CATALOG_WRITE_PROPERTIES | Enables configuring catalog properties. |
| CATALOG_ATTACH_POLICY | Enables attaching policy to a catalog. |
| CATALOG_DETACH_POLICY | Enables detaching policy from a catalog. |

### Policy privileges

| Privilege | Description |
| -----------------------| ----------- |
| POLICY_CREATE | Enables creating a policy under a specified namespace. |
| POLICY_READ | Enables reading policy content and metadata. |
| POLICY_WRITE | Enables updating the policy details such as its content or description. |
| POLICY_LIST | Enables listing any policy from the catalog. |
| POLICY_DROP | Enables dropping a policy if it is not attached to any resource entity. |
| POLICY_FULL_METADATA | Grants all policy privileges. |
| POLICY_ATTACH | Enables a policy to be attached to entities. |
| POLICY_DETACH | Enables a policy to be detached from entities. |

## RBAC example

The following diagram illustrates how RBAC works in Polaris and
includes the following users:

- **Alice:** A service admin who signs up for Polaris. Alice can
  create principals. She can also create catalogs and
  namespaces and configure access control for Polaris resources.

- **Bob:** A data engineer who uses Apache Spark™ to
  interact with Polaris.

  - Alice has created a principal for Bob. It has been
    granted the Data_engineer principal role, which in turn has been
    granted the following catalog roles: Catalog contributor and
    Data administrator (for both the Silver and Gold zone catalogs
    in the following diagram).

  - The Catalog contributor role grants permission to create
    namespaces and tables in the Bronze zone catalog.

  - The Data administrator roles grant full administrative rights to
    the Silver zone catalog and Gold zone catalog.

- **Mark:** A data scientist who trains models with data managed
  by Polaris.

  - Alice has created a principal for Mark. It has been
    granted the Data_scientist principal role, which in turn has
    been granted the catalog role named Catalog reader.

  - The Catalog reader role grants read-only access for a catalog
    named Gold zone catalog.

![Diagram that shows an example of how RBAC works in Apache Polaris.](/img/rbac-example.svg "Apache Polaris RBAC example")
diff --git a/1.1.0/admin-tool.md b/1.1.0/admin-tool.md
deleted file mode 100644
index 4caa9d5343..0000000000
--- a/1.1.0/admin-tool.md
+++ /dev/null
@@ -1,142 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Admin Tool
type: docs
weight: 300
---

Polaris includes a tool for administrators to manage the metastore.

The tool must be built with the necessary JDBC drivers to access the metastore database. For
example, to build the tool with support for Postgres, run the following:

```shell
./gradlew \
  :polaris-admin:assemble \
  :polaris-admin:quarkusAppPartsBuild --rerun \
  -Dquarkus.container-image.build=true
```

The above command will generate:

- One Fast-JAR in `runtime/admin/build/quarkus-app/quarkus-run.jar`
- Two Docker images named `apache/polaris-admin-tool:latest` and `apache/polaris-admin-tool:<version>`

## Usage

Make sure the admin tool and the Polaris server are the same version before using the tool.
To run the standalone JAR, use the following command:

```shell
java -jar runtime/admin/build/quarkus-app/quarkus-run.jar --help
```

To run the Docker image, use the following command:

```shell
docker run apache/polaris-admin-tool:latest --help
```

The basic usage of the Polaris Admin Tool is outlined below:

```
Usage: polaris-admin-runner.jar [-hV] [COMMAND]
Polaris Admin Tool
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
Commands:
  help       Display help information about the specified command.
  bootstrap  Bootstraps realms and principal credentials.
  purge      Purge principal credentials.
```

## Configuration

The Polaris Admin Tool must be executed with the same configuration as the Polaris server. The
configuration can be done via environment variables or system properties.

At a minimum, it is necessary to configure the Polaris Admin Tool to connect to the same database
used by the Polaris server.

See the [metastore documentation]({{% ref "metastores" %}}) for more information on configuring the
database connection.

Note: Polaris will always create the schema `polaris_schema` under the configured database during bootstrap.

## Bootstrapping Realms and Principal Credentials

The `bootstrap` command is used to bootstrap realms and create the necessary principal credentials
for the Polaris server. This command is idempotent and can be run multiple times without causing any
issues. If a realm is already bootstrapped, running the `bootstrap` command again will not have any
effect on that realm.

```shell
java -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap --help
```

The basic usage of the `bootstrap` command is outlined below:

```
Usage: polaris-admin-runner.jar bootstrap [-hV] [-c=<credential>]... -r=<realm> [-r=<realm>]...
Bootstraps realms and root principal credentials.
  -c, --credential=<credential>
                  Root principal credentials to bootstrap. Must be of the form
                    'realm,clientId,clientSecret'.
  -h, --help      Show this help message and exit.
  -r, --realm=<realm>   The name of a realm to bootstrap.
  -V, --version   Print version information and exit.
```

For example, to bootstrap the `realm1` realm and create its root principal credential with the
client ID `admin` and client secret `admin`, you can run the following command:

```shell
java -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap -r realm1 -c realm1,admin,admin
```

## Purging Realms and Principal Credentials

The `purge` command is used to remove realms and principal credentials from the Polaris server.

> [!WARNING]
> Running the `purge` command will remove all data associated with the specified realms!
> This includes all entities (catalogs, namespaces, tables, views, roles), all principal
> credentials, grants, and any other data associated with the realms.

```shell
java -jar runtime/admin/build/quarkus-app/quarkus-run.jar purge --help
```

The basic usage of the `purge` command is outlined below:

```
Usage: polaris-admin-runner.jar purge [-hV] -r=<realm> [-r=<realm>]...
Purge realms and all associated entities.
  -h, --help      Show this help message and exit.
  -r, --realm=<realm>   The name of a realm to purge.
  -V, --version   Print version information and exit.
```

For example, to purge the `realm1` realm, you can run the following command:

```shell
java -jar runtime/admin/build/quarkus-app/quarkus-run.jar purge -r realm1
```
diff --git a/1.1.0/command-line-interface.md b/1.1.0/command-line-interface.md
deleted file mode 100644
index 094b5dbdbb..0000000000
--- a/1.1.0/command-line-interface.md
+++ /dev/null
@@ -1,1250 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Command Line Interface
type: docs
weight: 300
---

In order to help administrators quickly set up and manage their Polaris server, Polaris provides a simple command-line interface (CLI) for common tasks.

The basic syntax of the Polaris CLI is outlined below:

```
polaris [options] COMMAND ...

options:
--host
--port
--base-url
--client-id
--client-secret
--access-token
--profile
```

`COMMAND` must be one of the following:
1. catalogs
2. principals
3. principal-roles
4. catalog-roles
5. namespaces
6. privileges
7. profiles
8. repair

Each _command_ supports several _subcommands_, and some _subcommands_ have _actions_ that come after the subcommand in turn. Finally, _arguments_ follow to form a full invocation. Within a set of named arguments at the end of an invocation, ordering is generally not important. Many invocations also have a required positional argument of the type that the _command_ refers to. Again, the ordering of this positional argument relative to named arguments is not important.

Some example full invocations:

```
polaris principals list
polaris catalogs delete some_catalog_name
polaris catalogs update --property foo=bar some_other_catalog
polaris catalogs update another_catalog --property k=v
polaris privileges namespace grant --namespace some.schema --catalog fourth_catalog --catalog-role some_catalog_role TABLE_READ_DATA
polaris profiles list
polaris repair
```

### Authentication

As outlined above, the Polaris CLI may take credentials using the `--client-id` and `--client-secret` options. For example:

```
polaris --client-id 4b5ed1ca908c3cc2 --client-secret 07ea8e4edefb9a9e57c247e8d1a4f51c principals ...
-``` - -If `--client-id` and `--client-secret` are not provided, the Polaris CLI will try to read the client ID and client secret from environment variables called `CLIENT_ID` and `CLIENT_SECRET` respectively. If these flags are not provided and the environment variables are not set, the CLI will fail. - -Alternatively, the `--access-token` option can be used instead of `--client-id` and `--client-secret`, but both authentication methods cannot be used simultaneously. - -Additionally, the `--profile` option can be used to specify a saved profile instead of providing authentication details directly. If `--profile` is not provided, the CLI will check the `CLIENT_PROFILE` environment variable. Profiles store authentication details and connection settings, simplifying repeated CLI usage. - -If the `--host` and `--port` options are not provided, the CLI will default to communicating with `localhost:8181`. - -Alternatively, the `--base-url` option can be used instead of `--host` and `--port`, but both options cannot be used simultaneously. This allows specifying arbitrary Polaris URLs, including HTTPS ones, that have additional base prefixes before the `/api/*/v1` subpaths. - -### PATH - -These examples assume the Polaris CLI is on the PATH and so can be invoked just by the command `polaris`. You can add the CLI to your PATH environment variable with a command like the following: - -``` -export PATH="~/polaris:$PATH" -``` - -Alternatively, you can run the CLI by providing a path to it, such as with the following invocation: - -``` -~/polaris principals list -``` - -## Commands - -Each of the commands `catalogs`, `principals`, `principal-roles`, `catalog-roles`, and `privileges` is used to manage a different type of entity within Polaris. - -In addition to these, the `profiles` command is available for managing stored authentication profiles, allowing login credentials to be configured for reuse. This provides an alternative to passing authentication details with every command. - -To find details on the options that can be provided to a particular command or subcommand ad-hoc, you may wish to use the `--help` flag. For example: - -``` -polaris catalogs --help -polaris principals create --help -polaris profiles --help -``` - -### catalogs - -The `catalogs` command is used to create, discover, and otherwise manage catalogs within Polaris. - -`catalogs` supports the following subcommands: - -1. create -2. delete -3. get -4. list -5. update - -#### create - -The `create` subcommand is used to create a catalog. - -``` -input: polaris catalogs create --help -options: - create - Named arguments: - --type The type of catalog to create in [INTERNAL, EXTERNAL]. INTERNAL by default. - --storage-type (Required) The type of storage to use for the catalog - --default-base-location (Required) Default base location of the catalog - --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once. 
--role-arn (Required for S3) A role ARN to use when connecting to S3
  --region (Only for S3) The region to use when connecting to S3
  --external-id (Only for S3) The external ID to use when connecting to S3
  --tenant-id (Required for Azure) A tenant ID to use when connecting to Azure Storage
  --multi-tenant-app-name (Only for Azure) The app name to use when connecting to Azure Storage
  --consent-url (Only for Azure) A consent URL granting permissions for the Azure Storage location
  --service-account (Only for GCS) The service account to use when connecting to GCS
  --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
  --catalog-connection-type The type of external catalog in [ICEBERG, HADOOP].
  --iceberg-remote-catalog-name The remote catalog name when federating to an Iceberg REST catalog
  --hadoop-warehouse The warehouse to use when federating to a HADOOP catalog
  --catalog-authentication-type The type of authentication in [OAUTH, BEARER, SIGV4]
  --catalog-service-identity-type The type of service identity in [AWS_IAM]
  --catalog-service-identity-iam-arn When using the AWS_IAM service identity type, this is the ARN of the IAM user or IAM role Polaris uses to assume roles and then access external resources.
  --catalog-uri The URI of the external catalog
  --catalog-token-uri (For authentication type OAUTH) Token server URI
  --catalog-client-id (For authentication type OAUTH) oauth client id
  --catalog-client-secret (For authentication type OAUTH) oauth client secret (input-only)
  --catalog-client-scope (For authentication type OAUTH) oauth scopes to specify when exchanging for a short-lived access token. Multiple can be provided by specifying this option more than once
  --catalog-bearer-token (For authentication type BEARER) Bearer token (input-only)
  --catalog-role-arn (For authentication type SIGV4) The aws IAM role arn assumed by polaris userArn when signing requests
  --catalog-role-session-name (For authentication type SIGV4) The role session name to be used by the SigV4 protocol for signing requests
  --catalog-external-id (For authentication type SIGV4) An optional external id used to establish a trust relationship with AWS in the trust policy
  --catalog-signing-region (For authentication type SIGV4) Region to be used by the SigV4 protocol for signing requests
  --catalog-signing-name (For authentication type SIGV4) The service name to be used by the SigV4 protocol for signing requests; the default signing name is "execute-api" if not provided
  Positional arguments:
  catalog
```

##### Examples

```
polaris catalogs create \
  --storage-type s3 \
  --default-base-location s3://example-bucket/my_data \
  --role-arn ${ROLE_ARN} \
  my_catalog

polaris catalogs create \
  --storage-type s3 \
  --default-base-location s3://example-bucket/my_other_data \
  --allowed-location s3://example-bucket/second_location \
  --allowed-location s3://other-bucket/third_location \
  --role-arn ${ROLE_ARN} \
  my_other_catalog

polaris catalogs create \
  --storage-type file \
  --default-base-location file:///example/tmp \
  quickstart_catalog
```

#### delete

The `delete` subcommand is used to delete a catalog.

```
input: polaris catalogs delete --help
options:
  delete
  Positional arguments:
  catalog
```

##### Examples

```
polaris catalogs delete some_catalog
```

#### get

The `get` subcommand is used to retrieve details about a catalog.
```
input: polaris catalogs get --help
options:
  get
  Positional arguments:
  catalog
```

##### Examples

```
polaris catalogs get some_catalog

polaris catalogs get another_catalog
```

#### list

The `list` subcommand is used to show details about all catalogs, or those that a certain principal role has access to. The principal used to perform this operation must have the `CATALOG_LIST` privilege.

```
input: polaris catalogs list --help
options:
  list
  Named arguments:
  --principal-role The name of a principal role
```

##### Examples

```
polaris catalogs list

polaris catalogs list --principal-role some_user
```

#### update

The `update` subcommand is used to update a catalog. Currently, this command supports changing the properties of a catalog or updating its storage configuration.

```
input: polaris catalogs update --help
options:
  update
  Named arguments:
  --default-base-location A new default base location for the catalog
  --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once.
  --region (Only for S3) The region to use when connecting to S3
  --set-property A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
  --remove-property A key to remove from a properties map. If the key does not exist then no action is taken for the specified key. If properties are also being set in the same update command then the list of removals is applied last. Multiple can be provided by specifying this option more than once
  Positional arguments:
  catalog
```

##### Examples

```
polaris catalogs update --set-property tag=new_value my_catalog

polaris catalogs update --default-base-location s3://new-bucket/my_data my_catalog
```

### Principals

The `principals` command is used to manage principals within Polaris.

`principals` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. rotate-credentials
6. update
7. access

#### create

The `create` subcommand is used to create a new principal.

```
input: polaris principals create --help
options:
  create
  Named arguments:
  --type The type of principal to create in [SERVICE]
  --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
  Positional arguments:
  principal
```

##### Examples

```
polaris principals create some_user

polaris principals create --client-id ${CLIENT_ID} --property admin=true some_admin_user
```

#### delete

The `delete` subcommand is used to delete a principal.

```
input: polaris principals delete --help
options:
  delete
  Positional arguments:
  principal
```

##### Examples

```
polaris principals delete some_user

polaris principals delete some_admin_user
```

#### get

The `get` subcommand retrieves details about a principal.

```
input: polaris principals get --help
options:
  get
  Positional arguments:
  principal
```

##### Examples

```
polaris principals get some_user

polaris principals get some_admin_user
```

#### list

The `list` subcommand shows details about all principals.
##### Examples

```
polaris principals list
```

#### rotate-credentials

The `rotate-credentials` subcommand is used to update the credentials used by a principal. After this command runs successfully, the new credentials will be printed to stdout.

```
input: polaris principals rotate-credentials --help
options:
  rotate-credentials
  Positional arguments:
  principal
```

##### Examples

```
polaris principals rotate-credentials some_user

polaris principals rotate-credentials some_admin_user
```

#### update

The `update` subcommand is used to update a principal. Currently, this supports rewriting the properties associated with a principal.

```
input: polaris principals update --help
options:
  update
  Named arguments:
  --set-property A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
  --remove-property A key to remove from a properties map. If the key does not exist then no action is taken for the specified key. If properties are also being set in the same update command then the list of removals is applied last. Multiple can be provided by specifying this option more than once
  Positional arguments:
  principal
```

##### Examples

```
polaris principals update --set-property key=value --set-property other_key=other_value some_user

polaris principals update --set-property are_other_keys_removed=yes some_user
```

#### access

The `access` subcommand retrieves the entities that a principal has access to.

```
input: polaris principals access --help
options:
  access
  Positional arguments:
  principal
```

##### Examples

```
polaris principals access quickstart_user
```

### Principal Roles

The `principal-roles` command is used to create, discover, and manage principal roles within Polaris. Additionally, this command can identify principals or catalog roles associated with a principal role, and can be used to grant a principal role to a principal.

`principal-roles` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. update
6. grant
7. revoke

#### create

The `create` subcommand is used to create a new principal role.

```
input: polaris principal-roles create --help
options:
  create
  Named arguments:
  --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
  Positional arguments:
  principal_role
```

##### Examples

```
polaris principal-roles create data_engineer

polaris principal-roles create --property key=value data_analyst
```

#### delete

The `delete` subcommand is used to delete a principal role.

```
input: polaris principal-roles delete --help
options:
  delete
  Positional arguments:
  principal_role
```

##### Examples

```
polaris principal-roles delete data_engineer

polaris principal-roles delete data_analyst
```

#### get

The `get` subcommand retrieves details about a principal role.
-
-```
-input: polaris principal-roles get --help
-options:
-  get
-    Positional arguments:
-      principal_role
-```
-
-##### Examples
-
-```
-polaris principal-roles get data_engineer
-
-polaris principal-roles get data_analyst
-```
-
-#### list
-
-The `list` subcommand is used to print out all principal roles or, alternatively, to list all principal roles associated with a given principal or with a given catalog role.
-
-```
-input: polaris principal-roles list --help
-options:
-  list
-    Named arguments:
-      --catalog-role  The name of a catalog role. If provided, show only principal roles assigned to this catalog role.
-      --principal  The name of a principal. If provided, show only principal roles assigned to this principal.
-```
-
-##### Examples
-
-```
-polaris principal-roles list
-
-polaris principal-roles list --principal d.knuth
-
-polaris principal-roles list --catalog-role super_secret_data
-```
-
-#### update
-
-The `update` subcommand is used to update a principal role. Currently, this supports updating the properties tied to a principal role.
-
-```
-input: polaris principal-roles update --help
-options:
-  update
-    Named arguments:
-      --set-property  A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
-      --remove-property  A key to remove from a properties map. If the key does not exist, no action is taken for the specified key. If properties are also being set in the same update command, the list of removals is applied last. Multiple can be provided by specifying this option more than once
-    Positional arguments:
-      principal_role
-```
-
-##### Examples
-
-```
-polaris principal-roles update --set-property key=value2 data_engineer
-
-polaris principal-roles update data_analyst --set-property key=value3
-```
-
-#### grant
-
-The `grant` subcommand is used to grant a principal role to a principal.
-
-```
-input: polaris principal-roles grant --help
-options:
-  grant
-    Named arguments:
-      --principal  A principal to grant this principal role to
-    Positional arguments:
-      principal_role
-```
-
-##### Examples
-
-```
-polaris principal-roles grant --principal d.knuth data_engineer
-
-polaris principal-roles grant data_scientist --principal a.ng
-```
-
-#### revoke
-
-The `revoke` subcommand is used to revoke a principal role from a principal.
-
-```
-input: polaris principal-roles revoke --help
-options:
-  revoke
-    Named arguments:
-      --principal  A principal to revoke this principal role from
-    Positional arguments:
-      principal_role
-```
-
-##### Examples
-
-```
-polaris principal-roles revoke --principal former.employee data_engineer
-
-polaris principal-roles revoke data_scientist --principal changed.role
-```
-
-### Catalog Roles
-
-The `catalog-roles` command is used to create, discover, and manage catalog roles within Polaris. Additionally, this command can be used to grant a catalog role to a principal role.
-
-`catalog-roles` supports the following subcommands:
-
-1. create
-2. delete
-3. get
-4. list
-5. update
-6. grant
-7. revoke
-
-#### create
-
-The `create` subcommand is used to create a new catalog role.
-
-```
-input: polaris catalog-roles create --help
-options:
-  create
-    Named arguments:
-      --catalog  The name of an existing catalog
-      --property  A key/value pair such as: tag=value.
Multiple can be provided by specifying this option more than once
-    Positional arguments:
-      catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles create --property key=value --catalog some_catalog sales_data
-
-polaris catalog-roles create --catalog other_catalog sales_data
-```
-
-#### delete
-
-The `delete` subcommand is used to delete a catalog role.
-
-```
-input: polaris catalog-roles delete --help
-options:
-  delete
-    Named arguments:
-      --catalog  The name of an existing catalog
-    Positional arguments:
-      catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles delete --catalog some_catalog sales_data
-
-polaris catalog-roles delete --catalog other_catalog sales_data
-```
-
-#### get
-
-The `get` subcommand retrieves details about a catalog role.
-
-```
-input: polaris catalog-roles get --help
-options:
-  get
-    Named arguments:
-      --catalog  The name of an existing catalog
-    Positional arguments:
-      catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles get --catalog some_catalog inventory_data
-
-polaris catalog-roles get --catalog other_catalog inventory_data
-```
-
-#### list
-
-The `list` subcommand is used to print all catalog roles. Alternatively, if a principal role is provided, only catalog roles associated with that principal role are shown.
-
-```
-input: polaris catalog-roles list --help
-options:
-  list
-    Named arguments:
-      --principal-role  The name of a principal role
-    Positional arguments:
-      catalog
-```
-
-##### Examples
-
-```
-polaris catalog-roles list
-
-polaris catalog-roles list --principal-role data_engineer
-```
-
-#### update
-
-The `update` subcommand is used to update a catalog role. Currently, only updating properties associated with the catalog role is supported.
-
-```
-input: polaris catalog-roles update --help
-options:
-  update
-    Named arguments:
-      --catalog  The name of an existing catalog
-      --set-property  A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
-      --remove-property  A key to remove from a properties map. If the key does not exist, no action is taken for the specified key. If properties are also being set in the same update command, the list of removals is applied last. Multiple can be provided by specifying this option more than once
-    Positional arguments:
-      catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles update --set-property contains_pii=true --catalog some_catalog sales_data
-
-polaris catalog-roles update sales_data --catalog some_catalog --set-property key=value
-```
-
-#### grant
-
-The `grant` subcommand is used to grant a catalog role to a principal role.
-
-```
-input: polaris catalog-roles grant --help
-options:
-  grant
-    Named arguments:
-      --catalog  The name of an existing catalog
-      --principal-role  The name of a principal role
-    Positional arguments:
-      catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles grant sensitive_data --catalog some_catalog --principal-role power_user
-
-polaris catalog-roles grant --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
-```
-
-#### revoke
-
-The `revoke` subcommand is used to revoke a catalog role from a principal role.
-
-```
-input: polaris catalog-roles revoke --help
-options:
-  revoke
-    Named arguments:
-      --catalog  The name of an existing catalog
-      --principal-role  The name of a principal role
-    Positional arguments:
-      catalog_role
-```
-
-##### Examples
-
-```
-polaris catalog-roles revoke sensitive_data --catalog some_catalog --principal-role power_user
-
-polaris catalog-roles revoke --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
-```
-
-### Namespaces
-
-The `namespaces` command is used to manage namespaces within Polaris.
-
-`namespaces` supports the following subcommands:
-
-1. create
-2. delete
-3. get
-4. list
-
-#### create
-
-The `create` subcommand is used to create a new namespace.
-
-When creating a namespace with an explicit location, that location must reside within the parent catalog or namespace.
-
-```
-input: polaris namespaces create --help
-options:
-  create
-    Named arguments:
-      --catalog  The name of an existing catalog
-      --location  If specified, the location at which to store the namespace and entities inside it
-      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
-    Positional arguments:
-      namespace
-```
-
-##### Examples
-
-```
-polaris namespaces create --catalog my_catalog outer
-
-polaris namespaces create --catalog my_catalog --location 's3://bucket/outer/inner_SUFFIX' outer.inner
-```
-
-#### delete
-
-The `delete` subcommand is used to delete a namespace.
-
-```
-input: polaris namespaces delete --help
-options:
-  delete
-    Named arguments:
-      --catalog  The name of an existing catalog
-    Positional arguments:
-      namespace
-```
-
-##### Examples
-
-```
-polaris namespaces delete outer_namespace.inner_namespace --catalog my_catalog
-
-polaris namespaces delete --catalog my_catalog outer_namespace
-```
-
-#### get
-
-The `get` subcommand retrieves details about a namespace.
-
-```
-input: polaris namespaces get --help
-options:
-  get
-    Named arguments:
-      --catalog  The name of an existing catalog
-    Positional arguments:
-      namespace
-```
-
-##### Examples
-
-```
-polaris namespaces get --catalog some_catalog a.b
-
-polaris namespaces get a.b.c --catalog some_catalog
-```
-
-#### list
-
-The `list` subcommand shows details about all namespaces directly within a catalog or, optionally, within some parent prefix in that catalog.
-
-```
-input: polaris namespaces list --help
-options:
-  list
-    Named arguments:
-      --catalog  The name of an existing catalog
-      --parent  If specified, list namespaces inside this parent namespace
-```
-
-##### Examples
-
-```
-polaris namespaces list --catalog my_catalog
-
-polaris namespaces list --catalog my_catalog --parent a
-
-polaris namespaces list --catalog my_catalog --parent a.b
-```
-
-### Privileges
-
-The `privileges` command is used to grant various privileges to a catalog role, or to revoke those privileges. Privileges can be on the level of a catalog, a namespace, a table, or a view. For more information on privileges, please refer to the [docs]({{% ref "entities#privilege" %}}).
-
-Note that when using the `privileges` command, the user specifies the relevant catalog and catalog role before selecting a subcommand.
-
-`privileges` supports the following subcommands:
-
-1. list
-2. catalog
-3. namespace
-4. table
-5. view
-
-Each of these subcommands, except `list`, supports the `grant` and `revoke` actions and requires an action to be specified.
-
-Note that each subcommand's `revoke` action always accepts the same options that the corresponding `grant` action does, but with the addition of the `--cascade` option. `--cascade` is used to also revoke all other privileges that depend on the specified privilege.
-
-#### list
-
-The `list` subcommand shows details about all privileges for a catalog role.
-
-```
-input: polaris privileges list --help
-options:
-  list
-    Named arguments:
-      --catalog  The name of an existing catalog
-      --catalog-role  The name of a catalog role
-```
-
-##### Examples
-
-```
-polaris privileges list --catalog my_catalog --catalog-role my_role
-
-polaris privileges list --catalog my_catalog --catalog-role my_other_role
-```
-
-#### catalog
-
-The `catalog` subcommand manages privileges at the catalog level. `grant` is used to grant catalog privileges to the specified catalog role, and `revoke` is used to revoke them.
-
-```
-input: polaris privileges catalog --help
-options:
-  catalog
-    grant
-      Named arguments:
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-    revoke
-      Named arguments:
-        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
-  catalog \
-  grant \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  TABLE_CREATE
-
-polaris privileges \
-  catalog \
-  revoke \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --cascade \
-  TABLE_CREATE
-```
-
-#### namespace
-
-The `namespace` subcommand manages privileges at the namespace level.
-
-```
-input: polaris privileges namespace --help
-options:
-  namespace
-    grant
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-    revoke
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
-  namespace \
-  grant \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b \
-  TABLE_LIST
-
-polaris privileges \
-  namespace \
-  revoke \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b \
-  TABLE_LIST
-```
-
-#### table
-
-The `table` subcommand manages privileges at the table level.
-
-```
-input: polaris privileges table --help
-options:
-  table
-    grant
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --table  The name of a table
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-    revoke
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --table  The name of a table
-        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
-  table \
-  grant \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b \
-  --table t \
-  TABLE_DROP
-
-polaris privileges \
-  table \
-  revoke \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b \
-  --table t \
-  --cascade \
-  TABLE_DROP
-```
-
-#### view
-
-The `view` subcommand manages privileges at the view level.
-
-```
-input: polaris privileges view --help
-options:
-  view
-    grant
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --view  The name of a view
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-    revoke
-      Named arguments:
-        --namespace  A period-delimited namespace
-        --view  The name of a view
-        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
-        --catalog  The name of an existing catalog
-        --catalog-role  The name of a catalog role
-      Positional arguments:
-        privilege
-```
-
-##### Examples
-
-```
-polaris privileges \
-  view \
-  grant \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b.c \
-  --view v \
-  VIEW_FULL_METADATA
-
-polaris privileges \
-  view \
-  revoke \
-  --catalog my_catalog \
-  --catalog-role catalog_role \
-  --namespace a.b.c \
-  --view v \
-  --cascade \
-  VIEW_FULL_METADATA
-```
-
-### profiles
-
-The `profiles` command is used to manage stored authentication profiles in Polaris. Profiles allow authentication credentials to be saved and reused, eliminating the need to pass credentials with every command.
-
-`profiles` supports the following subcommands:
-
-1. create
-2. delete
-3. get
-4. list
-5. update
-
-#### create
-
-The `create` subcommand is used to create a new authentication profile.
-
-```
-input: polaris profiles create --help
-options:
-  create
-    Positional arguments:
-      profile
-```
-
-##### Examples
-
-```
-polaris profiles create dev
-```
-
-#### delete
-
-The `delete` subcommand removes a stored profile.
-
-```
-input: polaris profiles delete --help
-options:
-  delete
-    Positional arguments:
-      profile
-```
-
-##### Examples
-
-```
-polaris profiles delete dev
-```
-
-#### get
-
-The `get` subcommand retrieves details about a stored profile.
-
-```
-input: polaris profiles get --help
-options:
-  get
-    Positional arguments:
-      profile
-```
-
-##### Examples
-
-```
-polaris profiles get dev
-```
-
-#### list
-
-The `list` subcommand displays all stored profiles.
-
-```
-input: polaris profiles list --help
-options:
-  list
-```
-
-##### Examples
-
-```
-polaris profiles list
-```
-
-#### update
-
-The `update` subcommand modifies an existing profile.
-
-```
-input: polaris profiles update --help
-options:
-  update
-    Positional arguments:
-      profile
-```
-
-##### Examples
-
-```
-polaris profiles update dev
-```
-
-### repair
-
-The `repair` command is a bash script wrapper used to regenerate Python client code and update necessary dependencies, ensuring the Polaris client remains up-to-date and functional. **Please note that this command does not support any options and its usage information is not available via a `--help` flag.**
-
-## Examples
-
-This section outlines example code for a few common operations as well as for some more complex ones.
-
-For especially complex operations, you may wish to instead use the Python API directly.
-
-### Creating a principal and a catalog
-
-```
-polaris principals create my_user
-
-polaris catalogs create \
-  --type internal \
-  --storage-type s3 \
-  --default-base-location s3://iceberg-bucket/polaris-base \
-  --role-arn arn:aws:iam::111122223333:role/ExampleCorpRole \
-  --allowed-location s3://iceberg-bucket/polaris-alt-location-1 \
-  --allowed-location s3://iceberg-bucket/polaris-alt-location-2 \
-  my_catalog
-```
-
-### Granting a principal the ability to manage the content of a catalog
-
-```
-polaris principal-roles create power_user
-polaris principal-roles grant --principal my_user power_user
-
-polaris catalog-roles create --catalog my_catalog my_catalog_role
-polaris catalog-roles grant \
-  --catalog my_catalog \
-  --principal-role power_user \
-  my_catalog_role
-
-polaris privileges \
-  catalog \
-  --catalog my_catalog \
-  --catalog-role my_catalog_role \
-  grant \
-  CATALOG_MANAGE_CONTENT
-```
-
-### Identifying the tables a given principal has been granted explicit access to read
-
-_Note that some other privileges, such as `CATALOG_MANAGE_CONTENT`, subsume `TABLE_READ_DATA` and would not be discovered here._
-
-```
-# Assumes ${catalog} has already been set to the name of the catalog being inspected
-principal_roles=$(polaris principal-roles list --principal my_principal)
-for principal_role in ${principal_roles}; do
-  catalog_roles=$(polaris catalog-roles list --principal-role "${principal_role}")
-  for catalog_role in ${catalog_roles}; do
-    grants=$(polaris privileges list --catalog-role "${catalog_role}" --catalog "${catalog}")
-    for grant in $(echo "${grants}" | jq -c '.[] | select(.privilege == "TABLE_READ_DATA")'); do
-      echo "${grant}"
-    done
-  done
-done
-```
diff --git a/1.1.0/configuration.md b/1.1.0/configuration.md
deleted file mode 100644
index fec8940d6b..0000000000
--- a/1.1.0/configuration.md
+++ /dev/null
@@ -1,190 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Configuring Polaris
-type: docs
-weight: 550
----
-
-## Overview
-
-This page provides information on how to configure Apache Polaris (Incubating).
Unless stated
-otherwise, this information is valid both for Polaris Docker images (and Kubernetes deployments) as
-well as for Polaris binary distributions.
-
-> [!NOTE]
-> For production tips and best practices, refer to [Configuring Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}).
-
-First off, the Polaris server runs on Quarkus and uses its configuration mechanisms. Read the Quarkus
-[configuration guide](https://quarkus.io/guides/config) to get familiar with the basics.
-
-Quarkus aggregates configuration properties from multiple sources, applying them in a specific order
-of precedence. When a property is defined in multiple sources, the value from the source with the
-higher priority overrides those from lower-priority sources.
-
-The sources are listed below, from highest to lowest priority:
-
-1. System properties: properties set via the Java command line using `-Dproperty.name=value`.
-2. Environment variables (see below for important details).
-3. Settings in the `$PWD/config/application.properties` file.
-4. The `application.properties` files packaged in Polaris.
-5. Default values: hardcoded defaults within the application.
-
-When using environment variables, there are two naming conventions:
-
-1. If possible, just use the property name as the environment variable name. This works fine in most
-   cases, e.g. in Kubernetes deployments. For example, `polaris.realm-context.realms` can be
-   included as is in a container YAML definition:
-   ```yaml
-   env:
-   - name: "polaris.realm-context.realms"
-     value: "realm1,realm2"
-   ```
-
-2. If running from a script or shell prompt, however, stricter naming rules apply: variable names
-   can consist solely of uppercase letters, digits, and the `_` (underscore) sign. In such
-   situations, the environment variable name must be derived from the property name by uppercasing
-   it and replacing all dots, dashes, and quotes with underscores. For example,
-   `polaris.realm-context.realms` becomes `POLARIS_REALM_CONTEXT_REALMS`. See
-   [here](https://smallrye.io/smallrye-config/Main/config/environment-variables/) for more details.
-
-> [!IMPORTANT]
-> While convenient, uppercase-only environment variables can be problematic for complex property
-> names. In these situations, it's preferable to use system properties or a configuration file.
-
-As stated above, a configuration file can also be provided at runtime; it should be available
-(mounted) at `$PWD/config/application.properties` for the Polaris server to recognize it. In Polaris
-official Docker images, this location is `/deployment/config/application.properties`.
-
-For Kubernetes deployments, the configuration file is typically defined as a `ConfigMap`, then
-mounted in the container at `/deployment/config/application.properties`. It can be mounted in
-read-only mode, as Polaris only reads the configuration file once, at startup.
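-
-As an illustration, here is a minimal sketch of such a setup. The `polaris-config` name, the
-bare `Pod` definition, and the `apache/polaris:latest` image tag are assumptions made for this
-example, not prescribed values:
-
-```yaml
-# A ConfigMap holding the Polaris configuration file (hypothetical name).
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: polaris-config
-data:
-  application.properties: |
-    polaris.realm-context.realms=POLARIS
----
-# A bare-bones Pod mounting that ConfigMap read-only where Polaris expects it.
-apiVersion: v1
-kind: Pod
-metadata:
-  name: polaris
-spec:
-  containers:
-    - name: polaris
-      image: apache/polaris:latest
-      volumeMounts:
-        - name: config
-          # Official images read /deployment/config/application.properties
-          mountPath: /deployment/config
-          readOnly: true
-  volumes:
-    - name: config
-      configMap:
-        name: polaris-config
-```
-
-Because the file is only read at startup, changes to the `ConfigMap` take effect after the pods
-are restarted.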
-
-## Polaris Configuration Options Reference
-
-| Configuration Property | Default Value | Description |
-|------------------------|---------------|-------------|
-| `polaris.persistence.type` | `relational-jdbc` | Define the persistence backend used by Polaris (`in-memory`, `relational-jdbc`, `eclipse-link` (deprecated)). See [Configuring Apache Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}). |
-| `polaris.persistence.relational.jdbc.max-retries` | `1` | Total number of retries JDBC persistence will attempt on connection resets or serialization failures before giving up. |
-| `polaris.persistence.relational.jdbc.max_duaration_in_ms` | `5000 ms` | Max time interval (ms) since the start of a transaction when retries can be attempted. |
-| `polaris.persistence.relational.jdbc.initial_delay_in_ms` | `100 ms` | Initial delay before retrying. The delay is doubled after each retry. |
-| `polaris.persistence.eclipselink.configurationFile` |  | Define the location of the `persistence.xml`. By default, the built-in `persistence.xml` is in use. |
-| `polaris.persistence.eclipselink.persistenceUnit` | `polaris` | Define the name of the persistence unit to use, as defined in the `persistence.xml`. |
-| `polaris.realm-context.type` | `default` | Define the type of the Polaris realm to use. |
-| `polaris.realm-context.realms` | `POLARIS` | Define the list of realms to use. |
-| `polaris.realm-context.header-name` | `Polaris-Realm` | Define the header name defining the realm context. |
-| `polaris.features."ENFORCE_PRINCIPAL_CREDENTIAL_ROTATION_REQUIRED_CHECKING"` | `false` | Flag to enforce checking whether credential rotation is required. |
-| `polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"` | `FILE` | Define the supported catalog storage types. Supported values are `S3`, `GCS`, `AZURE`, `FILE`. |
-| `polaris.features.realm-overrides."my-realm"."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"` | `true` | "Override" of realm features; here, the skip credential subscoping indirection flag. |
-| `polaris.authentication.authenticator.type` | `default` | Define the Polaris authenticator type. |
-| `polaris.authentication.token-service.type` | `default` | Define the Polaris token service type. |
-| `polaris.authentication.token-broker.type` | `rsa-key-pair` | Define the Polaris token broker type. Also configure the location of the key files. For RSA: if the locations of the key files are not configured, an ephemeral key-pair will be created on each Polaris server instance startup, which breaks existing tokens after server restarts and is also incompatible with running multiple Polaris server instances. |
-| `polaris.authentication.token-broker.max-token-generation` | `PT1H` | Define the maximum validity duration of tokens issued by the token broker. |
-| `polaris.authentication.token-broker.rsa-key-pair.private-key-file` |  | Define the location of the RSA-256 private key file; if present, the `public-key` file must be specified, too. |
-| `polaris.authentication.token-broker.rsa-key-pair.public-key-file` |  | Define the location of the RSA-256 public key file; if present, the `private-key` file must be specified, too.
|
-| `polaris.authentication.token-broker.symmetric-key.secret` | `secret` | Define the secret of the symmetric key. |
-| `polaris.authentication.token-broker.symmetric-key.file` | `/tmp/symmetric.key` | Define the location of the symmetric key file. |
-| `polaris.storage.aws.access-key` | `accessKey` | Define the AWS S3 access key. If unset, the default credential provider chain will be used. |
-| `polaris.storage.aws.secret-key` | `secretKey` | Define the AWS S3 secret key. If unset, the default credential provider chain will be used. |
-| `polaris.storage.gcp.token` | `token` | Define the Google Cloud Storage token. If unset, the default credential provider chain will be used. |
-| `polaris.storage.gcp.lifespan` | `PT1H` | Define the Google Cloud Storage token lifespan. If unset, the default credential provider chain will be used. |
-| `polaris.log.request-id-header-name` | `Polaris-Request-Id` | Define the header name used to match the request ID in the logs. |
-| `polaris.log.mdc.aid` | `polaris` | Define the log context (e.g. MDC) AID. |
-| `polaris.log.mdc.sid` | `polaris-service` | Define the log context (e.g. MDC) SID. |
-| `polaris.rate-limiter.filter.type` | `no-op` | Define the Polaris rate limiter. Supported values are `no-op`, `token-bucket`. |
-| `polaris.rate-limiter.token-bucket.type` | `default` | Define the token bucket rate limiter. |
-| `polaris.rate-limiter.token-bucket.requests-per-second` | `9999` | Define the number of requests per second for the token bucket rate limiter. |
-| `polaris.rate-limiter.token-bucket.window` | `PT10S` | Define the window size for the token bucket rate limiter. |
-| `polaris.metrics.tags.<tag-name>=<tag-value>` | `application=Polaris` | Define arbitrary metric tags to include in every request. |
-| `polaris.metrics.realm-id-tag.api-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in API metrics. |
-| `polaris.metrics.realm-id-tag.http-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in HTTP request metrics. |
-| `polaris.metrics.realm-id-tag.http-metrics-max-cardinality` | `100` | The maximum cardinality for the `realm_id` tag in HTTP request metrics. |
-| `polaris.tasks.max-concurrent-tasks` | `100` | Define the max number of concurrent tasks. |
-| `polaris.tasks.max-queued-tasks` | `1000` | Define the max number of tasks in queue. |
-| `polaris.config.rollback.compaction.on-conflicts.enabled` | `false` | When set to `true`, Polaris resolves conflicts on the server side by rolling back REPLACE operation snapshots that have the property `polaris.internal.rollback.compaction.on-conflict` set to `rollback` in their snapshot summary. |
-
-There are also non-Polaris configuration properties that can be useful:
-
-| Configuration Property | Default Value | Description |
-|------------------------|---------------|-------------|
-| `quarkus.log.level` | `INFO` | Define the root log level. |
-| `quarkus.log.category."org.apache.polaris".level` |  | Define the log level for a specific category. |
-| `quarkus.default-locale` | System locale | Force the use of a specific locale, for instance `en_US`. |
-| `quarkus.http.port` | `8181` | Define the HTTP port number. |
-| `quarkus.http.auth.basic` | `false` | Enable HTTP basic authentication. |
-| `quarkus.http.limits.max-body-size` | `10240K` | Define the HTTP max body size limit.
|
-| `quarkus.http.cors.origins` |  | Define the HTTP CORS origins. |
-| `quarkus.http.cors.methods` | `PATCH, POST, DELETE, GET, PUT` | Define the HTTP CORS covered methods. |
-| `quarkus.http.cors.headers` | `*` | Define the HTTP CORS covered headers. |
-| `quarkus.http.cors.exposed-headers` | `*` | Define the HTTP CORS covered exposed headers. |
-| `quarkus.http.cors.access-control-max-age` | `PT10M` | Define the HTTP CORS access control max age. |
-| `quarkus.http.cors.access-control-allow-credentials` | `true` | Define the HTTP CORS access control allow credentials flag. |
-| `quarkus.management.enabled` | `true` | Enable the management server. |
-| `quarkus.management.port` | `8182` | Define the port number of the Polaris management server. |
-| `quarkus.management.root-path` |  | Define the root path on which the `/metrics` and `/health` endpoints are based. |
-| `quarkus.otel.sdk.disabled` | `true` | Whether the OpenTelemetry SDK is disabled. Set to `false` to enable the OpenTelemetry layer. |
-
-> [!NOTE]
-> This section is only relevant for Polaris Docker images and Kubernetes deployments.
-
-There are many other environment variables available in the official Polaris Docker
-image; they come from the base image used by Polaris, [ubi9/openjdk-21-runtime]. They should be used
-to fine-tune the Java runtime directly, e.g. to enable debugging or to set the heap size. These
-variables are not specific to Polaris, but are inherited from the base image. If in doubt, leave
-everything at its default!
-
-[ubi9/openjdk-21-runtime]: https://catalog.redhat.com/software/containers/ubi9/openjdk-21-runtime/6501ce769a0d86945c422d5f
-
-| Environment variable | Description |
-|----------------------|-------------|
-| `JAVA_OPTS` or `JAVA_OPTIONS` | **NOT RECOMMENDED**. JVM options passed to the `java` command (example: "-verbose:class"). Setting this variable will override all options set by any of the other variables in this table. To pass extra settings, use `JAVA_OPTS_APPEND` instead. |
-| `JAVA_OPTS_APPEND` | User-specified Java options to be appended to the generated options in `JAVA_OPTS` (example: "-Dsome.property=foo"). |
-| `JAVA_TOOL_OPTIONS` | This variable is defined and honored by all OpenJDK distros, see [here](https://bugs.openjdk.org/browse/JDK-4971166). Options defined here take precedence over all else; using this variable is generally not necessary, but can be useful e.g. to enforce JVM startup parameters, to set up remote debug, or to define JVM agents. |
-| `JAVA_MAX_MEM_RATIO` | Used to calculate a default maximal heap size based on a container's memory restriction. If used in a container without any memory constraint, this option has no effect. If there is a memory constraint, `-XX:MaxRAMPercentage` is set to the ratio of the available container memory configured here. The default is `80`, which means 80% of the available memory is used as an upper boundary. You can skip this mechanism by setting this value to `0`, in which case no `-XX:MaxRAMPercentage` option is added.
|
-| `JAVA_DEBUG` | If set, remote debugging will be switched on. Disabled by default (example: "true"). |
-| `JAVA_DEBUG_PORT` | Port used for remote debugging. Defaults to "5005" (tip: use "*:5005" to enable debugging on all network interfaces). |
-| `GC_MIN_HEAP_FREE_RATIO` | Minimum percentage of heap free after GC to avoid expansion. Default is 10. |
-| `GC_MAX_HEAP_FREE_RATIO` | Maximum percentage of heap free after GC to avoid shrinking. Default is 20. |
-| `GC_TIME_RATIO` | Specifies the ratio of the time spent outside the garbage collection. Default is 4. |
-| `GC_ADAPTIVE_SIZE_POLICY_WEIGHT` | The weighting given to the current GC time versus previous GC times. Default is 90. |
-| `GC_METASPACE_SIZE` | The initial metaspace size. There is no default (example: "20"). |
-| `GC_MAX_METASPACE_SIZE` | The maximum metaspace size. There is no default (example: "100"). |
-| `GC_CONTAINER_OPTIONS` | Specify the Java GC to use. The value of this variable should contain the necessary JRE command-line options to specify the required GC, which will override the default of `-XX:+UseParallelGC` (example: `-XX:+UseG1GC`). |
-
-Here are some examples:
-
-| Example | `docker run` option |
-|---------|---------------------|
-| Using another GC | `-e GC_CONTAINER_OPTIONS="-XX:+UseShenandoahGC"` lets Polaris use Shenandoah GC instead of the default parallel GC. |
-| Set the Java heap size to a _fixed_ amount | `-e JAVA_OPTS_APPEND="-Xms8g -Xmx8g"` lets Polaris use a Java heap of 8g. |
-| Set the maximum heap percentage | `-e JAVA_MAX_MEM_RATIO="70"` lets Polaris use 70% of the available memory. |
-
-
-## Troubleshooting Configuration Issues
-
-If you encounter issues with the configuration, you can ask Polaris to print out the configuration it
-is using. To do this, set the log level for the `io.smallrye.config` category to `DEBUG`, and also
-set the console appender level to `DEBUG`:
-
-```properties
-quarkus.log.console.level=DEBUG
-quarkus.log.category."io.smallrye.config".level=DEBUG
-```
-
-> [!IMPORTANT]
-> This will print out all configuration values, including sensitive ones like
-> passwords. Don't do this in production, and don't share this output with anyone you don't trust!
diff --git a/1.1.0/configuring-polaris-for-production.md b/1.1.0/configuring-polaris-for-production.md
deleted file mode 100644
index befe4bba81..0000000000
--- a/1.1.0/configuring-polaris-for-production.md
+++ /dev/null
@@ -1,220 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-# -title: Configuring Polaris for Production -linkTitle: Production Configuration -type: docs -weight: 600 ---- - -The default server configuration is intended for development and testing. When you deploy Polaris in production, -review and apply the following checklist: -- [ ] Configure OAuth2 keys -- [ ] Enforce realm header validation (`require-header=true`) -- [ ] Use a durable metastore (JDBC + PostgreSQL) -- [ ] Bootstrap valid realms in the metastore -- [ ] Disable local FILE storage - -### Configure OAuth2 - -Polaris authentication requires specifying a token broker factory type. Two implementations are -supported out of the box: - -- [rsa-key-pair] uses a pair of public and private keys; -- [symmetric-key] uses a shared secret. - -[rsa-key-pair]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java -[symmetric-key]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTSymmetricKeyFactory.java - -By default, Polaris uses `rsa-key-pair`, with randomly generated keys. - -> [!IMPORTANT] -> The default `rsa-key-pair` configuration is not suitable when deploying many replicas of Polaris, -> as each replica will have its own set of keys. This will cause token validation to fail when a -> request is routed to a different replica than the one that issued the token. - -It is highly recommended to configure Polaris with previously-generated RSA keys. This can be done -by setting the following properties: - -```properties -polaris.authentication.token-broker.type=rsa-key-pair -polaris.authentication.token-broker.rsa-key-pair.public-key-file=/tmp/public.key -polaris.authentication.token-broker.rsa-key-pair.private-key-file=/tmp/private.key -``` - -To generate an RSA key pair in PKCS#8 format, you can use the following commands: - -```shell -openssl genpkey -algorithm RSA -out private.key -pkeyopt rsa_keygen_bits:2048 -openssl rsa -in private.key -pubout -out public.key -``` - -Alternatively, you can use a symmetric key by setting the following properties: - -```properties -polaris.authentication.token-broker.type=symmetric-key -polaris.authentication.token-broker.symmetric-key.file=/tmp/symmetric.key -``` - -Note: it is also possible to set the symmetric key secret directly in the configuration file. If -possible, pass the secret as an environment variable to avoid storing sensitive information in the -configuration file: - -```properties -polaris.authentication.token-broker.symmetric-key.secret=${POLARIS_SYMMETRIC_KEY_SECRET} -``` - -Finally, you can also configure the token broker to use a maximum lifespan by setting the following -property: - -```properties -polaris.authentication.token-broker.max-token-generation=PT1H -``` - -Typically, in Kubernetes, you would define the keys as a `Secret` and mount them as files in the -container. - -### Realm Context Resolver - -By default, Polaris resolves realms based on incoming request headers. You can configure the realm -context resolver by setting the following properties in `application.properties`: - -```properties -polaris.realm-context.realms=POLARIS,MY-REALM -polaris.realm-context.header-name=Polaris-Realm -``` - -Where: - -- `realms` is a comma-separated list of allowed realms. This setting _must_ be correctly configured. - At least one realm must be specified. 
-- `header-name` is the name of the header used to resolve the realm; by default, it is
-  `Polaris-Realm`.
-
-If a request contains the specified header, Polaris will use the realm specified in the header. If
-the realm is not in the list of allowed realms, Polaris will return a `404 Not Found` response.
-
-If a request _does not_ contain the specified header, however, by default Polaris will use the first
-realm in the list as the default realm. In the above example, `POLARIS` is the default realm and
-would be used if the `Polaris-Realm` header is not present in the request.
-
-This fallback is not recommended for production use, as it may lead to security vulnerabilities. To
-avoid it, set the following property to `true`:
-
-```properties
-polaris.realm-context.require-header=true
-```
-
-This will cause Polaris to also return a `404 Not Found` response if the realm header is not present
-in the request.
-
-### Metastore Configuration
-
-A metastore should be configured with an implementation that durably persists Polaris entities. By
-default, Polaris uses an in-memory metastore.
-
-> [!IMPORTANT]
-> The default in-memory metastore is not suitable for production use, as it will lose all data
-> when the server is restarted; it is also unusable when multiple Polaris replicas are used.
-
-To enable a durable metastore, configure your system to use the Relational JDBC-backed metastore.
-This implementation leverages Quarkus for datasource management and supports configuration through
-environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file).
-
-Configure the metastore by setting the following environment variables:
-
-```
-POLARIS_PERSISTENCE_TYPE=relational-jdbc
-
-QUARKUS_DATASOURCE_USERNAME=
-QUARKUS_DATASOURCE_PASSWORD=
-QUARKUS_DATASOURCE_JDBC_URL=
-```
-
-The relational JDBC metastore is a Quarkus-managed datasource and currently supports only PostgreSQL
-and H2. For details, refer to
-[Configure data sources in Quarkus](https://quarkus.io/guides/datasource).
-
-> [!IMPORTANT]
-> Be sure to secure your metastore backend, since it will be storing sensitive data and catalog
-> metadata.
-
-Note: during bootstrap, Polaris always creates a schema named `polaris_schema` in the configured database.
-
-### Bootstrapping
-
-Before using Polaris, you must **bootstrap** the metastore. This is a manual operation that must be
-performed **only once** for each realm in order to prepare the metastore to integrate with Polaris.
-
-By default, when bootstrapping a new realm, Polaris will create randomized `CLIENT_ID` and
-`CLIENT_SECRET` values for the `root` principal and store their hashes in the metastore backend.
-
-Depending on your database, this may be inconvenient, as the generated credentials are not stored
-in clear text in the database.
-
-In order to provide your own credentials for the `root` principal (so you can request tokens via
-`api/catalog/v1/oauth/tokens`), use the [Polaris Admin Tool]({{% ref "admin-tool" %}}).
-
-You can verify the setup by requesting a token for the `root` principal:
-
-```bash
-curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
-  -d "grant_type=client_credentials" \
-  -d "client_id=my-client-id" \
-  -d "client_secret=my-client-secret" \
-  -d "scope=PRINCIPAL_ROLE:ALL"
-```
-
-This should return an access token:
-
-```json
-{
-  "access_token": "...",
-  "token_type": "bearer",
-  "issued_token_type": "urn:ietf:params:oauth:token-type:access_token",
-  "expires_in": 3600
-}
-```
-
-If you used a non-default realm name, add the appropriate request header to the `curl` command;
-otherwise, Polaris will resolve the realm to the first one listed in the configuration
-`polaris.realm-context.realms`. Here is an example that sets the realm header:
-
-```bash
-curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
-  -H "Polaris-Realm: my-realm" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=my-client-id" \
-  -d "client_secret=my-client-secret" \
-  -d "scope=PRINCIPAL_ROLE:ALL"
-```
-
-### Disable FILE Storage Type
-
-By default, Polaris allows using the local file system (`FILE`) for catalog storage. This is fine for testing,
-but **not recommended for production**. To disable it, set the supported storage types like this:
-
-```hocon
-polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES" = [ "S3", "Azure" ]
-```
-
-Leave out `FILE` to prevent its use. Only include the storage types your setup needs.
-
-### Upgrade Considerations
-
-The [Polaris Evolution](../evolution) page discusses backward compatibility and
-upgrade concerns.
diff --git a/1.1.0/entities.md b/1.1.0/entities.md
deleted file mode 100644
index df53a0787f..0000000000
--- a/1.1.0/entities.md
+++ /dev/null
@@ -1,91 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
Each of these types relates to a different storage provider where data within the catalog may reside. Depending on the storage type, various other configurations may be set for a catalog including credentials to be used when accessing data inside the catalog. - -For details on how to use Storage Types in the REST API, see [the StorageConfigInfo OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml). - -For usage examples of storage types, see [docs]({{% ref "command-line-interface" %}}). - -## Namespace - -A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_. - -In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on. - -For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the CreateNamespaceRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml). - -## Table - -Polaris tables are entities that map to [Apache Iceberg tables](https://iceberg.apache.org/docs/nightly/configuration/), [Delta tables](https://docs.databricks.com/aws/en/delta/table-properties), or [Hudi tables](https://hudi.apache.org/docs/next/configurations#TABLE_CONFIG). - -For information on managing tables with the REST API or for more information on what data can be associated with a table, see [the CreateTableRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml). - -## View - -Polaris views are entities that map to [Apache Iceberg views](https://iceberg.apache.org/view-spec/). - -For information on managing views with the REST API or for more information on what data can be associated with a view, see [the CreateViewRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml). - -## Principal - -Polaris principals are unique identities that can be used to represent users or services. Each principal may have one or more [principal roles](#principal-role) assigned to it for the purpose of accessing catalogs and the entities within them. - -For information on managing principals with the REST API or for more information on what data can be associated with a principal, see [the CreatePrincipalRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml). - -## Principal Role - -Polaris principal roles are labels that may be granted to [principals](#principal). Each principal may have one or more principal roles, and the same principal role may be granted to multiple principals. Principal roles may be assigned based on the persona or responsibilities of a given principal, or on how that principal will need to access different entities within Polaris. - -For information on managing principal roles with the REST API or for more information on what data can be associated with a principal role, see [the CreatePrincipalRoleRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml) - -## Catalog Role - -Polaris catalog roles are labels that may be granted to [catalogs](#catalog). Each catalog may have one or more catalog roles, and the same catalog role may be granted to multiple catalogs. 
Catalog roles may be assigned based on the nature of data that will reside in a catalog, or by the groups of users and services that might need to access that data.
-
-Each catalog role may have multiple [privileges](#privilege) granted to it, and each catalog role can be granted to one or more [principal roles](#principal-role). This is the mechanism by which principals are granted access to entities inside a catalog such as namespaces and tables.
-
-## Policy
-
-A Polaris policy is a set of rules governing actions on specified resources under predefined conditions. Polaris supports policies for Iceberg table compaction, snapshot expiry, row-level access control, and custom policy definitions.
-
-A policy can be applied at the catalog, namespace, or table level. Policy inheritance can be achieved by attaching a policy to a higher-level scope, such as a namespace or catalog. As a result, tables registered under those entities do not need to declare the same policy individually. If a table or namespace requires a different policy, a user can assign one directly, overriding the policy of the same type declared at a higher level.
-
-## Privilege
-
-Polaris privileges are granted to [catalog roles](#catalog-role) in order to grant principals with a given principal role some degree of access to catalogs with a given catalog role. When a privilege is granted to a catalog role, any principal roles granted that catalog role receive the privilege. In turn, any principals who are granted that principal role receive it.
-
-A privilege can be scoped to any entity inside a catalog, including the catalog itself.
-
-For a list of supported privileges for each privilege class, see [the OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml) (TablePrivilege, ViewPrivilege, NamespacePrivilege, CatalogPrivilege).
diff --git a/1.1.0/evolution.md b/1.1.0/evolution.md
deleted file mode 100644
index b3a57c7525..0000000000
--- a/1.1.0/evolution.md
+++ /dev/null
@@ -1,115 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Polaris Evolution
-type: docs
-weight: 1000
----
-
-This page discusses what can be expected from Apache Polaris as the project evolves.
-
-## Using Polaris as a Catalog
-
-Polaris is primarily intended to be used as a Catalog of Tables and Views. As such,
-it implements the Iceberg REST Catalog API and its own REST APIs.
-
-Revisions of the Iceberg REST Catalog API are controlled by the [Apache Iceberg](https://iceberg.apache.org/)
-community. Polaris attempts to accurately implement this specification. Nonetheless,
-optional REST Catalog features may or may not be supported immediately.
In general,
-there is no guarantee that Polaris releases always implement the latest version of
-the Iceberg REST Catalog API.
-
-Any API under Polaris control that is not in an "experimental" or "beta" state
-(e.g. the Management API) is maintained as a versioned REST API. New releases of Polaris
-may include changes to the current version of the API. When that happens, those changes
-are intended to be compatible with prior versions of Polaris clients. Certain endpoints
-and parameters may be deprecated.
-
-In case a major change is required to an API that cannot be implemented in a
-backward-compatible way, new endpoints (URI paths) may be introduced. New URI "roots" may
-be introduced too (e.g. `api/catalog/v2`).
-
-Note that those "v1", "v2", etc. URI path segments are not meant to be 1:1 with Polaris
-releases or Polaris project version numbers (e.g. a "v2" path segment does not mean that
-it is added in Polaris 2.0).
-
-Polaris servers will support deprecated API endpoints / parameters / versions / etc.
-for some transition period to allow clients to migrate.
-
-### Managing the Polaris Database
-
-Polaris stores its data in a database, which is sometimes referred to as "Metastore" or
-"Persistence" in other docs.
-
-Each Polaris release may support multiple Persistence [implementations](../metastores),
-for example, "EclipseLink" (deprecated) and "JDBC" (current).
-
-Each type of Persistence evolves individually. Within each Persistence type, Polaris
-attempts to support rolling upgrades (both version X and X + 1 servers running at the
-same time).
-
-However, migrating between different Persistence types is not supported in a rolling
-upgrade manner (for example, migrating from "EclipseLink" to "JDBC"). Polaris provides
-[tools](https://github.com/apache/polaris-tools/) for migrating between different
-catalogs, and those tools may be used to migrate between different Persistence types
-as well. Service interruption (downtime) should be expected in those cases.
-
-## Using Polaris as a Build-Time Dependency
-
-Polaris produces several jars. These jars or custom builds of Polaris code may be used in
-downstream projects according to the terms of the license included in Polaris distributions.
-
-The minimal version of the JRE required by Polaris code (compilation target) may be updated in
-any release. Different Polaris jars may have different minimal JRE version requirements.
-
-Changes in Java classes should be expected at any time, regardless of the module name or
-whether the class / method is `public` or not.
-
-This approach is not meant to discourage the use of Polaris code in downstream projects, but
-to allow more flexibility in evolving the codebase to support new catalog-level features
-and improve code efficiency. Maintainers of downstream projects are encouraged to join Polaris
-mailing lists to monitor project changes, suggest improvements, and engage with the Polaris
-community in case of specific compatibility concerns.
-
-## Semantic Versioning
-
-Polaris strives to follow [Semantic Versioning](https://semver.org/) conventions with
-respect to REST APIs (beta and experimental APIs excepted), [Polaris Policies](../policy/),
-and user-facing [configuration](../configuration/).
-
-The following are some examples of the Polaris approach to SemVer in REST APIs and configuration.
-These examples are for illustration purposes and should not be considered exhaustive.
- -* Polaris implementing an optional Iceberg REST Catalog feature that was unimplemented -in the previous release is not considered a major change. - -* Supporting a new revision of the Iceberg REST Catalog spec in a backward-compatible way -is not considered a major change. Specifically, supporting new REST API prefixes (e.g. `v2`) -is not a major change because it does not affect older clients. - -* Changing the implementation of an Iceberg REST Catalog feature / endpoint in a non-backward -compatible way (e.g. removing or renaming a request parameter) is a major change. - -* Dropping support for a configuration property with the `polaris.` name prefix is a major change. - -* Dropping support for any previously defined [Policy](../policy/) type or property is a major change. - -* Upgrading Quarkus Runtime to its next major version is a major change (because -Quarkus-managed configuration may change). diff --git a/1.1.0/external-idp.md b/1.1.0/external-idp.md deleted file mode 100644 index 4e79a71995..0000000000 --- a/1.1.0/external-idp.md +++ /dev/null @@ -1,360 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: External Identity Providers -type: docs -weight: 550 ---- - -Apache Polaris supports authentication via external identity providers (IdPs) using OpenID Connect (OIDC) in addition to the internal authentication system. This feature enables flexible identity federation with enterprise IdPs and allows gradual migration or hybrid authentication strategies across realms in Polaris. - -## Authentication Types - -Polaris supports three authentication modes: - -1. `internal` (Default) - - Only Polaris internal authentication is used. -2. `external` - - Authenticates using external OIDC providers (via Quarkus OIDC). - - Disables the internal token endpoint (returns HTTP 501). -3. `mixed` - - Tries internal authentication first; if this fails, it falls back to OIDC. - -Authentication can be configured globally or per realm by setting the following properties: - -```properties -# Global default -polaris.authentication.type=internal -# Per-realm override -polaris.authentication.realm1.type=external -polaris.authentication.realm2.type=mixed -``` - -## Key Components - -### Authenticator - -The [`Authenticator`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/Authenticator.java) is a component responsible for resolving the principal and the principal roles, and for creating a `PolarisPrincipal` from the credentials provided by the authentication process. It is a central component and is invoked for all types of authentication. - -The `type` property is used to define the `Authenticator` implementation. 
It is overridable per realm:

```properties
polaris.authentication.authenticator.type=default
polaris.authentication.realm1.authenticator.type=custom
```

### Active Roles Provider

The [`ActiveRolesProvider`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/ActiveRolesProvider.java) is a component responsible for determining which of the roles requested by the principal should be activated. It is common to all authentication types.

Only the `type` property is defined; it is used to define the provider implementation. It is overridable per realm:

```properties
polaris.authentication.active-roles-provider.type=default
```

## Internal Authentication Configuration

### Token Broker

The [`TokenBroker`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/TokenBroker.java) signs and verifies tokens to ensure that they can be validated and remain unaltered.

```properties
polaris.authentication.token-broker.type=rsa-key-pair
polaris.authentication.token-broker.max-token-generation=PT1H
```

Two types are available:

- `rsa-key-pair` (recommended for production): Uses an RSA key pair for token signing and validation.
- `symmetric-key`: Uses a shared secret for both operations; suitable for single-node deployments or testing.

The property `polaris.authentication.token-broker.max-token-generation` specifies the maximum validity duration of tokens issued by the internal `TokenBroker`.

- Format: ISO-8601 duration (e.g., `PT1H` for 1 hour, `PT30M` for 30 minutes).
- Default: `PT1H`.

### Token Service

The Token Service, configured via `TokenServiceConfiguration` (Quarkus), is responsible for issuing and validating tokens (e.g., bearer tokens) for authenticated principals when internal authentication is used. It works in coordination with the `Authenticator` and `TokenBroker`. The default implementation is `default`, and this must be configured when using internal authentication.

```properties
polaris.authentication.token-service.type=default
```

### Role Mapping

When using internal authentication, token requests should include a `scope` parameter that specifies the roles to be activated for the principal. The `scope` parameter is a space-separated list of role names.

The default `ActiveRolesProvider` expects role names to be in the following format: `PRINCIPAL_ROLE:<role name>`.

For example, if the principal has the roles `service_admin` and `catalog_admin` and wants both activated, the `scope` parameter should look like this:

```properties
scope=PRINCIPAL_ROLE:service_admin PRINCIPAL_ROLE:catalog_admin
```

Here is an example of a full request to the Polaris token endpoint using internal authentication:

```http request
POST /api/catalog/v1/oauth/tokens HTTP/1.1
Host: polaris.example.com:8181
Content-Type: application/x-www-form-urlencoded

grant_type=client_credentials&client_id=root&client_secret=s3cr3t&scope=PRINCIPAL_ROLE%3Aservice_admin%20PRINCIPAL_ROLE%3Acatalog_admin
```

## External Authentication Configuration

External authentication is configured via Quarkus OIDC and Polaris-specific OIDC extensions. The following settings are used to integrate with an identity provider and extract identity and role information from tokens.
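With external authentication enabled, clients no longer call the internal token endpoint; instead, they obtain a token from the identity provider and present it to Polaris as a regular bearer token. The sketch below illustrates that flow with curl; the Keycloak-style token URL, client ID, and client secret are placeholder assumptions to adapt to your provider:

```shell
# Sketch: obtain an access token from the external IdP
# (placeholder URL and credentials, not Polaris defaults).
TOKEN=$(curl -s https://auth.example.com/realms/polaris/protocol/openid-connect/token \
  -d "grant_type=client_credentials" \
  -d "client_id=polaris" \
  -d "client_secret=<idp-client-secret>" | jq -r .access_token)

# Present the IdP-issued token to Polaris as a standard Bearer token,
# for example when listing principal roles via the Management API.
curl -H "Authorization: Bearer ${TOKEN}" \
  "http://localhost:8181/api/management/v1/principal-roles"
```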
### OIDC Tenant Configuration

At least one OIDC tenant must be explicitly enabled. In Polaris, realms and OIDC tenants are distinct concepts. An OIDC tenant represents a specific identity provider configuration (e.g., `quarkus.oidc.idp1`). A [realm]({{% ref "realm" %}}) is a logical partition within Polaris.

- Multiple realms can share a single OIDC tenant.
- Each realm can be associated with only one OIDC tenant.

Therefore, multi-realm deployments can share a common identity provider while still enforcing realm-level scoping. To configure the default tenant:

```properties
quarkus.oidc.tenant-enabled=true
quarkus.oidc.auth-server-url=https://auth.example.com/realms/polaris
quarkus.oidc.client-id=polaris
```

Alternatively, it is possible to use multiple named tenants. Each named OIDC tenant is then configured with standard Quarkus settings:

```properties
quarkus.oidc.oidc-tenant1.auth-server-url=http://localhost:8080/realms/polaris
quarkus.oidc.oidc-tenant1.client-id=client1
quarkus.oidc.oidc-tenant1.application-type=service
```

When using multiple OIDC tenants, it's your responsibility to configure tenant resolution appropriately. See the [Quarkus OpenID Connect Multitenancy Guide](https://quarkus.io/guides/security-openid-connect-multitenancy#tenant-resolution).

### Principal Mapping

While OIDC tenant resolution is entirely delegated to Quarkus, Polaris requires additional configuration to extract the Polaris principal and its roles from the credentials generated and validated by Quarkus. This part of the authentication process is configured with Polaris-specific properties that map JWT claims to Polaris principal fields:

```properties
polaris.oidc.principal-mapper.type=default
polaris.oidc.principal-mapper.id-claim-path=polaris/principal_id
polaris.oidc.principal-mapper.name-claim-path=polaris/principal_name
```

These properties are overridable per OIDC tenant:

```properties
polaris.oidc.oidc-tenant1.principal-mapper.id-claim-path=polaris/principal_id
polaris.oidc.oidc-tenant1.principal-mapper.name-claim-path=polaris/principal_name
```

> [!IMPORTANT]
> The default implementation of `PrincipalMapper` can only work with JWT tokens. If your IdP issues opaque tokens instead, you will need to provide a custom implementation.
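To find the right claim paths for your IdP, it can help to decode a sample access token and inspect its payload. A quick sketch using standard shell tools (the token value is a placeholder; depending on the payload length, `base64` may complain about missing padding, which can usually be ignored here):

```shell
# Decode the payload (the second dot-separated segment) of a sample JWT
# and pretty-print the claims to identify the id/name/roles claim paths.
TOKEN="<paste a JWT issued by your IdP>"
echo "${TOKEN}" | cut -d. -f2 | base64 -d 2>/dev/null | jq .
```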
### Role Mapping

Similarly, Polaris requires additional configuration to map roles provided by Quarkus to roles defined in Polaris. The process happens in two phases: first, Quarkus maps the JWT claims to security roles, using the `quarkus.oidc.roles.*` properties; then, Polaris-specific properties are used to map the Quarkus-provided security roles to Polaris roles:

```properties
quarkus.oidc.roles.role-claim-path=polaris/roles
polaris.oidc.principal-roles-mapper.type=default
polaris.oidc.principal-roles-mapper.filter=^(?!profile$|email$).*
polaris.oidc.principal-roles-mapper.mappings[0].regex=^.*$
polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$0
```

These mappings can be overridden per OIDC tenant and used across different realms that rely on external identity providers. For example:

```properties
polaris.oidc.oidc-tenant1.principal-roles-mapper.type=custom
polaris.oidc.oidc-tenant1.principal-roles-mapper.filter=PRINCIPAL_ROLE:.*
polaris.oidc.oidc-tenant1.principal-roles-mapper.mappings[0].regex=PRINCIPAL_ROLE:(.*)
polaris.oidc.oidc-tenant1.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$1
```

The default `ActiveRolesProvider` expects the security identity to expose role names in the following format: `PRINCIPAL_ROLE:<role name>`. You can use the `filter` and `mappings` properties to adjust the role names as they appear in the JWT claims.

For example, assume that the security identity produced by Quarkus exposes the following roles: `role_service_admin` and `role_catalog_admin`. Polaris expects `PRINCIPAL_ROLE:service_admin` and `PRINCIPAL_ROLE:catalog_admin` respectively. The following configuration can be used to achieve the desired mapping:

```properties
# Keep only role names that start with "role_"
polaris.oidc.principal-roles-mapper.filter=role_.*
# Capture the text after "role_"
polaris.oidc.principal-roles-mapper.mappings[0].regex=role_(.*)
# Prefix the captured text with "PRINCIPAL_ROLE:"
polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$1
```

See more examples below.

## Developer Architecture Notes

The following sections describe internal implementation details for developers who want to understand or extend Polaris authentication.

### Authentication Architecture

Polaris separates authentication into two logical phases using [Quarkus Security](https://quarkus.io/guides/security-overview):

1. Credential extraction – parsing headers and tokens
2. Credential authentication – validating identity and assigning roles

### Key Interfaces

- [`Authenticator`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/Authenticator.java): A core interface used to authenticate credentials.
- [`DecodedToken`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/DecodedToken.java): Used in internal auth and inherits from `PrincipalCredential`.
- [`ActiveRolesProvider`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/ActiveRolesProvider.java): Resolves the set of roles associated with the authenticated user for the current request. Roles may be derived from OIDC claims or internal mappings.

The [`DefaultAuthenticator`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/DefaultAuthenticator.java) is used to implement realm-specific logic based on these abstractions.

### Token Broker Configuration

When internal authentication is enabled, Polaris uses token brokers to handle the decoding and validation of authentication tokens. These brokers are request-scoped and can be configured per realm. Each realm may use its own strategy, such as RSA key pairs or shared secrets, depending on security requirements.

## Developer Authentication Workflows

### Internal Authentication

1. [`InternalAuthenticationMechanism`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/quarkus/auth/internal/InternalAuthenticationMechanism.java) parses the auth header.
2. Uses [`TokenBroker`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/TokenBroker.java) to decode the token.
3. Builds [`PrincipalAuthInfo`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/PrincipalAuthInfo.java) and generates `SecurityIdentity` (Quarkus).
4. `Authenticator.authenticate()` validates the credential.
5. [`ActiveRolesProvider`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/ActiveRolesProvider.java) assigns roles.

### External Authentication

1. 
`OidcAuthenticationMechanism` (Quarkus) processes the auth header. -2. [`OidcTenantResolvingAugmentor`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/quarkus/auth/external/OidcTenantResolvingAugmentor.java) selects the OIDC tenant. -3. [`PrincipalAuthInfoAugmentor`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/quarkus/auth/external/PrincipalAuthInfoAugmentor.java) extracts JWT claims. -4. `Authenticator.authenticate()` validates the claims. -5. [`ActiveRolesProvider`](https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/ActiveRolesProvider.java) assigns roles. - -### Mixed Authentication - -1. [`InternalAuthenticationMechanism`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/quarkus/auth/internal/InternalAuthenticationMechanism.java) tries decoding. -2. If successful, proceed with internal authentication. -3. Otherwise, fall back to external (OIDC) authentication. - -## OIDC Configuration Reference - -### Principal Mapping - -- Interface: [`PrincipalMapper`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/quarkus/auth/external/mapping/PrincipalMapper.java) - - The `PrincipalMapper` is responsible for extracting the Polaris principal ID and display name from OIDC tokens. - -- Implementation selector: - - This property selects the implementation of the `PrincipalMapper` interface. The default implementation extracts fields from specific claim paths. - - ```properties - polaris.oidc.principal-mapper.type=default - ``` - -- Configuration properties for the default implementation: - - ```properties - polaris.oidc.principal-mapper.id-claim-path=polaris/principal_id - polaris.oidc.principal-mapper.name-claim-path=polaris/principal_name - ``` - -- It can be overridden per OIDC tenant. - -### Roles Mapping - -- Interface: [`PrincipalRolesMapper`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/quarkus/auth/external/mapping/PrincipalRolesMapper.java) - - Polaris uses this component to transform role claims from OIDC tokens into Polaris roles. - -- Quarkus OIDC configuration: - - This setting instructs Quarkus on where to locate roles within the OIDC token. - - ```properties - quarkus.oidc.roles.role-claim-path=polaris/roles - ``` - -- Implementation selector: - - This property selects the implementation of `PrincipalRolesMapper`. The `default` implementation applies regular expression (regex) transformations to OIDC roles. 
  ```properties
  polaris.oidc.principal-roles-mapper.type=default
  ```

- Configuration properties for the default implementation:

  ```properties
  polaris.oidc.principal-roles-mapper.filter=^(?!profile$|email$).*
  polaris.oidc.principal-roles-mapper.mappings[0].regex=^.*$
  polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$0
  ```

### Example JWT Mappings

#### Example 1: Custom Claim Paths

- JWT

  ```json
  {
    "polaris":
    {
      "roles": ["PRINCIPAL_ROLE:ALL"],
      "principal_name": "root",
      "principal_id": 1
    }
  }
  ```

- Configuration

  ```properties
  quarkus.oidc.roles.role-claim-path=polaris/roles
  polaris.oidc.principal-mapper.id-claim-path=polaris/principal_id
  polaris.oidc.principal-mapper.name-claim-path=polaris/principal_name
  ```

#### Example 2: Generic OIDC Claims

- JWT

  ```json
  {
    "sub": "1",
    "scope": "service_admin catalog_admin profile email",
    "preferred_username": "root"
  }
  ```

- Configuration

  ```properties
  quarkus.oidc.roles.role-claim-path=scope
  polaris.oidc.principal-mapper.id-claim-path=sub
  polaris.oidc.principal-mapper.name-claim-path=preferred_username
  polaris.oidc.principal-roles-mapper.filter=^(?!profile$|email$).*
  polaris.oidc.principal-roles-mapper.mappings[0].regex=^.*$
  polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$0
  ```

- Result

  Polaris roles: `PRINCIPAL_ROLE:service_admin` and `PRINCIPAL_ROLE:catalog_admin`

diff --git a/1.1.0/generic-table.md b/1.1.0/generic-table.md
deleted file mode 100644
index 63ef38a1da..0000000000
--- a/1.1.0/generic-table.md
+++ /dev/null
@@ -1,169 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Generic Table (Beta)
type: docs
weight: 435
---

The Generic Table in Apache Polaris is designed to provide support for non-Iceberg tables across different table formats, such as "delta" and "csv". It currently provides the following capabilities:
- Create a generic table under a namespace
- Load a generic table
- Drop a generic table
- List all generic tables under a namespace

**NOTE** Generic table support is currently in beta. Please use it with caution and report any issues you encounter.

## What is a Generic Table?

A generic table in Polaris is an entity that defines the following fields:

- **name** (required): A unique identifier for the table within a namespace
- **format** (required): The format for the generic table, e.g. "delta", "csv"
- **base-location** (optional): Table base location in URI format. For example: `s3://<bucket>/path/to/table`
  - The table base location is a location that includes all files for the table
  - A table with multiple disjoint locations (i.e. 
containing files that are outside the configured base location) is not compliant with the current generic table support in Polaris.
  - If no location is provided, clients or users are responsible for managing the location.
- **properties** (optional): Properties for the generic table passed on creation.
  - Currently, there is no reserved property key defined.
  - The property definition and interpretation is delegated to client or engine implementations.
- **doc** (optional): Comment or description for the table

## Generic Table API Vs. Iceberg Table API

The Generic Table API provides a different set of endpoints that operate on generic table entities, while the Iceberg APIs operate on Iceberg table entities.

| Operations   | **Iceberg Table API** | **Generic Table API** |
|--------------|-----------------------|-----------------------|
| Create Table | Create an Iceberg table | Create a generic table |
| Load Table   | Load an Iceberg table. If the table to load is a generic table, you need to call the Generic Table loadTable API; otherwise a TableNotFoundException will be thrown | Load a generic table. Similarly, trying to load an Iceberg table through the Generic Table API will throw a TableNotFoundException |
| Drop Table   | Drop an Iceberg table. As with load, if the table to drop is a generic table, a TableNotFoundException will be thrown | Drop a generic table. Dropping an Iceberg table through the Generic Table endpoint will throw a TableNotFoundException |
| List Table   | List all Iceberg tables | List all generic tables |

Note that generic tables share the same namespaces as Iceberg tables, so a table name has to be unique within a namespace. Furthermore, since there is currently no support for updating a generic table, any update to an existing table requires a drop and re-create.

## Working with Generic Table

There are two ways to work with Polaris Generic Tables today:
1) Directly communicate with Polaris through REST API calls using tools such as `curl`. Details are described in the sections below.
2) Use the Spark client provided if you are working with Spark. Please refer to [Polaris Spark Client]({{% ref "polaris-spark-client" %}}) for detailed instructions.

### Create a Generic Table

To create a generic table, you need to provide the corresponding fields as described in [What is a Generic Table](#what-is-a-generic-table). 
The REST API for creating a generic table is `POST /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables`, and the request body looks like the following (angle-bracket values are placeholders):

```json
{
  "name": "<table name>",
  "format": "<format>",
  "base-location": "<table base location>",
  "doc": "<comment>",
  "properties": {
    "<key>": "<value>"
  }
}
```

Here is an example that creates a generic table with the name `delta_table` and the format `delta` under the namespace `delta_ns` for the catalog `delta_catalog` using curl:

```shell
curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \
  -H "Content-Type: application/json" \
  -d '{
    "name": "delta_table",
    "format": "delta",
    "base-location": "s3://<bucket>/path/to/table",
    "doc": "delta table example",
    "properties": {
      "key1": "value1"
    }
  }'
```

### Load a Generic Table
The REST endpoint for loading a generic table is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.

Here is an example that loads the table `delta_table` using curl:
```shell
curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
```
And the response looks like the following:
```json
{
  "table": {
    "name": "delta_table",
    "format": "delta",
    "base-location": "s3://<bucket>/path/to/table",
    "doc": "delta table example",
    "properties": {
      "key1": "value1"
    }
  }
}
```

### List Generic Tables
The REST endpoint for listing the generic tables under a given namespace is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/`.

The following curl command lists all tables under the namespace `delta_ns`:
```shell
curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/
```
Example Response:
```json
{
  "identifiers": [
    {
      "namespace": ["delta_ns"],
      "name": "delta_table"
    }
  ],
  "next-page-token": null
}
```

### Drop a Generic Table
The drop generic table REST endpoint is `DELETE /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.

The following curl call drops the table `delta_table`:
```shell
curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
```
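Since there is no update endpoint, changing an existing generic table amounts to dropping it and re-creating it with the new metadata. A sketch combining the endpoints above (the changed property value is illustrative):

```shell
# Drop the existing table...
curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table

# ...then re-create it with the updated metadata.
curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \
  -H "Content-Type: application/json" \
  -d '{
    "name": "delta_table",
    "format": "delta",
    "base-location": "s3://<bucket>/path/to/table",
    "doc": "delta table example",
    "properties": {
      "key1": "value2"
    }
  }'
```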
### API Reference

For the complete and up-to-date API specification, see the [Catalog API Spec](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/polaris/refs/heads/main/spec/generated/bundled-polaris-catalog-service.yaml).

## Limitations

Current limitations of Generic Table support:
1) Limited spec information. Currently, there is no spec for information such as schema or partitioning.
2) No commit coordination or update capability provided at the catalog service level.

Therefore, the catalog itself is unaware of anything about the underlying table except some of the loosely defined metadata. It is the responsibility of the engine (and plugins used by the engine) to determine exactly what loading or committing data should look like based on that metadata. For example, with Delta support, the delta log serialization, deserialization, and updates all happen on the client side.
diff --git a/1.1.0/getting-started/_index.md b/1.1.0/getting-started/_index.md
deleted file mode 100644
index b7e926da9e..0000000000
--- a/1.1.0/getting-started/_index.md
+++ /dev/null
@@ -1,23 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: 'Getting Started'
type: docs
weight: 101
---
diff --git a/1.1.0/getting-started/deploying-polaris/_index.md b/1.1.0/getting-started/deploying-polaris/_index.md
deleted file mode 100644
index c6b293d29f..0000000000
--- a/1.1.0/getting-started/deploying-polaris/_index.md
+++ /dev/null
@@ -1,27 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Deploying Polaris on Cloud Providers
type: docs
weight: 300
---

We will now demonstrate how to deploy Polaris locally, as well as with all supported Cloud Providers: Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).

Locally, Polaris can be deployed using either Docker or a local build. On the cloud, this tutorial will deploy Polaris using Docker only - but local builds can also be executed.
diff --git a/1.1.0/getting-started/deploying-polaris/quickstart-deploy-aws.md b/1.1.0/getting-started/deploying-polaris/quickstart-deploy-aws.md
deleted file mode 100644
index 832cc67bcf..0000000000
--- a/1.1.0/getting-started/deploying-polaris/quickstart-deploy-aws.md
+++ /dev/null
@@ -1,57 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Deploying Polaris on Amazon Web Services (AWS)
type: docs
weight: 310
---

Build and launch Polaris using the AWS Startup Script at the location provided in the command below. 
This script will start an [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) instance, which will be used as the backend Postgres instance holding all Polaris data. Additionally, Polaris will be bootstrapped to use this database, and Docker containers will be spun up for Spark SQL and Trino.

The requirements to run the script below are:
* There must be at least two subnets created in the VPC and region in which your EC2 instance resides. The span of subnets MUST include at least 2 availability zones (AZs) within the same region.
* Your EC2 instance must be enabled with [IMDSv1 or IMDSv2 with 2+ hop limit](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-IMDS-new-instances.html#configure-IMDS-new-instances-instance-settings).
* The AWS identity that you will use to run this script must have the following AWS permissions:
  * "ec2:DescribeInstances"
  * "rds:CreateDBInstance"
  * "rds:DescribeDBInstances"
  * "rds:CreateDBSubnetGroup"
  * "sts:AssumeRole" on the same role as the Instance Profile role of the EC2 instance on which you are running this script. Additionally, you should ensure that the Instance Profile contains a trust policy that allows the role to trust itself to be assumed.

```shell
chmod +x getting-started/assets/cloud_providers/deploy-aws.sh
export ASSETS_PATH=$(pwd)/getting-started/assets/
export CLIENT_ID=root
export CLIENT_SECRET=s3cr3t
./getting-started/assets/cloud_providers/deploy-aws.sh
```

## Next Steps
Congrats, you now have a running instance of Polaris! For details on how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page.

## Cleanup Instructions
To shut down the Polaris server, run the following commands:

```shell
export ASSETS_PATH=$(pwd)/getting-started/assets/
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
```

To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.
diff --git a/1.1.0/getting-started/deploying-polaris/quickstart-deploy-azure.md b/1.1.0/getting-started/deploying-polaris/quickstart-deploy-azure.md
deleted file mode 100644
index da60198b0f..0000000000
--- a/1.1.0/getting-started/deploying-polaris/quickstart-deploy-azure.md
+++ /dev/null
@@ -1,52 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Deploying Polaris on Azure
type: docs
weight: 320
---

Build and launch Polaris using the Azure Startup Script at the location provided in the command below. 
This script will start an [Azure Database for PostgreSQL - Flexible Server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview) instance, which will be used as the backend Postgres instance holding all Polaris data. Additionally, Polaris will be bootstrapped to use this database, and Docker containers will be spun up for Spark SQL and Trino.

The requirements to run the script below are:
* Install the AZ CLI, if it is not already installed on the Azure VM. Instructions to download the AZ CLI can be found [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli).
* You must be logged into the AZ CLI. Please run `az account show` to ensure that you are logged in prior to running this script.
* Assign a System-Assigned Managed Identity to the Azure VM.

```shell
chmod +x getting-started/assets/cloud_providers/deploy-azure.sh
export ASSETS_PATH=$(pwd)/getting-started/assets/
export CLIENT_ID=root
export CLIENT_SECRET=s3cr3t
./getting-started/assets/cloud_providers/deploy-azure.sh
```

## Next Steps
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page.

## Cleanup Instructions
To shut down the Polaris server, run the following commands:

```shell
export ASSETS_PATH=$(pwd)/getting-started/assets/
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
```

To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.
diff --git a/1.1.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md b/1.1.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md
deleted file mode 100644
index 30c99c61ba..0000000000
--- a/1.1.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md
+++ /dev/null
@@ -1,52 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Deploying Polaris on Google Cloud Platform (GCP)
type: docs
weight: 330
---

Build and launch Polaris using the GCP Startup Script at the location provided in the command below. This script will start a [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) instance, which will be used as the backend Postgres instance holding all Polaris data. Additionally, Polaris will be bootstrapped to use this database, and Docker containers will be spun up for Spark SQL and Trino.

The requirements to run the script below are:
* Install the `gcloud` CLI, if it is not already installed on the GCP VM. Instructions to download the `gcloud` CLI can be found [here](https://cloud.google.com/sdk/docs/install).
* Ensure the `Cloud SQL Admin API` has been enabled in your project and that your VM's Principal has access to the correct role: `roles/cloudsql.admin`.
* Ensure the VM's Principal has access to at least Read-only scope on Compute Engine: `compute.readonly`.

```shell
chmod +x getting-started/assets/cloud_providers/deploy-gcp.sh
export ASSETS_PATH=$(pwd)/getting-started/assets/
export CLIENT_ID=root
export CLIENT_SECRET=s3cr3t
./getting-started/assets/cloud_providers/deploy-gcp.sh
```

## Next Steps
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page.

## Cleanup Instructions
To shut down the Polaris server, run the following commands:

```shell
export ASSETS_PATH=$(pwd)/getting-started/assets/
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
```

To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.
diff --git a/1.1.0/getting-started/install-dependencies.md b/1.1.0/getting-started/install-dependencies.md
deleted file mode 100644
index 66640104d4..0000000000
--- a/1.1.0/getting-started/install-dependencies.md
+++ /dev/null
@@ -1,120 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Installing Dependencies
type: docs
weight: 100
---

This guide serves as an introduction to several key entities that can be managed with Apache Polaris (Incubating), describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark™.

# Prerequisites

This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here.

## Git

To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/) on macOS:

```shell
brew install git
```

Please refer to the [Git Documentation](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) for instructions on installing Git on other platforms.

Then, use git to clone the Polaris repo:

```shell
git clone https://github.com/apache/polaris.git ~/polaris
```
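If you want to confirm the clone succeeded before moving on, a quick check (any recent commit hash is fine):

```shell
# Print the most recent commit of the freshly cloned repository.
git -C ~/polaris log --oneline -1
```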
## Docker

It is recommended to deploy Polaris inside [Docker](https://www.docker.com/) for the Quickstart workflow. Instructions for deploying the Quickstart workflow on the supported Cloud Providers (AWS, Azure, GCP) will be provided only with Docker. However, the non-Docker instructions for local deployments can also be followed on Cloud Providers.

Instructions to install Docker can be found on the [Docker website](https://docs.docker.com/engine/install/). Ensure that Docker and the Docker Compose plugin are both installed.

### Docker on macOS
Docker can be installed using [homebrew](https://brew.sh/):

```shell
brew install --cask docker
```

There could be [Docker permission issues](https://github.com/apache/polaris/pull/971) related to seccomp configuration. To resolve these issues, set the `seccomp` profile to "unconfined" when running a container. For example:

```shell
docker run --security-opt seccomp=unconfined apache/polaris:latest
```

Note: Setting the seccomp profile to "unconfined" disables the default system call filtering, which may pose security risks. Use this configuration with caution, especially in production environments.

### Docker on Amazon Linux
Docker can be installed using a modification to the CentOS instructions. For example:

```shell
sudo dnf update -y
# Remove old version
sudo dnf remove -y docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-engine
# Install dnf plugin
sudo dnf -y install dnf-plugins-core
# Add CentOS repository
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Adjust release server version in the path as it will not match with Amazon Linux 2023
sudo sed -i 's/$releasever/9/g' /etc/yum.repos.d/docker-ce.repo
# Install as usual
sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

### Confirm Docker Installation

Once installed, confirm that both Docker and the Docker Compose plugin report their versions:

```shell
docker version
docker compose version
```

Also make sure Docker is running and is able to run a sample Docker container:

```shell
docker run hello-world
```

## Java

If you plan to build Polaris from source yourself or using this tutorial's instructions on a Cloud Provider, you will need to satisfy a few prerequisites first.

Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv:

```shell
cd ~/polaris
brew install openjdk@21 jenv
jenv add $(brew --prefix openjdk@21)
jenv local 21
```

Ensure that `java --version` and `javac` both produce valid output.

## jq

Most Polaris Quickstart scripts require [jq](https://jqlang.org/download/). You can install jq using [homebrew](https://brew.sh/):
```shell
brew install jq
```
diff --git a/1.1.0/getting-started/minio.md b/1.1.0/getting-started/minio.md
deleted file mode 100644
index 3eda7db622..0000000000
--- a/1.1.0/getting-started/minio.md
+++ /dev/null
@@ -1,115 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Deploying Polaris on MinIO
type: docs
weight: 350
---
In this guide, we walk through setting up a simple Polaris Server with local [MinIO](https://www.min.io/) storage.

Similar configurations are expected to work with other S3-compatible systems that also have the [STS](https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html) API.

# Setup

Clone the Polaris source repository, then build a Docker image for Polaris.

```shell
./gradlew :polaris-server:assemble -Dquarkus.container-image.build=true
```

Start MinIO with Polaris using the `docker compose` example.

```shell
docker compose -f getting-started/minio/docker-compose.yml up
```

The compose script will start MinIO on default ports (API on 9000, UI on 9001) plus a Polaris Server pre-configured to use that MinIO instance.

In this example the `root` principal has its password set to `s3cr3t`.

# Connecting from Spark

Start Spark.

```shell
export AWS_REGION=us-west-2

bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.type=rest \
  --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.polaris.token-refresh-enabled=false \
  --conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
  --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
  --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
  --conf spark.sql.catalog.polaris.credential=root:s3cr3t
```

Note: `AWS_REGION` is required by the AWS SDK used by Spark, but the value is irrelevant in this case.

Create a table in Spark.

```sql
use polaris;
create namespace ns;
create table ns.t1 as select 'abc';
select * from ns.t1;
```

# Connecting from MinIO client

```shell
mc alias set pol http://localhost:9000 minio_root m1n1opwd
mc ls pol/bucket123/ns/t1
[2025-08-13 18:52:38 EDT]     0B data/
[2025-08-13 18:52:38 EDT]     0B metadata/
```

Note: the values of `minio_root`, `m1n1opwd` and `bucket123` are defined in the docker compose file.

# Notes on Storage Configuration

In this example the Polaris Catalog is defined as (excluding uninteresting properties):

```json
  {
    "name": "quickstart_catalog",
    "storageConfigInfo": {
      "endpoint": "http://localhost:9000",
      "endpointInternal": "http://minio:9000",
      "pathStyleAccess": true,
      "storageType": "S3",
      "allowedLocations": [
        "s3://bucket123"
      ]
    }
  }
```

Note that the `roleArn` parameter, which is required for AWS storage, does not need to be set for MinIO.

Note the two endpoint values. `endpointInternal` is used by the Polaris Server, while `endpoint` is communicated to clients (such as Spark) in Iceberg REST API responses.
This distinction allows the system to work smoothly when the clients and the server have different views of the network (in this example the host name `minio` is resolvable only inside the docker compose environment).
\ No newline at end of file
diff --git a/1.1.0/getting-started/quickstart.md b/1.1.0/getting-started/quickstart.md
deleted file mode 100644
index 6d92c1635a..0000000000
--- a/1.1.0/getting-started/quickstart.md
+++ /dev/null
@@ -1,116 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Quickstart
type: docs
weight: 200
---

Polaris can be deployed via a Docker image or as a standalone process. Before starting, be sure that you've satisfied the relevant prerequisites detailed in the previous page.

## Common Setup
Before running Polaris, ensure you have completed the following setup steps:

1. **Build Polaris**
```shell
cd ~/polaris
./gradlew \
  :polaris-server:assemble \
  :polaris-server:quarkusAppPartsBuild --rerun \
  :polaris-admin:assemble \
  :polaris-admin:quarkusAppPartsBuild --rerun \
  -Dquarkus.container-image.build=true
```
- **For standalone**: Omit the `-Dquarkus.container-image.tag` and `-Dquarkus.container-image.build` options if you do not need to build a Docker image.

## Running Polaris with Docker

To start using Polaris in Docker, launch Polaris together with the bundled Postgres instance, Apache Spark, and Trino:

```shell
export ASSETS_PATH=$(pwd)/getting-started/assets/
export QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://postgres:5432/POLARIS
export QUARKUS_DATASOURCE_USERNAME=postgres
export QUARKUS_DATASOURCE_PASSWORD=postgres
export CLIENT_ID=root
export CLIENT_SECRET=s3cr3t
docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml \
  -f getting-started/jdbc/docker-compose-bootstrap-db.yml \
  -f getting-started/jdbc/docker-compose.yml up -d
```

You should see output for some time as Polaris, Spark, and Trino build and start up. Eventually, the output will settle into some logs relating to Spark, resembling the following:

```
spark-sql-1  | Spark Web UI available at http://8bc4de8ed854:4040
spark-sql-1  | Spark master: local[*], Application Id: local-1743745174604
spark-sql-1  | 25/04/04 05:39:38 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session.
spark-sql-1  | 25/04/04 05:39:39 WARN RESTSessionCatalog: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://polaris:8181/api/catalogv1/oauth/tokens. This automatic fallback will be removed in a future Iceberg release.It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537
```

The Docker image pre-configures a sample catalog called `quickstart_catalog` that uses a local file system.
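Before moving on, you can verify that the server is accepting requests by asking the token endpoint for a token using the root credentials exported above (a quick sanity check; this is the same endpoint clients use later in this guide):

```shell
# Request a token from the Polaris token endpoint; a JSON access_token
# response indicates the server is up and the credentials are valid.
curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d "grant_type=client_credentials" \
  -d "client_id=${CLIENT_ID}" \
  -d "client_secret=${CLIENT_SECRET}" \
  -d "scope=PRINCIPAL_ROLE:ALL"
```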
## Running Polaris as a Standalone Process

You can also start Polaris through Gradle (packaged within the Polaris repository):

1. **Start the Server**

Run the following command to start Polaris:

```shell
./gradlew run
```

You should see output for some time as Polaris builds and starts up. Eventually, the output will settle and you should see messages that resemble the following:

```
INFO  [io.quarkus] [,] [,,,] (main) Apache Polaris Server (incubating) on JVM (powered by Quarkus ) started in 1.911s. Listening on: http://0.0.0.0:8181. Management interface listening on http://0.0.0.0:8182.
INFO  [io.quarkus] [,] [,,,] (main) Profile prod activated.
INFO  [io.quarkus] [,] [,,,] (main) Installed features: [...]
```

At this point, Polaris is running.

When using a Gradle-launched Polaris instance in this tutorial, we'll launch an instance of Polaris that stores entities only in memory. This means that any entities that you define will be destroyed when Polaris is shut down. For more information on how to configure Polaris for production usage, see the [docs]({{% relref "../configuring-polaris-for-production" %}}).

When Polaris is run using the `./gradlew run` command, the root principal credentials are `root` and `s3cr3t` for the `CLIENT_ID` and `CLIENT_SECRET`, respectively.

### Installing Apache Spark and Trino Locally for Testing

#### Apache Spark

If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. As in the [prerequisites]({{% ref "install-dependencies#git" %}}), make sure [git](https://git-scm.com/) is installed first.

Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html).

```shell
git clone --branch branch-3.5 https://github.com/apache/spark.git ~/spark
```

#### Trino
If you want to connect to Polaris with [Trino](https://trino.io/), it is recommended to set up a test instance of Trino using Docker. As in the [prerequisites]({{% ref "install-dependencies#docker" %}}), make sure [Docker](https://www.docker.com/) is installed first.

```shell
docker run --name trino -d -p 8080:8080 trinodb/trino
```

## Next Steps
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% ref "using-polaris" %}}) page.
diff --git a/1.1.0/getting-started/using-polaris.md b/1.1.0/getting-started/using-polaris.md
deleted file mode 100644
index 5ce5c3c0d3..0000000000
--- a/1.1.0/getting-started/using-polaris.md
+++ /dev/null
@@ -1,362 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. 
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Using Polaris -type: docs -weight: 400 ---- - -## Setup - -Ensure your `CLIENT_ID` & `CLIENT_SECRET` variables are already defined, as they were required for starting the Polaris server earlier. - -```shell -export CLIENT_ID=YOUR_CLIENT_ID -export CLIENT_SECRET=YOUR_CLIENT_SECRET -``` - -## Defining a Catalog - -In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under. With a Polaris service running, you can create a catalog like so: - -```shell -cd ~/polaris - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalogs \ - create \ - --storage-type s3 \ - --default-base-location ${DEFAULT_BASE_LOCATION} \ - --role-arn ${ROLE_ARN} \ - quickstart_catalog -``` - -This will create a new catalog called **quickstart_catalog**. If you are using one of the Getting Started locally-built Docker images, we have already created a catalog named `quickstart_catalog` for you. - -The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources. - -If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs]({{% relref "../entities#storage-type" %}}). - -Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% relref "../command-line-interface" %}}). - - -### Creating a Principal and Assigning it Privileges - -With a catalog created, we can create a [principal]({{% relref "../entities#principal" %}}) that has access to manage that catalog. For details on how to configure the Polaris CLI, see [the section above](#defining-a-catalog) or refer to the [docs]({{% relref "../command-line-interface" %}}). - -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principals \ - create \ - quickstart_user - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principal-roles \ - create \ - quickstart_user_role - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalog-roles \ - create \ - --catalog quickstart_catalog \ - quickstart_catalog_role -``` - -Be sure to provide the necessary credentials, hostname, and port as before. - -When the `principals create` command completes successfully, it will return the credentials for this new principal. Export them for future use. For example: - -```shell -./polaris ... 
principals create example -{"clientId": "XXXX", "clientSecret": "YYYY"} -export USER_CLIENT_ID=XXXX -export USER_CLIENT_SECRET=YYYY -``` - -Now, we grant the principal the [principal role]({{% relref "../entities#principal-role" %}}) we created, and grant the [catalog role]({{% relref "../entities#catalog-role" %}}) the principal role we created. For more information on these entities, please refer to the linked documentation. - -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principal-roles \ - grant \ - --principal quickstart_user \ - quickstart_user_role - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalog-roles \ - grant \ - --catalog quickstart_catalog \ - --principal-role quickstart_user_role \ - quickstart_catalog_role -``` - -Now, we’ve linked our principal to the catalog via roles like so: - -![Principal to Catalog](/img/quickstart/privilege-illustration-1.png "Principal to Catalog") - -In order to give this principal the ability to interact with the catalog, we must assign some [privileges]({{% relref "../entities#privilege" %}}). For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so: - -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - privileges \ - catalog \ - grant \ - --catalog quickstart_catalog \ - --catalog-role quickstart_catalog_role \ - CATALOG_MANAGE_CONTENT -``` - -This grants the [catalog privileges]({{% relref "../entities#privilege" %}}) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so: - -![Principal to Catalog with Catalog Role](/img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role") - -`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace. - -## Using Iceberg & Polaris - -At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the respective examples below. - -### Connecting with Spark - -#### Using a Local Build of Spark - -To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API. - -This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). 
-
-## Using Iceberg & Polaris
-
-At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the respective examples below.
-
-### Connecting with Spark
-
-#### Using a Local Build of Spark
-
-To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API.
-
-This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). From a local Spark clone on the `branch-3.5` branch, we can run the following:
-
-_Note: the credentials provided here are those for our principal, not the root credentials._
-
-```shell
-bin/spark-sql \
---packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.1,org.apache.iceberg:iceberg-aws-bundle:1.9.1 \
---conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
---conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
---conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \
---conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \
---conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
---conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
---conf spark.sql.catalog.quickstart_catalog.credential=${USER_CLIENT_ID}:${USER_CLIENT_SECRET} \
---conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \
---conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true \
---conf spark.sql.catalog.quickstart_catalog.client.region=us-west-2
-```
-
-Similar to the CLI commands above, this configures Spark to use the Polaris server running at `localhost:8181`. If your Polaris server is running elsewhere, be sure to update the configuration appropriately.
-
-Finally, note that we include the `iceberg-aws-bundle` package here. If your table is using a different filesystem, be sure to include the appropriate dependency.
-
-#### Using Spark SQL from a Docker container
-
-Refresh the Docker container with the user's credentials:
-```shell
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop spark-sql
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f spark-sql
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps spark-sql
-```
-
-Attach to the running spark-sql container:
-
-```shell
-docker attach $(docker ps -q --filter name=spark-sql)
-```
-
-#### Sample Commands
-
-Once the Spark session starts, we can create a namespace and table within the catalog:
-
-```sql
-USE quickstart_catalog;
-CREATE NAMESPACE IF NOT EXISTS quickstart_namespace;
-CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema;
-USE NAMESPACE quickstart_namespace.schema;
-CREATE TABLE IF NOT EXISTS quickstart_table (id BIGINT, data STRING) USING ICEBERG;
-```
-
-We can now use this table like any other:
-
-```
-INSERT INTO quickstart_table VALUES (1, 'some data');
-SELECT * FROM quickstart_table;
-. . .
-+---+---------+
-|id |data |
-+---+---------+
-|1 |some data|
-+---+---------+
-```
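-
-Because `quickstart_table` is an Iceberg table, Spark can also query its metadata tables; a quick sketch, run in the same session with the namespace still set as above:
-
-```sql
--- inspect the snapshots and data files Iceberg has recorded for the table
-SELECT * FROM quickstart_table.snapshots;
-SELECT * FROM quickstart_table.files;
-```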
-
-If at any time access is revoked...
-
-```shell
-./polaris \
-  --client-id ${CLIENT_ID} \
-  --client-secret ${CLIENT_SECRET} \
-  privileges \
-  catalog \
-  revoke \
-  --catalog quickstart_catalog \
-  --catalog-role quickstart_catalog_role \
-  CATALOG_MANAGE_CONTENT
-```
-
-Spark will lose access to the table:
-
-```
-INSERT INTO quickstart_table VALUES (1, 'some data');
-
-org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
-```
-
-### Connecting with Trino
-
-Refresh the Docker container with the user's credentials:
-
-```shell
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop trino
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f trino
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps trino
-```
-
-Attach to the running Trino container:
-
-```shell
-docker exec -it $(docker ps -q --filter name=trino) trino
-```
-
-You may not see Trino's prompt immediately; press ENTER to see it. A few commands that you can try:
-
-```sql
-SHOW CATALOGS;
-SHOW SCHEMAS FROM iceberg;
-CREATE SCHEMA iceberg.quickstart_schema;
-CREATE TABLE iceberg.quickstart_schema.quickstart_table AS SELECT 1 x;
-SELECT * FROM iceberg.quickstart_schema.quickstart_table;
-```
-
-If at any time access is revoked...
-
-```shell
-./polaris \
-  --client-id ${CLIENT_ID} \
-  --client-secret ${CLIENT_SECRET} \
-  privileges \
-  catalog \
-  revoke \
-  --catalog quickstart_catalog \
-  --catalog-role quickstart_catalog_role \
-  CATALOG_MANAGE_CONTENT
-```
-
-Trino will lose access to the table:
-
-```sql
-SELECT * FROM iceberg.quickstart_schema.quickstart_table;
-
-org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
-```
-
-### Connecting with PyIceberg
-
-#### Using Credentials
-
-```python
-from pyiceberg.catalog import load_catalog
-
-catalog = load_catalog(
-    type='rest',
-    uri='http://localhost:8181/api/catalog',
-    warehouse='quickstart_catalog',
-    scope="PRINCIPAL_ROLE:ALL",
-    credential=f"{CLIENT_ID}:{CLIENT_SECRET}",
-)
-```
-
-If the `load_catalog` function is used with credentials, then PyIceberg will automatically request an authorization token from the `v1/oauth/tokens` endpoint, and will later use this token to prove its identity to the Polaris Catalog.
-
-#### Using a Token
-
-```python
-from pyiceberg.catalog import load_catalog
-import requests
-
-# Step 1: Get OAuth token
-response = requests.post(
-    "http://localhost:8181/api/catalog/v1/oauth/tokens",
-    auth=(CLIENT_ID, CLIENT_SECRET),
-    data={
-        "grant_type": "client_credentials",
-        "scope": "PRINCIPAL_ROLE:ALL"
-    })
-token = response.json()["access_token"]
-
-# Step 2: Load the catalog using the token
-catalog = load_catalog(
-    type='rest',
-    uri='http://localhost:8181/api/catalog',
-    warehouse='quickstart_catalog',
-    token=token,
-)
-```
-
-It is also possible to use the `load_catalog` function with an authorization token provided directly, as shown above. This method is useful when using an external identity provider (e.g. Google Identity).
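-
-Either way, once the catalog is loaded you can exercise it directly from Python. A small sketch, assuming the `catalog` object from above and the privileges granted earlier (the namespace and table names are illustrative):
-
-```python
-from pyiceberg.schema import Schema
-from pyiceberg.types import NestedField, LongType, StringType
-
-# Create a namespace and a simple table, then list what we created
-catalog.create_namespace("pyiceberg_ns")
-schema = Schema(
-    NestedField(1, "id", LongType(), required=True),
-    NestedField(2, "data", StringType(), required=False),
-)
-catalog.create_table("pyiceberg_ns.demo_table", schema=schema)
-print(catalog.list_tables("pyiceberg_ns"))
-```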
-
-### Connecting Using REST APIs
-
-To access Polaris from the host machine, first request an access token:
-
-```shell
-export POLARIS_TOKEN=$(curl -s http://polaris:8181/api/catalog/v1/oauth/tokens \
-  --resolve polaris:8181:127.0.0.1 \
-  --user ${CLIENT_ID}:${CLIENT_SECRET} \
-  -d 'grant_type=client_credentials' \
-  -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
-```
-
-Then, use the access token in the Authorization header when accessing Polaris:
-
-```shell
-curl -v http://127.0.0.1:8181/api/management/v1/principal-roles -H "Authorization: Bearer $POLARIS_TOKEN"
-curl -v http://127.0.0.1:8181/api/management/v1/catalogs/quickstart_catalog -H "Authorization: Bearer $POLARIS_TOKEN"
-```
-
-## Next Steps
-* Visit [Configuring Polaris for Production]({{% relref "../configuring-polaris-for-production" %}}).
-* A Getting Started experience for using Spark with Jupyter Notebooks is documented [here](https://github.com/apache/polaris/blob/main/getting-started/spark/README.md).
-* To shut down a locally-deployed Polaris server and clean up all related Docker containers, run the command listed below. Cloud deployments have their respective termination commands on their deployment pages, while a Polaris instance running on Gradle terminates when the Gradle process terminates.
-```shell
-docker compose -p polaris \
-  -f getting-started/assets/postgres/docker-compose-postgres.yml \
-  -f getting-started/jdbc/docker-compose-bootstrap-db.yml \
-  -f getting-started/jdbc/docker-compose.yml \
-  down
-```
diff --git a/1.1.0/helm.md b/1.1.0/helm.md
deleted file mode 100644
index 843d8bbf47..0000000000
--- a/1.1.0/helm.md
+++ /dev/null
@@ -1,369 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Polaris Helm Chart
-type: docs
-weight: 675
----
-
-
-![Version: 1.1.0-incubating](https://img.shields.io/badge/Version-1.1.0--incubating-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.1.0-incubating](https://img.shields.io/badge/AppVersion-1.1.0--incubating-informational?style=flat-square)
-
-A Helm chart for Apache Polaris (incubating).
-
-**Homepage:** <https://polaris.apache.org/>
-
-## Source Code
-
-* <https://github.com/apache/polaris>
-
-## Installation
-
-### Running locally with a Minikube cluster
-
-The instructions below assume Minikube and Helm are installed.
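-
-Before proceeding, you can quickly confirm that the required tools are available on your PATH (a simple sanity check; any reasonably recent versions should work):
-
-```bash
-minikube version
-helm version
-kubectl version --client
-```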
-
-Start the Minikube cluster, then build the Polaris images and load them into the Minikube cluster:
-
-```bash
-minikube start
-eval $(minikube docker-env)
-
-./gradlew \
-  :polaris-server:assemble \
-  :polaris-server:quarkusAppPartsBuild --rerun \
-  :polaris-admin:assemble \
-  :polaris-admin:quarkusAppPartsBuild --rerun \
-  -Dquarkus.container-image.build=true
-```
-
-### Installing the chart locally
-
-The instructions below assume a local Kubernetes cluster is running and Helm is installed.
-
-#### Common setup
-
-Create the target namespace:
-```bash
-kubectl create namespace polaris
-```
-
-Create all the required resources in the `polaris` namespace. This usually includes a Postgres
-database, Kubernetes secrets, and service accounts. The Polaris chart does not create
-these resources automatically, as they are not required for all Polaris deployments. The chart will
-fail if these resources are not created beforehand. You can find some examples in the
-`helm/polaris/ci/fixtures` directory, but beware that these are primarily intended for tests.
-
-Below are two sample deployment models for installing the chart: one with a non-persistent backend and another with a persistent backend.
-
-> [!WARNING]
-> The examples below use values files located in the `helm/polaris/ci` directory.
-> **These files are intended primarily for testing purposes, and may not be suitable for production use**.
-> For production deployments, create your own values files based on the provided examples.
-
-#### Non-persistent backend
-
-Install the chart with a non-persistent backend. From the Polaris repo root:
-```bash
-helm upgrade --install --namespace polaris \
-  polaris helm/polaris
-```
-
-#### Persistent backend
-
-> [!WARNING]
-> The Postgres deployment set up in the fixtures directory is intended for testing purposes only and is not suitable for production use. For production deployments, use a managed Postgres service or a properly configured and secured Postgres instance.
-
-Install the chart with a persistent backend. From the Polaris repo root:
-```bash
-helm upgrade --install --namespace polaris \
-  --values helm/polaris/ci/persistence-values.yaml \
-  polaris helm/polaris
-kubectl wait --namespace polaris --for=condition=ready pod --selector=app.kubernetes.io/name=polaris --timeout=120s
-```
-
-To access Polaris and Postgres locally, set up port forwarding for both services (this is needed for the bootstrap process):
-```bash
-kubectl port-forward -n polaris $(kubectl get pod -n polaris -l app.kubernetes.io/name=polaris -o jsonpath='{.items[0].metadata.name}') 8181:8181
-
-kubectl port-forward -n polaris $(kubectl get pod -n polaris -l app.kubernetes.io/name=postgres -o jsonpath='{.items[0].metadata.name}') 5432:5432
-```
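-
-With the port forwards in place, you can optionally sanity-check that the server came up before bootstrapping. A quick sketch, assuming you also forward the management port (8182 by default), where Quarkus exposes its health endpoint:
-
-```bash
-kubectl port-forward -n polaris $(kubectl get pod -n polaris -l app.kubernetes.io/name=polaris -o jsonpath='{.items[0].metadata.name}') 8182:8182 &
-curl -s http://localhost:8182/q/health
-```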
-
-Run the catalog bootstrap using the Polaris admin tool. This step initializes the catalog with the required configuration:
-```bash
-container_envs=$(kubectl exec -it -n polaris $(kubectl get pod -n polaris -l app.kubernetes.io/name=polaris -o jsonpath='{.items[0].metadata.name}') -- env)
-export QUARKUS_DATASOURCE_USERNAME=$(echo "$container_envs" | grep quarkus.datasource.username | awk -F '=' '{print $2}' | tr -d '\n\r')
-export QUARKUS_DATASOURCE_PASSWORD=$(echo "$container_envs" | grep quarkus.datasource.password | awk -F '=' '{print $2}' | tr -d '\n\r')
-export QUARKUS_DATASOURCE_JDBC_URL=$(echo "$container_envs" | grep quarkus.datasource.jdbc.url | sed 's/postgres/localhost/2' | awk -F '=' '{print $2}' | tr -d '\n\r')
-
-java -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap -c POLARIS,root,pass -r POLARIS
-```
-
-### Uninstalling
-
-```bash
-helm uninstall --namespace polaris polaris
-
-kubectl delete --namespace polaris -f helm/polaris/ci/fixtures/
-
-kubectl delete namespace polaris
-```
-
-## Development & Testing
-
-This section is intended for developers who want to run the Polaris Helm chart tests.
-
-### Prerequisites
-
-The following tools are required to run the tests:
-
-* [Helm Unit Test](https://github.com/helm-unittest/helm-unittest)
-* [Chart Testing](https://github.com/helm/chart-testing)
-
-Quick installation instructions for these tools:
-```bash
-helm plugin install https://github.com/helm-unittest/helm-unittest.git
-brew install chart-testing
-```
-
-The integration tests also require some fixtures to be deployed. The `ci/fixtures` directory
-contains the required resources. To deploy them, run the following command:
-```bash
-kubectl apply --namespace polaris -f helm/polaris/ci/fixtures/
-kubectl wait --namespace polaris --for=condition=ready pod --selector=app.kubernetes.io/name=postgres --timeout=120s
-```
-
-The `helm/polaris/ci` directory contains a number of values files that will be used to install the chart with
-different configurations.
-
-### Running the unit tests
-
-Helm unit tests do not require a Kubernetes cluster. To run the unit tests, execute Helm Unit from
-the Polaris repo root:
-```bash
-helm unittest helm/polaris
-```
-
-You can also lint the chart using the Chart Testing tool, with the following command:
-
-```bash
-ct lint --charts helm/polaris
-```
-
-### Running the integration tests
-
-Integration tests require a Kubernetes cluster. See the installation instructions above for setting up
-a local cluster.
-
-Integration tests are run with the Chart Testing tool:
-```bash
-ct install --namespace polaris --charts ./helm/polaris
-```
-
-## Values
-
-| Key | Type | Default | Description |
-|-----|------|---------|-------------|
-| advancedConfig | object | `{}` | Advanced configuration. You can pass here any valid Polaris or Quarkus configuration property. Any property that is defined here takes precedence over all the other configuration values generated by this chart. Properties can be passed "flattened" or as nested YAML objects (see examples below). Note: values should be strings; avoid using numbers, booleans, or other types. |
-| affinity | object | `{}` | Affinity and anti-affinity for polaris pods. See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity. 
| -| authentication | object | `{"authenticator":{"type":"default"},"realmOverrides":{},"tokenBroker":{"maxTokenGeneration":"PT1H","secret":{"name":null,"privateKey":"private.pem","publicKey":"public.pem","rsaKeyPair":{"privateKey":"private.pem","publicKey":"public.pem"},"secretKey":"symmetric.pem","symmetricKey":{"secretKey":"symmetric.key"}},"type":"rsa-key-pair"},"tokenService":{"type":"default"},"type":"internal"}` | Polaris authentication configuration. | -| authentication.authenticator | object | `{"type":"default"}` | The `Authenticator` implementation to use. Only one built-in type is supported: default. | -| authentication.realmOverrides | object | `{}` | Authentication configuration overrides per realm. | -| authentication.tokenBroker | object | `{"maxTokenGeneration":"PT1H","secret":{"name":null,"privateKey":"private.pem","publicKey":"public.pem","rsaKeyPair":{"privateKey":"private.pem","publicKey":"public.pem"},"secretKey":"symmetric.pem","symmetricKey":{"secretKey":"symmetric.key"}},"type":"rsa-key-pair"}` | The `TokenBroker` implementation to use. Two built-in types are supported: rsa-key-pair and symmetric-key. Only relevant when using internal (or mixed) authentication. When using external authentication, the token broker is not used. | -| authentication.tokenBroker.maxTokenGeneration | string | `"PT1H"` | Maximum token generation duration (e.g., PT1H for 1 hour). | -| authentication.tokenBroker.secret | object | `{"name":null,"privateKey":"private.pem","publicKey":"public.pem","rsaKeyPair":{"privateKey":"private.pem","publicKey":"public.pem"},"secretKey":"symmetric.pem","symmetricKey":{"secretKey":"symmetric.key"}}` | The secret name to pull the public and private keys, or the symmetric key secret from. | -| authentication.tokenBroker.secret.name | string | `nil` | The name of the secret to pull the keys from. If not provided, a key pair will be generated. This is not recommended for production. | -| authentication.tokenBroker.secret.privateKey | string | `"private.pem"` | DEPRECATED: Use `authentication.tokenBroker.secret.rsaKeyPair.privateKey` instead. Key name inside the secret for the private key | -| authentication.tokenBroker.secret.publicKey | string | `"public.pem"` | DEPRECATED: Use `authentication.tokenBroker.secret.rsaKeyPair.publicKey` instead. Key name inside the secret for the public key | -| authentication.tokenBroker.secret.rsaKeyPair | object | `{"privateKey":"private.pem","publicKey":"public.pem"}` | Optional: configuration specific to RSA key pair secret. | -| authentication.tokenBroker.secret.rsaKeyPair.privateKey | string | `"private.pem"` | Key name inside the secret for the private key | -| authentication.tokenBroker.secret.rsaKeyPair.publicKey | string | `"public.pem"` | Key name inside the secret for the public key | -| authentication.tokenBroker.secret.secretKey | string | `"symmetric.pem"` | DEPRECATED: Use `authentication.tokenBroker.secret.symmetricKey.secretKey` instead. Key name inside the secret for the symmetric key | -| authentication.tokenBroker.secret.symmetricKey | object | `{"secretKey":"symmetric.key"}` | Optional: configuration specific to symmetric key secret. | -| authentication.tokenBroker.secret.symmetricKey.secretKey | string | `"symmetric.key"` | Key name inside the secret for the symmetric key | -| authentication.tokenService | object | `{"type":"default"}` | The token service (`IcebergRestOAuth2ApiService`) implementation to use. Two built-in types are supported: default and disabled. 
Only relevant when using internal (or mixed) authentication. When using external authentication, the token service is always disabled. | -| authentication.type | string | `"internal"` | The type of authentication to use. Three built-in types are supported: internal, external, and mixed. | -| autoscaling.enabled | bool | `false` | Specifies whether automatic horizontal scaling should be enabled. Do not enable this when using in-memory version store type. | -| autoscaling.maxReplicas | int | `3` | The maximum number of replicas to maintain. | -| autoscaling.minReplicas | int | `1` | The minimum number of replicas to maintain. | -| autoscaling.targetCPUUtilizationPercentage | int | `80` | Optional; set to zero or empty to disable. | -| autoscaling.targetMemoryUtilizationPercentage | string | `nil` | Optional; set to zero or empty to disable. | -| configMapLabels | object | `{}` | Additional Labels to apply to polaris configmap. | -| containerSecurityContext | object | `{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"runAsNonRoot":true,"runAsUser":10000,"seccompProfile":{"type":"RuntimeDefault"}}` | Security context for the polaris container. See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/. | -| containerSecurityContext.runAsUser | int | `10000` | UID 10000 is compatible with Polaris OSS default images; change this if you are using a different image. | -| cors | object | `{"accessControlAllowCredentials":null,"accessControlMaxAge":null,"allowedHeaders":[],"allowedMethods":[],"allowedOrigins":[],"exposedHeaders":[]}` | Polaris CORS configuration. | -| cors.accessControlAllowCredentials | string | `nil` | The `Access-Control-Allow-Credentials` response header. The value of this header will default to `true` if `allowedOrigins` property is set and there is a match with the precise `Origin` header. | -| cors.accessControlMaxAge | string | `nil` | The `Access-Control-Max-Age` response header value indicating how long the results of a pre-flight request can be cached. Must be a valid duration. | -| cors.allowedHeaders | list | `[]` | HTTP headers allowed for CORS, ex: X-Custom, Content-Disposition. If this is not set or empty, all requested headers are considered allowed. | -| cors.allowedMethods | list | `[]` | HTTP methods allowed for CORS, ex: GET, PUT, POST. If this is not set or empty, all requested methods are considered allowed. | -| cors.allowedOrigins | list | `[]` | Origins allowed for CORS, e.g. http://polaris.apache.org, http://localhost:8181. In case an entry of the list is surrounded by forward slashes, it is interpreted as a regular expression. | -| cors.exposedHeaders | list | `[]` | HTTP headers exposed to the client, ex: X-Custom, Content-Disposition. The default is an empty list. | -| extraEnv | list | `[]` | Advanced configuration via Environment Variables. Extra environment variables to add to the Polaris server container. You can pass here any valid EnvVar object: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#envvar-v1-core This can be useful to get configuration values from Kubernetes secrets or config maps. | -| extraInitContainers | list | `[]` | Add additional init containers to the polaris pod(s) See https://kubernetes.io/docs/concepts/workloads/pods/init-containers/. | -| extraServices | list | `[]` | Additional service definitions. All service definitions always select all Polaris pods. Use this if you need to expose specific ports with different configurations, e.g. 
expose polaris-http with an alternate LoadBalancer service instead of ClusterIP. | -| extraVolumeMounts | list | `[]` | Extra volume mounts to add to the polaris container. See https://kubernetes.io/docs/concepts/storage/volumes/. | -| extraVolumes | list | `[]` | Extra volumes to add to the polaris pod. See https://kubernetes.io/docs/concepts/storage/volumes/. | -| features | object | `{"realmOverrides":{}}` | Polaris features configuration. | -| features.realmOverrides | object | `{}` | Features to enable or disable per realm. This field is a map of maps. The realm name is the key, and the value is a map of feature names to values. If a feature is not present in the map, the default value from the 'defaults' field is used. | -| fileIo | object | `{"type":"default"}` | Polaris FileIO configuration. | -| fileIo.type | string | `"default"` | The type of file IO to use. Two built-in types are supported: default and wasb. The wasb one translates WASB paths to ABFS ones. | -| image.configDir | string | `"/deployments/config"` | The path to the directory where the application.properties file, and other configuration files, if any, should be mounted. Note: if you are using EclipseLink, then this value must be at least two folders down to the root folder, e.g. `/deployments/config` is OK, whereas `/deployments` is not. | -| image.pullPolicy | string | `"IfNotPresent"` | The image pull policy. | -| image.repository | string | `"apache/polaris"` | The image repository to pull from. | -| image.tag | string | `"latest"` | The image tag. | -| imagePullSecrets | list | `[]` | References to secrets in the same namespace to use for pulling any of the images used by this chart. Each entry is a LocalObjectReference to an existing secret in the namespace. The secret must contain a .dockerconfigjson key with a base64-encoded Docker configuration file. See https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ for more information. | -| ingress.annotations | object | `{}` | Annotations to add to the ingress. | -| ingress.className | string | `""` | Specifies the ingressClassName; leave empty if you don't want to customize it | -| ingress.enabled | bool | `false` | Specifies whether an ingress should be created. | -| ingress.hosts | list | `[{"host":"chart-example.local","paths":[]}]` | A list of host paths used to configure the ingress. | -| ingress.tls | list | `[]` | A list of TLS certificates; each entry has a list of hosts in the certificate, along with the secret name used to terminate TLS traffic on port 443. | -| livenessProbe | object | `{"failureThreshold":3,"initialDelaySeconds":5,"periodSeconds":10,"successThreshold":1,"terminationGracePeriodSeconds":30,"timeoutSeconds":10}` | Configures the liveness probe for polaris pods. | -| livenessProbe.failureThreshold | int | `3` | Minimum consecutive failures for the probe to be considered failed after having succeeded. Minimum value is 1. | -| livenessProbe.initialDelaySeconds | int | `5` | Number of seconds after the container has started before liveness probes are initiated. Minimum value is 0. | -| livenessProbe.periodSeconds | int | `10` | How often (in seconds) to perform the probe. Minimum value is 1. | -| livenessProbe.successThreshold | int | `1` | Minimum consecutive successes for the probe to be considered successful after having failed. Minimum value is 1. | -| livenessProbe.terminationGracePeriodSeconds | int | `30` | Optional duration in seconds the pod needs to terminate gracefully upon probe failure. 
Minimum value is 1. | -| livenessProbe.timeoutSeconds | int | `10` | Number of seconds after which the probe times out. Minimum value is 1. | -| logging | object | `{"categories":{"org.apache.iceberg.rest":"INFO","org.apache.polaris":"INFO"},"console":{"enabled":true,"format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"threshold":"ALL"},"file":{"enabled":false,"fileName":"polaris.log","format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"logsDir":"/deployments/logs","rotation":{"fileSuffix":null,"maxBackupIndex":5,"maxFileSize":"100Mi"},"storage":{"className":"standard","selectorLabels":{},"size":"512Gi"},"threshold":"ALL"},"level":"INFO","mdc":{},"requestIdHeaderName":"Polaris-Request-Id"}` | Logging configuration. | -| logging.categories | object | `{"org.apache.iceberg.rest":"INFO","org.apache.polaris":"INFO"}` | Configuration for specific log categories. | -| logging.console | object | `{"enabled":true,"format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"threshold":"ALL"}` | Configuration for the console appender. | -| logging.console.enabled | bool | `true` | Whether to enable the console appender. | -| logging.console.format | string | `"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n"` | The log format to use. Ignored if JSON format is enabled. See https://quarkus.io/guides/logging#logging-format for details. | -| logging.console.json | bool | `false` | Whether to log in JSON format. | -| logging.console.threshold | string | `"ALL"` | The log level of the console appender. | -| logging.file | object | `{"enabled":false,"fileName":"polaris.log","format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"logsDir":"/deployments/logs","rotation":{"fileSuffix":null,"maxBackupIndex":5,"maxFileSize":"100Mi"},"storage":{"className":"standard","selectorLabels":{},"size":"512Gi"},"threshold":"ALL"}` | Configuration for the file appender. | -| logging.file.enabled | bool | `false` | Whether to enable the file appender. | -| logging.file.fileName | string | `"polaris.log"` | The log file name. | -| logging.file.format | string | `"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n"` | The log format to use. Ignored if JSON format is enabled. See https://quarkus.io/guides/logging#logging-format for details. | -| logging.file.json | bool | `false` | Whether to log in JSON format. | -| logging.file.logsDir | string | `"/deployments/logs"` | The local directory where log files are stored. The persistent volume claim will be mounted here. | -| logging.file.rotation | object | `{"fileSuffix":null,"maxBackupIndex":5,"maxFileSize":"100Mi"}` | Log rotation configuration. | -| logging.file.rotation.fileSuffix | string | `nil` | An optional suffix to append to the rotated log files. If present, the rotated log files will be grouped in time buckets, and each bucket will contain at most maxBackupIndex files. The suffix must be in a date-time format that is understood by DateTimeFormatter. 
If the suffix ends with .gz or .zip, the rotated files will also be compressed using the corresponding algorithm. | -| logging.file.rotation.maxBackupIndex | int | `5` | The maximum number of backup files to keep. | -| logging.file.rotation.maxFileSize | string | `"100Mi"` | The maximum size of the log file before it is rotated. Should be expressed as a Kubernetes quantity. | -| logging.file.storage | object | `{"className":"standard","selectorLabels":{},"size":"512Gi"}` | The log storage configuration. A persistent volume claim will be created using these settings. | -| logging.file.storage.className | string | `"standard"` | The storage class name of the persistent volume claim to create. | -| logging.file.storage.selectorLabels | object | `{}` | Labels to add to the persistent volume claim spec selector; a persistent volume with matching labels must exist. Leave empty if using dynamic provisioning. | -| logging.file.storage.size | string | `"512Gi"` | The size of the persistent volume claim to create. | -| logging.file.threshold | string | `"ALL"` | The log level of the file appender. | -| logging.level | string | `"INFO"` | The log level of the root category, which is used as the default log level for all categories. | -| logging.mdc | object | `{}` | Configuration for MDC (Mapped Diagnostic Context). Values specified here will be added to the log context of all incoming requests and can be used in log patterns. | -| logging.requestIdHeaderName | string | `"Polaris-Request-Id"` | The header name to use for the request ID. | -| managementService | object | `{"annotations":{},"clusterIP":"None","externalTrafficPolicy":null,"internalTrafficPolicy":null,"ports":[{"name":"polaris-mgmt","nodePort":null,"port":8182,"protocol":null,"targetPort":null}],"sessionAffinity":null,"trafficDistribution":null,"type":"ClusterIP"}` | Management service settings. These settings are used to configure liveness and readiness probes, and to configure the dedicated headless service that will expose health checks and metrics, e.g. for metrics scraping and service monitoring. | -| managementService.annotations | object | `{}` | Annotations to add to the service. | -| managementService.clusterIP | string | `"None"` | By default, the management service is headless, i.e. it does not have a cluster IP. This is generally the right option for exposing health checks and metrics, e.g. for metrics scraping and service monitoring. | -| managementService.ports | list | `[{"name":"polaris-mgmt","nodePort":null,"port":8182,"protocol":null,"targetPort":null}]` | The ports the management service will listen on. At least one port is required; the first port implicitly becomes the HTTP port that the application will use for serving management requests. By default, it's 8182. Note: port names must be unique and no more than 15 characters long. | -| managementService.ports[0] | object | `{"name":"polaris-mgmt","nodePort":null,"port":8182,"protocol":null,"targetPort":null}` | The name of the management port. Required. | -| managementService.ports[0].nodePort | string | `nil` | The port on each node on which this service is exposed when type is NodePort or LoadBalancer. Usually assigned by the system. If not specified, a port will be allocated if this Service requires one. If this field is specified when creating a Service which does not need it, creation will fail. | -| managementService.ports[0].port | int | `8182` | The port the management service listens on. By default, the management interface is exposed on HTTP port 8182. 
| -| managementService.ports[0].protocol | string | `nil` | The IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default is TCP. | -| managementService.ports[0].targetPort | string | `nil` | Number or name of the port to access on the pods targeted by the service. If this is a string, it will be looked up as a named port in the target Pod's container ports. If this is not specified, the value of the 'port' field is used. | -| managementService.type | string | `"ClusterIP"` | The type of service to create. Valid values are: ExternalName, ClusterIP, NodePort, and LoadBalancer. The default value is ClusterIP. | -| metrics.enabled | bool | `true` | Specifies whether metrics for the polaris server should be enabled. | -| metrics.tags | object | `{}` | Additional tags (dimensional labels) to add to the metrics. | -| nodeSelector | object | `{}` | Node labels which must match for the polaris pod to be scheduled on that node. See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector. | -| oidc | object | `{"authServeUrl":null,"client":{"id":"polaris","secret":{"key":"clientSecret","name":null}},"principalMapper":{"idClaimPath":null,"nameClaimPath":null,"type":"default"},"principalRolesMapper":{"filter":null,"mappings":[],"rolesClaimPath":null,"type":"default"}}` | Polaris OIDC configuration. Only relevant when at least one realm is configured for external (or mixed) authentication. The currently supported configuration is for a single, default OIDC tenant. For more complex scenarios, including OIDC multi-tenancy, you will need to provide the relevant configuration using the `advancedConfig` section. | -| oidc.authServeUrl | string | `nil` | The authentication server URL. Must be provided if at least one realm is configured for external authentication. | -| oidc.client | object | `{"id":"polaris","secret":{"key":"clientSecret","name":null}}` | The client to use when authenticating with the authentication server. | -| oidc.client.id | string | `"polaris"` | The client ID to use when contacting the authentication server's introspection endpoint in order to validate tokens. | -| oidc.client.secret | object | `{"key":"clientSecret","name":null}` | The secret to pull the client secret from. If no client secret is required, leave the secret name unset. | -| oidc.client.secret.key | string | `"clientSecret"` | The key name inside the secret to pull the client secret from. | -| oidc.client.secret.name | string | `nil` | The name of the secret to pull the client secret from. If not provided, the client is assumed to not require a client secret when contacting the introspection endpoint. | -| oidc.principalMapper | object | `{"idClaimPath":null,"nameClaimPath":null,"type":"default"}` | Principal mapping configuration. | -| oidc.principalMapper.idClaimPath | string | `nil` | The path to the claim that contains the principal ID. Nested paths can be expressed using "/" as a separator, e.g. "polaris/principal_id" would look for the "principal_id" field inside the "polaris" object in the token claims. Optional. Either this option or `nameClaimPath` (or both) must be provided. | -| oidc.principalMapper.nameClaimPath | string | `nil` | The claim that contains the principal name. Nested paths can be expressed using "/" as a separator, e.g. "polaris/principal_name" would look for the "principal_name" field inside the "polaris" object in the token claims. Optional. Either this option or `idClaimPath` (or both) must be provided. 
| -| oidc.principalMapper.type | string | `"default"` | The `PrincipalMapper` implementation to use. Only one built-in type is supported: default. | -| oidc.principalRolesMapper | object | `{"filter":null,"mappings":[],"rolesClaimPath":null,"type":"default"}` | Principal roles mapping configuration. | -| oidc.principalRolesMapper.filter | string | `nil` | A regular expression that matches the role names in the identity. Only roles that match this regex will be included in the Polaris-specific roles. | -| oidc.principalRolesMapper.mappings | list | `[]` | A list of regex mappings that will be applied to each role name in the identity. This can be used to transform the role names in the identity into role names as expected by Polaris. The default Authenticator expects the security identity to expose role names in the format `POLARIS_ROLE:`. | -| oidc.principalRolesMapper.rolesClaimPath | string | `nil` | The path to the claim that contains the principal roles. Nested paths can be expressed using "/" as a separator, e.g. "polaris/principal_roles" would look for the "principal_roles" field inside the "polaris" object in the token claims. If not set, Quarkus looks for roles in standard locations. See https://quarkus.io/guides/security-oidc-bearer-token-authentication#token-claims-and-security-identity-roles. | -| oidc.principalRolesMapper.type | string | `"default"` | The `PrincipalRolesMapper` implementation to use. Only one built-in type is supported: default. | -| persistence | object | `{"relationalJdbc":{"secret":{"jdbcUrl":"jdbcUrl","name":null,"password":"password","username":"username"}},"type":"in-memory"}` | Polaris persistence configuration. | -| persistence.relationalJdbc | object | `{"secret":{"jdbcUrl":"jdbcUrl","name":null,"password":"password","username":"username"}}` | The configuration for the relational-jdbc persistence manager. | -| persistence.relationalJdbc.secret | object | `{"jdbcUrl":"jdbcUrl","name":null,"password":"password","username":"username"}` | The secret name to pull the database connection properties from. | -| persistence.relationalJdbc.secret.jdbcUrl | string | `"jdbcUrl"` | The secret key holding the database JDBC connection URL | -| persistence.relationalJdbc.secret.name | string | `nil` | The secret name to pull database connection properties from | -| persistence.relationalJdbc.secret.password | string | `"password"` | The secret key holding the database password for authentication | -| persistence.relationalJdbc.secret.username | string | `"username"` | The secret key holding the database username for authentication | -| persistence.type | string | `"in-memory"` | The type of persistence to use. Two built-in types are supported: in-memory and relational-jdbc. The eclipse-link type is also supported but is deprecated. | -| podAnnotations | object | `{}` | Annotations to apply to polaris pods. | -| podLabels | object | `{}` | Additional Labels to apply to polaris pods. | -| podSecurityContext | object | `{"fsGroup":10001,"seccompProfile":{"type":"RuntimeDefault"}}` | Security context for the polaris pod. See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/. | -| podSecurityContext.fsGroup | int | `10001` | GID 10001 is compatible with Polaris OSS default images; change this if you are using a different image. | -| rateLimiter | object | `{"tokenBucket":{"requestsPerSecond":9999,"type":"default","window":"PT10S"},"type":"no-op"}` | Polaris rate limiter configuration. 
| -| rateLimiter.tokenBucket | object | `{"requestsPerSecond":9999,"type":"default","window":"PT10S"}` | The configuration for the default rate limiter, which uses the token bucket algorithm with one bucket per realm. | -| rateLimiter.tokenBucket.requestsPerSecond | int | `9999` | The maximum number of requests per second allowed for each realm. | -| rateLimiter.tokenBucket.type | string | `"default"` | The type of the token bucket rate limiter. Only the default type is supported out of the box. | -| rateLimiter.tokenBucket.window | string | `"PT10S"` | The time window. | -| rateLimiter.type | string | `"no-op"` | The type of rate limiter filter to use. Two built-in types are supported: default and no-op. | -| readinessProbe | object | `{"failureThreshold":3,"initialDelaySeconds":5,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":10}` | Configures the readiness probe for polaris pods. | -| readinessProbe.failureThreshold | int | `3` | Minimum consecutive failures for the probe to be considered failed after having succeeded. Minimum value is 1. | -| readinessProbe.initialDelaySeconds | int | `5` | Number of seconds after the container has started before readiness probes are initiated. Minimum value is 0. | -| readinessProbe.periodSeconds | int | `10` | How often (in seconds) to perform the probe. Minimum value is 1. | -| readinessProbe.successThreshold | int | `1` | Minimum consecutive successes for the probe to be considered successful after having failed. Minimum value is 1. | -| readinessProbe.timeoutSeconds | int | `10` | Number of seconds after which the probe times out. Minimum value is 1. | -| realmContext | object | `{"realms":["POLARIS"],"type":"default"}` | Realm context resolver configuration. | -| realmContext.realms | list | `["POLARIS"]` | List of valid realms, for use with the default realm context resolver. The first realm in the list is the default realm. Realms not in this list will be rejected. | -| realmContext.type | string | `"default"` | The type of realm context resolver to use. Two built-in types are supported: default and test; test is not recommended for production as it does not perform any realm validation. | -| replicaCount | int | `1` | The number of replicas to deploy (horizontal scaling). Beware that replicas are stateless; don't set this number > 1 when using in-memory meta store manager. | -| resources | object | `{}` | Configures the resources requests and limits for polaris pods. We usually recommend not to specify default resources and to leave this as a conscious choice for the user. This also increases chances charts run on environments with little resources, such as Minikube. If you do want to specify resources, uncomment the following lines, adjust them as necessary, and remove the curly braces after 'resources:'. | -| revisionHistoryLimit | string | `nil` | The number of old ReplicaSets to retain to allow rollback (if not set, the default Kubernetes value is set to 10). | -| service | object | `{"annotations":{},"clusterIP":null,"externalTrafficPolicy":null,"internalTrafficPolicy":null,"ports":[{"name":"polaris-http","nodePort":null,"port":8181,"protocol":null,"targetPort":null}],"sessionAffinity":null,"trafficDistribution":null,"type":"ClusterIP"}` | Polaris main service settings. | -| service.annotations | object | `{}` | Annotations to add to the service. 
| -| service.clusterIP | string | `nil` | You can specify your own cluster IP address If you define a Service that has the .spec.clusterIP set to "None" then Kubernetes does not assign an IP address. Instead, DNS records for the service will return the IP addresses of each pod targeted by the server. This is called a headless service. See https://kubernetes.io/docs/concepts/services-networking/service/#headless-services | -| service.externalTrafficPolicy | string | `nil` | Controls how traffic from external sources is routed. Valid values are Cluster and Local. The default value is Cluster. Set the field to Cluster to route traffic to all ready endpoints. Set the field to Local to only route to ready node-local endpoints. If the traffic policy is Local and there are no node-local endpoints, traffic is dropped by kube-proxy. | -| service.internalTrafficPolicy | string | `nil` | Controls how traffic from internal sources is routed. Valid values are Cluster and Local. The default value is Cluster. Set the field to Cluster to route traffic to all ready endpoints. Set the field to Local to only route to ready node-local endpoints. If the traffic policy is Local and there are no node-local endpoints, traffic is dropped by kube-proxy. | -| service.ports | list | `[{"name":"polaris-http","nodePort":null,"port":8181,"protocol":null,"targetPort":null}]` | The ports the service will listen on. At least one port is required; the first port implicitly becomes the HTTP port that the application will use for serving API requests. By default, it's 8181. Note: port names must be unique and no more than 15 characters long. | -| service.ports[0] | object | `{"name":"polaris-http","nodePort":null,"port":8181,"protocol":null,"targetPort":null}` | The name of the port. Required. | -| service.ports[0].nodePort | string | `nil` | The port on each node on which this service is exposed when type is NodePort or LoadBalancer. Usually assigned by the system. If not specified, a port will be allocated if this Service requires one. If this field is specified when creating a Service which does not need it, creation will fail. | -| service.ports[0].port | int | `8181` | The port the service listens on. By default, the HTTP port is 8181. | -| service.ports[0].protocol | string | `nil` | The IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default is TCP. | -| service.ports[0].targetPort | string | `nil` | Number or name of the port to access on the pods targeted by the service. If this is a string, it will be looked up as a named port in the target Pod's container ports. If this is not specified, the value of the 'port' field is used. | -| service.sessionAffinity | string | `nil` | The session affinity for the service. Valid values are: None, ClientIP. The default value is None. ClientIP enables sticky sessions based on the client's IP address. This is generally beneficial to Polaris deployments, but some testing may be required in order to make sure that the load is distributed evenly among the pods. Also, this setting affects only internal clients, not external ones. If Ingress is enabled, it is recommended to set sessionAffinity to None. | -| service.trafficDistribution | string | `nil` | The traffic distribution field provides another way to influence traffic routing within a Kubernetes Service. While traffic policies focus on strict semantic guarantees, traffic distribution allows you to express preferences such as routing to topologically closer endpoints. The only valid value is: PreferClose. 
The default value is implementation-specific. | -| service.type | string | `"ClusterIP"` | The type of service to create. Valid values are: ExternalName, ClusterIP, NodePort, and LoadBalancer. The default value is ClusterIP. | -| serviceAccount.annotations | object | `{}` | Annotations to add to the service account. | -| serviceAccount.create | bool | `true` | Specifies whether a service account should be created. | -| serviceAccount.name | string | `""` | The name of the service account to use. If not set and create is true, a name is generated using the fullname template. | -| serviceMonitor.enabled | bool | `true` | Specifies whether a ServiceMonitor for Prometheus operator should be created. | -| serviceMonitor.interval | string | `""` | The scrape interval; leave empty to let Prometheus decide. Must be a valid duration, e.g. 1d, 1h30m, 5m, 10s. | -| serviceMonitor.labels | object | `{}` | Labels for the created ServiceMonitor so that Prometheus operator can properly pick it up. | -| serviceMonitor.metricRelabelings | list | `[]` | Relabeling rules to apply to metrics. Ref https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config. | -| storage | object | `{"secret":{"awsAccessKeyId":null,"awsSecretAccessKey":null,"gcpToken":null,"gcpTokenLifespan":null,"name":null}}` | Storage credentials for the server. If the following properties are unset, default credentials will be used, in which case the pod must have the necessary permissions to access the storage. | -| storage.secret | object | `{"awsAccessKeyId":null,"awsSecretAccessKey":null,"gcpToken":null,"gcpTokenLifespan":null,"name":null}` | The secret to pull storage credentials from. | -| storage.secret.awsAccessKeyId | string | `nil` | The key in the secret to pull the AWS access key ID from. Only required when using AWS. | -| storage.secret.awsSecretAccessKey | string | `nil` | The key in the secret to pull the AWS secret access key from. Only required when using AWS. | -| storage.secret.gcpToken | string | `nil` | The key in the secret to pull the GCP token from. Only required when using GCP. | -| storage.secret.gcpTokenLifespan | string | `nil` | The key in the secret to pull the GCP token expiration time from. Only required when using GCP. Must be a valid ISO 8601 duration. The default is PT1H (1 hour). | -| storage.secret.name | string | `nil` | The name of the secret to pull storage credentials from. | -| tasks | object | `{"maxConcurrentTasks":null,"maxQueuedTasks":null}` | Polaris asynchronous task executor configuration. | -| tasks.maxConcurrentTasks | string | `nil` | The maximum number of concurrent tasks that can be executed at the same time. The default is the number of available cores. | -| tasks.maxQueuedTasks | string | `nil` | The maximum number of tasks that can be queued up for execution. The default is Integer.MAX_VALUE. | -| tolerations | list | `[]` | A list of tolerations to apply to polaris pods. See https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/. | -| tracing.attributes | object | `{}` | Resource attributes to identify the polaris service among other tracing sources. See https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/#service. If left empty, traces will be attached to a service named "Apache Polaris"; to change this, provide a service.name attribute here. | -| tracing.enabled | bool | `false` | Specifies whether tracing for the polaris server should be enabled. 
| tracing.endpoint | string | `"http://otlp-collector:4317"` | The collector endpoint URL to connect to (required). The endpoint URL must have either the http:// or the https:// scheme. The collector must talk the OpenTelemetry protocol (OTLP) and the port must be its gRPC port (by default 4317). See https://quarkus.io/guides/opentelemetry for more information. |
-| tracing.sample | string | `"1.0d"` | Which requests should be sampled. Valid values are: "all", "none", or a ratio between 0.0 and "1.0d" (inclusive). E.g. "0.5d" means that 50% of the requests will be sampled. Note: avoid entering numbers here, always prefer a string representation of the ratio. |
diff --git a/1.1.0/metastores.md b/1.1.0/metastores.md
deleted file mode 100644
index dd1d7f6f95..0000000000
--- a/1.1.0/metastores.md
+++ /dev/null
@@ -1,153 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Metastores
-type: docs
-weight: 700
----
-
-This page explains how to configure and use Polaris metastores with either the recommended Relational JDBC or the
-deprecated EclipseLink persistence backends.
-
-## Relational JDBC
-This implementation leverages Quarkus for datasource management and supports configuration through
-environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file).
-
-```
-POLARIS_PERSISTENCE_TYPE=relational-jdbc
-
-QUARKUS_DATASOURCE_USERNAME=<your-username>
-QUARKUS_DATASOURCE_PASSWORD=<your-password>
-QUARKUS_DATASOURCE_JDBC_URL=<jdbc-url-of-your-database>
-```
-
-The Relational JDBC metastore currently relies on a Quarkus-managed datasource and supports only PostgreSQL and H2 databases. This limitation is similar to that of EclipseLink, primarily due to underlying schema differences. At this time, official documentation is provided exclusively for usage with PostgreSQL.
-Please refer to the documentation here:
-[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)
-
-Additionally, retries can be configured via the `polaris.persistence.relational.jdbc.*` properties; please refer to the [configuration]({{% ref "configuration" %}}) reference.
-
-## EclipseLink (Deprecated)
-> [!IMPORTANT]
-> EclipseLink is deprecated; it is recommended to use Relational JDBC for persistence instead.
-
-Polaris includes the EclipseLink plugin by default, with the PostgreSQL driver.
-
-Configure the `polaris.persistence` section in your Polaris configuration file
-(`application.properties`) as follows:
-
-```
-polaris.persistence.type=eclipse-link
-polaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml
-polaris.persistence.eclipselink.persistence-unit=polaris
-```
-
-Alternatively, configuration can also be done with environment variables or system properties. Refer
-to the [Quarkus Configuration Reference] for more information.
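-
-For instance, following the standard Quarkus mapping from configuration keys to environment variables (non-alphanumeric characters become underscores and names are upper-cased), the same settings could be sketched as environment variables like so (verify the exact names against the Quarkus reference):
-
-```
-POLARIS_PERSISTENCE_TYPE=eclipse-link
-POLARIS_PERSISTENCE_ECLIPSELINK_CONFIGURATION_FILE=/path/to/persistence.xml
-POLARIS_PERSISTENCE_ECLIPSELINK_PERSISTENCE_UNIT=polaris
-```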
-
-The `configuration-file` option must point to an [EclipseLink configuration file]. This file, named
-`persistence.xml`, is used to set up the database connection properties, which can differ depending
-on the type of database and its configuration.
-
-> [!NOTE]
-> You have to locate the `persistence.xml` at least two folders down from the root folder, e.g. `/deployments/config/persistence.xml` is OK, whereas `/deployments/persistence.xml` will cause an infinite loop.
-
-[Quarkus Configuration Reference]: https://quarkus.io/guides/config-reference
-[EclipseLink configuration file]: https://eclipse.dev/eclipselink/documentation/4.0/solutions/solutions.html#TESTINGJPA002
-
-Polaris creates and connects to a separate database for each realm. Specifically, the `{realm}` placeholder in `jakarta.persistence.jdbc.url` is substituted with the actual realm name, allowing the Polaris server to connect to different databases based on the realm.
-
-> [!NOTE]
-> Some database systems such as Postgres don't create databases automatically. Database admins need to create them manually before running the Polaris server.
-
-A single `persistence.xml` can describe multiple [persistence units](https://eclipse.dev/eclipselink/documentation/4.0/concepts/concepts.html#APPDEV001). For example, with both a `polaris-dev` and `polaris` persistence unit defined, you could use a single `persistence.xml` to switch between development and production databases. Use the `persistence-unit` option in the Polaris server configuration to select the active persistence unit.
-
-### Using H2
-
-> [!IMPORTANT]
-> H2 is an in-memory database and is not suitable for production!
-
-The default [persistence.xml] in Polaris is already configured for H2, but you can easily customize
-your H2 configuration using the persistence unit template below:
-
-[persistence.xml]: https://github.com/apache/polaris/blob/main/persistence/eclipselink/src/main/resources/META-INF/persistence.xml
-
-```xml
-<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
-  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
-  <shared-cache-mode>NONE</shared-cache-mode>
-  <properties>
-    <property name="jakarta.persistence.jdbc.url" value="jdbc:h2:file:tmp/polaris_test/filedb_{realm}"/>
-    <property name="jakarta.persistence.jdbc.user" value="sa"/>
-    <property name="jakarta.persistence.jdbc.password" value=""/>
-    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
-  </properties>
-</persistence-unit>
-```
-
-To build Polaris with the necessary H2 dependency and start the Polaris service, run the following:
-
-```shell
-./gradlew \
-  :polaris-server:assemble \
-  :polaris-server:quarkusAppPartsBuild --rerun \
-  -PeclipseLinkDeps=com.h2database:h2:2.3.232
-java -Dpolaris.persistence.type=eclipse-link \
-  -Dpolaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml \
-  -Dpolaris.persistence.eclipselink.persistence-unit=polaris \
-  -jar runtime/server/build/quarkus-app/quarkus-run.jar
-```
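-
-With a fresh database, the realm may still need to be bootstrapped with credentials before the server will accept requests. A sketch using the Polaris admin tool, assuming the same persistence settings as the server (adjust the realm name and credentials as needed):
-
-```shell
-java -Dpolaris.persistence.type=eclipse-link \
-  -Dpolaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml \
-  -Dpolaris.persistence.eclipselink.persistence-unit=polaris \
-  -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap -r POLARIS -c POLARIS,root,pass
-```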
-
-```xml
-<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
-  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
-  <shared-cache-mode>NONE</shared-cache-mode>
-  <properties>
-    <!-- Reconstructed template; the connection values below are illustrative
-         and the {realm} placeholder is substituted at runtime. -->
-    <property name="jakarta.persistence.jdbc.url" value="jdbc:postgresql://localhost:5432/{realm}"/>
-    <property name="jakarta.persistence.jdbc.user" value="postgres"/>
-    <property name="jakarta.persistence.jdbc.password" value="postgres"/>
-  </properties>
-</persistence-unit>
-```
diff --git a/1.1.0/polaris-catalog-service.md b/1.1.0/polaris-catalog-service.md
deleted file mode 100644
index 02fed63f46..0000000000
--- a/1.1.0/polaris-catalog-service.md
+++ /dev/null
@@ -1,26 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-linkTitle: 'Catalog API Spec'
-weight: 900
-params:
-  show_page_toc: false
----
-
-{{< redoc-polaris "generated/bundled-polaris-catalog-service.yaml" >}}
diff --git a/1.1.0/polaris-management-service.md b/1.1.0/polaris-management-service.md
deleted file mode 100644
index 0b66b9daa4..0000000000
--- a/1.1.0/polaris-management-service.md
+++ /dev/null
@@ -1,27 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: 'Apache Polaris Management Service OpenAPI'
-linkTitle: 'Management OpenAPI'
-weight: 800
-params:
-  show_page_toc: false
----
-
-{{< redoc-polaris "polaris-management-service.yml" >}}
diff --git a/1.1.0/polaris-spark-client.md b/1.1.0/polaris-spark-client.md
deleted file mode 100644
index 3d597f19f4..0000000000
--- a/1.1.0/polaris-spark-client.md
+++ /dev/null
@@ -1,129 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Polaris Spark Client
-type: docs
-weight: 650
----
-
-Apache Polaris now provides Catalog support for Generic Tables (non-Iceberg tables); please check out
-the [Catalog API Spec]({{% ref "polaris-catalog-service" %}}) for the Generic Table API specs.
-
-Along with the Generic Table Catalog support, Polaris is also releasing a Spark client, which provides
-an end-to-end solution for Apache Spark to manage Delta tables using Polaris.
-
-Note that the Polaris Spark client can handle both Iceberg and Delta tables, not just Delta.
-
-This page documents how to connect Spark with a Polaris service using the Polaris Spark client.
-
-## Quick Start with Local Polaris service
-If you want to quickly try out the functionality with a local Polaris service, simply check out the Polaris repo
-and follow the instructions in the Spark plugin getting-started
-[README](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/getting-started/README.md).
-
-Check out the Polaris repo:
-```shell
-git clone https://github.com/apache/polaris.git ~/polaris
-```
-
-## Start Spark against a deployed Polaris service
-Before starting, ensure that the deployed Polaris service supports Generic Tables, and that Spark 3.5
-(version 3.5.3 or later) is installed. Spark 3.5.6 is recommended, and you can follow the instructions
-below to get a Spark 3.5.6 distribution.
-```shell
-cd ~
-wget -O spark-3.5.6-bin-hadoop3.tgz "https://www.apache.org/dyn/closer.lua/spark/spark-3.5.6/spark-3.5.6-bin-hadoop3.tgz?action=download"
-mkdir spark-3.5
-tar xzvf spark-3.5.6-bin-hadoop3.tgz -C spark-3.5 --strip-components=1
-cd spark-3.5
-```
-
-### Connecting with Spark using the Polaris Spark client
-The following CLI command can be used to start Spark with a connection to the deployed Polaris service using
-a released Polaris Spark client.
-
-```shell
-bin/spark-shell \
---packages <polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.1,io.delta:delta-spark_2.12:3.3.1 \
---conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
---conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
---conf spark.sql.catalog.<spark-catalog-name>.warehouse=<polaris-catalog-name> \
---conf spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
---conf spark.sql.catalog.<spark-catalog-name>=org.apache.polaris.spark.SparkCatalog \
---conf spark.sql.catalog.<spark-catalog-name>.uri=<polaris-service-uri> \
---conf spark.sql.catalog.<spark-catalog-name>.credential='<client-id>:<client-secret>' \
---conf spark.sql.catalog.<spark-catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
---conf spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled=true
-```
-For example, if the released Polaris Spark client you want to use is `org.apache.polaris:polaris-spark-3.5_2.12:1.0.0`,
-replace the `<polaris-spark-client-package>` field with that coordinate.
-
-`<spark-catalog-name>` is the catalog name you will use with Spark, and `<polaris-catalog-name>` is the catalog name used
-by the Polaris service; for simplicity, you can use the same name.
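-Before walking through the remaining placeholders, here is an illustrative, fully substituted
-invocation; the catalog name, URI, and credentials below are made-up values:
-
-```shell
-# Illustrative only: a catalog named "polaris" on a locally deployed Polaris
-# service, with dummy credentials. Substitute your own values.
-bin/spark-shell \
---packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0,org.apache.iceberg:iceberg-aws-bundle:1.9.1,io.delta:delta-spark_2.12:3.3.1 \
---conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
---conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
---conf spark.sql.catalog.polaris.warehouse=polaris \
---conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
---conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
---conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
---conf spark.sql.catalog.polaris.credential='my-client-id:my-client-secret' \
---conf spark.sql.catalog.polaris.scope='PRINCIPAL_ROLE:ALL' \
---conf spark.sql.catalog.polaris.token-refresh-enabled=true
-```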
-
-Replace `<polaris-service-uri>` with the URI of the deployed Polaris service. For example, with a locally deployed
-Polaris service, the URI would be `http://localhost:8181/api/catalog`.
-
-For the `<client-id>` and `<client-secret>` values, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}})
-for more details.
-
-You can also start the connection by programmatically initializing a SparkSession; the following is an example with PySpark:
-```python
-from pyspark.sql import SparkSession
-
-spark = (
-    SparkSession.builder
-    .config("spark.jars.packages", "<polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.1,io.delta:delta-spark_2.12:3.3.1")
-    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
-    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension")
-    .config("spark.sql.catalog.<spark-catalog-name>", "org.apache.polaris.spark.SparkCatalog")
-    .config("spark.sql.catalog.<spark-catalog-name>.uri", "<polaris-service-uri>")
-    .config("spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled", "true")
-    .config("spark.sql.catalog.<spark-catalog-name>.credential", "<client-id>:<client-secret>")
-    .config("spark.sql.catalog.<spark-catalog-name>.warehouse", "<polaris-catalog-name>")
-    .config("spark.sql.catalog.<spark-catalog-name>.scope", "PRINCIPAL_ROLE:ALL")
-    .config("spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation", "vended-credentials")
-    .getOrCreate()
-)
-```
-As with the CLI command, make sure the corresponding fields are replaced correctly.
-
-### Create tables with Spark
-After Spark is started, you can use it to create and access Iceberg and Delta tables, for example:
-```python
-spark.sql("USE polaris")
-spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS")
-spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC")
-spark.sql("USE NAMESPACE DELTA_NS.PUBLIC")
-spark.sql("""CREATE TABLE IF NOT EXISTS PEOPLE (
-    id int, name string)
-USING delta LOCATION 'file:///tmp/var/delta_tables/people';
-""")
-```
-
-## Connecting with Spark using a local Polaris Spark client jar
-If you would like to use a version of the Spark client that is not yet released, you can
-build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin
-[README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions.
-
-## Limitations
-The Polaris Spark client has the following functionality limitations:
-1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `DataFrame`
-   is also not supported, since it relies on the CTAS support.
-2) Creating a Delta table without an explicit location is not supported.
-3) Renaming a Delta table is not supported.
-4) `ALTER TABLE ... SET LOCATION` is not supported for Delta tables.
-5) Other non-Iceberg table formats, such as CSV, are not supported.
diff --git a/1.1.0/policy.md b/1.1.0/policy.md
deleted file mode 100644
index e96661f3f3..0000000000
--- a/1.1.0/policy.md
+++ /dev/null
@@ -1,198 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Policy -type: docs -weight: 425 ---- - -The Polaris Policy framework empowers organizations to centrally define, manage, and enforce fine-grained governance, lifecycle, and operational rules across all data resources in the catalog. - -With the policy API, you can: -- Create and manage policies -- Attach policies to specific resources (catalogs, namespaces, tables, or views) -- Check applicable policies for any given resource - -## What is a Policy? - -A policy in Apache Polaris is a structured entity that defines rules governing actions on specified resources under -predefined conditions. Each policy contains: - -- **Name**: A unique identifier within a namespace -- **Type**: Determines the semantics and expected format of the policy content -- **Description**: Explains the purpose of the policy -- **Content**: Contains the actual rules defining the policy behavior -- **Version**: An automatically tracked revision number -- **Inheritable**: Whether the policy can be inherited by child resources, decided by its type - -### Policy Types - -Polaris supports several predefined system policy types (prefixed with `system.`): - -| Policy Type | Purpose | JSON-Schema | Applies To | -|-------------|-------------------------------------------------------|-------------|------------| -| **`system.data-compaction`** | Defines rules for data file compaction operations | [`data-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/data-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.metadata-compaction`** | Defines rules for metadata file compaction operations | [`metadata-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/metadata-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.orphan-file-removal`** | Defines rules for removing orphaned files | [`orphan-file-removal/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/orphan-file-removal/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.snapshot-expiry`** | Defines rules for snapshot expiration | [`snapshot-expiry/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/snapshot-expiry/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | - -Support for additional predefined system policy types and custom policy type definitions is in progress. -For more details, please refer to the [roadmap](https://github.com/apache/polaris/discussions/1028). - -### Policy Inheritance - -The entity hierarchy in Polaris is structured as follows: - -``` - Catalog - | - Namespace - | - +-----------+----------+ - | | | -Iceberg Iceberg Generic - Table View Table -``` - -Policies can be attached at any level, and inheritance flows from catalog down to namespace, then to tables and views. 
- -Policies can be inheritable or non-inheritable: - -- **Inheritable policies**: Apply to the target resource and all its applicable child resources -- **Non-inheritable policies**: Apply only to the specific target resource - -The inheritance follows an override mechanism: -1. Table-level policies override namespace and catalog policies -2. Namespace-level policies override parent namespace and catalog policies - -> [!IMPORTANT] -> Because an override completely replaces the same policy type at higher levels, -> **only one instance of a given policy type can be attached to (and therefore affect) a resource**. - -## Working with Policies - -### Creating a Policy - -To create a policy, you need to provide a name, type, and optionally a description and content: - -```json -POST /polaris/v1/{prefix}/namespaces/{namespace}/policies -{ - "name": "compaction-policy", - "type": "system.data-compaction", - "description": "Policy for optimizing table storage", - "content": "{\"version\": \"2025-02-03\", \"enable\": true, \"config\": {\"target_file_size_bytes\": 134217728}}" -} -``` - -The policy content is validated against a schema specific to its type. Here are a few policy content examples: -- Data Compaction Policy -```json -{ - "version": "2025-02-03", - "enable": true, - "config": { - "target_file_size_bytes": 134217728, - "compaction_strategy": "bin-pack", - "max-concurrent-file-group-rewrites": 5 - } -} -``` -- Orphan File Removal Policy -```json -{ - "version": "2025-02-03", - "enable": true, - "max_orphan_file_age_in_days": 30, - "locations": ["s3://my-bucket/my-table-location"], - "config": { - "prefix_mismatch_mode": "ignore" - } -} -``` - -### Attaching Policies to Resources - -Policies can be attached to different resource levels: - -1. **Catalog level**: Applies to the entire catalog -2. **Namespace level**: Applies to a specific namespace -3. **Table-like level**: Applies to individual tables or views - -Example of attaching a policy to a table: - -```json -PUT /polaris/v1/{prefix}/namespaces/{namespace}/policies/{policy-name}/mappings -{ - "target": { - "type": "table-like", - "path": ["NS1", "NS2", "test_table_1"] - } -} -``` - -For inheritable policies, only one policy of a given type can be attached to a resource. For non-inheritable policies, -multiple policies of the same type can be attached. - -### Retrieving Applicable Policies -A user can view applicable policies on a resource (e.g., table, namespace, or catalog) as long as they have -read permission on that resource. - -Here is an example to find all policies that apply to a specific resource (including inherited policies): -``` -GET /polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions -``` - -**Sample response:** -```json -{ - "policies": [ - { - "name": "snapshot-expiry-policy", - "type": "system.snapshot-expiry", - "appliedAt": "namespace", - "content": { - "version": "2025-02-03", - "enable": true, - "config": { - "min_snapshot_to_keep": 1, - "max_snapshot_age_days": 2, - "max_ref_age_days": 3 - } - } - }, - { - "name": "compaction-policy", - "type": "system.data-compaction", - "appliedAt": "catalog", - "content": { - "version": "2025-02-03", - "enable": true, - "config": { - "target_file_size_bytes": 134217728 - } - } - } - ] -} -``` - -### API Reference - -For the complete and up-to-date API specification, see the [policy-api.yaml](https://github.com/apache/polaris/blob/main/spec/polaris-catalog-apis/policy-apis.yaml). 
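-As a quick illustration of calling these APIs, the earlier `applicable-policies` request could be
-issued with curl as follows, assuming `$POLARIS_BASE` is your catalog API base URL and `$TOKEN` is a
-previously obtained access token (both are placeholders):
-
-```shell
-# Hypothetical invocation of the applicable-policies endpoint shown above.
-# The %1F in the namespace parameter is the URL-encoded multipart-namespace separator.
-curl -H "Authorization: Bearer $TOKEN" \
-  "$POLARIS_BASE/polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions"
-```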
diff --git a/1.1.0/realm.md b/1.1.0/realm.md
deleted file mode 100644
index 4e0cc1ce25..0000000000
--- a/1.1.0/realm.md
+++ /dev/null
@@ -1,53 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Realm
-type: docs
-weight: 350
----
-
-This page explains what a realm is and what it is used for in Polaris.
-
-### What is it?
-
-A realm in Polaris serves as a logical partitioning mechanism within the catalog system. This isolation allows for multitenancy, enabling different teams, environments, or organizations to operate independently within the same Polaris deployment.
-
-### Key Characteristics
-
-**Isolation:** Each realm encapsulates its own set of resources, ensuring that operations and policies in one realm do not affect others.
-
-**Authentication Context:** When configuring Polaris, principal credentials are associated with a specific realm. This allows for the separation of security concerns across different realms.
-
-**Configuration Scope:** Realm identifiers are used in various configurations, such as connection strings and feature configurations.
-
-An example of this is:
-
-`jdbc:postgresql://localhost:5432/{realm}`
-
-This ensures that each realm's data is stored separately.
-
-### How is it used in the system?
-
-**RealmContext:** A key concept used to identify and resolve the context in which operations are performed. For example, in `DefaultRealmContextResolver`, a realm is resolved from request headers, and operations are performed based on the resolved realm identifier.
-
-**Authentication and Authorization:** For example, in `DefaultAuthenticator`, `RealmContext` is used to provide context about the current security domain, which is used to retrieve the correct `PolarisMetastoreManager` that manages all Polaris entities and associated grant record metadata for
-authorization.
-
-**Isolation:** In methods like `createEntityManagerFactory(@Nonnull RealmContext realmContext)` from the `PolarisEclipseLinkPersistenceUnit` interface, the realm context influences how resources are created or managed based on the security policies of that realm.
-An example of this is the way a realm name can be used to build a database connection URL so that you have one database instance per realm, when applicable. Alternatively, isolation can be more granular and applied at the primary-key level (within the same database instance).
diff --git a/1.1.0/telemetry.md b/1.1.0/telemetry.md
deleted file mode 100644
index 9e867408da..0000000000
--- a/1.1.0/telemetry.md
+++ /dev/null
@@ -1,192 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Telemetry
-type: docs
-weight: 450
----
-
-## Metrics
-
-Metrics are published using [Micrometer]; they are available from Polaris's management interface
-(port 8282 by default) under the path `/q/metrics`. For example, if the server is running on
-localhost, the metrics can be accessed via http://localhost:8282/q/metrics.
-
-[Micrometer]: https://quarkus.io/guides/telemetry-micrometer
-
-Metrics can be scraped by Prometheus or any compatible metrics scraping server. See
-[Prometheus](https://prometheus.io) for more information.
-
-Additional tags can be added to the metrics by setting the `polaris.metrics.tags.*` property. Each
-tag is a key-value pair, where the key is the tag name and the value is the tag value. For example,
-to add a tag `environment=prod` to all metrics, set `polaris.metrics.tags.environment=prod`. Multiple
-tags can be added, as shown below:
-
-```properties
-polaris.metrics.tags.service=polaris
-polaris.metrics.tags.environment=prod
-polaris.metrics.tags.region=us-west-2
-```
-
-Note that by default Polaris adds one tag: `application=Polaris`. You can override this tag by
-setting the `polaris.metrics.tags.application=<application-name>` property.
-
-### Realm ID Tag
-
-Polaris can add the realm ID as a tag to all API and HTTP request metrics. This is disabled by
-default to prevent high cardinality issues, but can be enabled by setting the following properties:
-
-```properties
-polaris.metrics.realm-id-tag.enable-in-api-metrics=true
-polaris.metrics.realm-id-tag.enable-in-http-metrics=true
-```
-
-You should be particularly careful when enabling the realm ID tag in HTTP request metrics, as these
-metrics typically have a much higher cardinality than API request metrics.
-
-In order to prevent the number of tags from growing indefinitely and causing performance issues or
-crashing the server, the number of unique realm IDs in HTTP request metrics is limited to 100 by
-default. If the number of unique realm IDs exceeds this value, a warning will be logged and no more
-HTTP request metrics will be recorded. This threshold can be changed by setting the
-`polaris.metrics.realm-id-tag.http-metrics-max-cardinality` property.
-
-## Traces
-
-Traces are published using [OpenTelemetry].
-
-[OpenTelemetry]: https://quarkus.io/guides/opentelemetry-tracing
-
-By default, OpenTelemetry is disabled in Polaris, because there is no reasonable default
-for the collector endpoint in all cases.
-
-To enable OpenTelemetry and publish traces for Polaris, set `quarkus.otel.sdk.disabled=false`
-and configure a valid collector endpoint URL with `http://` or `https://` as the server property
-`quarkus.otel.exporter.otlp.traces.endpoint`.
-
-_If these properties are not set, the server will not publish traces._
-
-The collector must talk the OpenTelemetry protocol (OTLP) and the port must be its gRPC port
-(by default 4317), e.g. "http://otlp-collector:4317".
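-For example, a minimal configuration that enables tracing and points Polaris at a collector might
-look like the following (the endpoint value is illustrative):
-
-```properties
-# Enable the OpenTelemetry SDK and export spans to an OTLP collector (gRPC port).
-quarkus.otel.sdk.disabled=false
-quarkus.otel.exporter.otlp.traces.endpoint=http://otlp-collector:4317
-```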
-
-By default, Polaris adds a few attributes to the [OpenTelemetry Resource] to identify the server,
-and notably:
-
-- `service.name`: set to `Apache Polaris Server (incubating)`;
-- `service.version`: set to the Polaris version.
-
-[OpenTelemetry Resource]: https://opentelemetry.io/docs/languages/js/resources/
-
-You can override the default resource attributes or add additional ones by setting the
-`quarkus.otel.resource.attributes` property.
-
-This property expects a comma-separated list of key-value pairs, where the key is the attribute name
-and the value is the attribute value. For example, to change the service name to `Polaris` and add
-an attribute `deployment.environment=dev`, set the following property:
-
-```properties
-quarkus.otel.resource.attributes=service.name=Polaris,deployment.environment=dev
-```
-
-The alternative syntax below can also be used:
-
-```properties
-quarkus.otel.resource.attributes[0]=service.name=Polaris
-quarkus.otel.resource.attributes[1]=deployment.environment=dev
-```
-
-Finally, two additional span attributes are added to all request parent spans:
-
-- `polaris.request.id`: The unique identifier of the request, if set by the caller through the
-  `Polaris-Request-Id` header.
-- `polaris.realm`: The unique identifier of the realm. Always set (unless the request failed because
-  of a realm resolution error).
-
-### Troubleshooting Traces
-
-If the server is unable to publish traces, check first for a log warning message like the following:
-
-```
-SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans.
-The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317
-```
-
-This means that the server is unable to connect to the collector. Check that the collector is
-running and that the URL is correct.
-
-## Logging
-
-Polaris relies on [Quarkus](https://quarkus.io/guides/logging) for logging.
-
-By default, logs are written to the console and to a file located in the `./logs` directory. The log
-file is rotated daily and compressed. The maximum size of the log file is 10MB, and the maximum
-number of backup files is 14.
-
-JSON logging can be enabled by setting the `quarkus.log.console.json.enabled` and `quarkus.log.file.json.enabled`
-properties to `true`. By default, JSON logging is disabled.
-
-The log level can be set for the entire application or for specific packages. The default log level
-is `INFO`. To set the log level for the entire application, use the `quarkus.log.level` property.
-
-To set the log level for a specific package, use the `quarkus.log.category."package-name".level` property,
-where `package-name` is the name of the package. For example, the package `io.smallrye.config` has a
-useful logger to help debug configuration issues, but it needs to be set to the `DEBUG` level.
-This can be done by setting the following property:
-
-```properties
-quarkus.log.category."io.smallrye.config".level=DEBUG
-```
-
-The log message format for both console and file output is highly configurable. The default format
-is:
-
-```
-%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n
-```
-
-Refer to the [Logging format](https://quarkus.io/guides/logging#logging-format) guide for more
-information on placeholders and how to customize the log message format.
-
-### MDC Logging
-
-Polaris uses Mapped Diagnostic Context (MDC) to enrich log messages with additional context.
The -following MDC keys are available: - -- `requestId`: The unique identifier of the request, if set by the caller through the - `Polaris-Request-Id` header. -- `realmId`: The unique identifier of the realm. Always set. -- `traceId`: The unique identifier of the trace. Present if tracing is enabled and the message is - originating from a traced context. -- `parentId`: The unique identifier of the parent span. Present if tracing is enabled and the - message is originating from a traced context. -- `spanId`: The unique identifier of the span. Present if tracing is enabled and the message is - originating from a traced context. -- `sampled`: Whether the trace has been sampled. Present if tracing is enabled and the message is - originating from a traced context. - -Other MDC keys can be added by setting the `polaris.log.mdc.*` property. Each property is a -key-value pair, where the key is the MDC key name and the value is the MDC key value. For example, -to add the MDC keys `environment=prod` and `region=us-west-2` to all log messages, set the following -properties: - -```properties -polaris.log.mdc.environment=prod -polaris.log.mdc.region=us-west-2 -``` - -MDC context is propagated across threads, including in `TaskExecutor` threads. diff --git a/1.2.0/_index.md b/1.2.0/_index.md deleted file mode 100644 index a55fc149e6..0000000000 --- a/1.2.0/_index.md +++ /dev/null @@ -1,186 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -linkTitle: 'In Development' -title: 'Overview' -type: docs -weight: 200 -params: - top_hidden: true - show_page_toc: false -cascade: - type: docs - params: - show_page_toc: true -# This file will NOT be copied into a new release's versioned docs folder. ---- - -{{< alert warning >}} -These pages refer to the current state of the main branch, which is still under active development. - -Functionalities can be changed, removed or added without prior notice. -{{< /alert >}} - -Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol. - -With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines. - -![Conceptual diagram of Apache Polaris (Incubating).](/img/overview.svg "Apache Polaris (Incubating) overview") - -## Key concepts - -This section introduces key concepts associated with using Apache Polaris (Incubating). - -In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables -or namespaces have been created yet for Catalog2 or Catalog3. 
- -![Diagram that shows an example Apache Polaris (Incubating) structure.](/img/sample-catalog-structure.svg "Sample Apache Polaris (Incubating) structure") - -### Catalog - -In Polaris, you can create one or more catalog resources to organize Iceberg tables. - -Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a -query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks: - -- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's - current metadata file. - -- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of - the table. - -To learn more about Iceberg REST catalogs, see the [Apache Iceberg™ REST catalog specification](https://iceberg.apache.org/rest-catalog-spec/). - -#### Catalog types - -A catalog can be one of the following two types: - -- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris. - -- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from - this catalog are synced to Polaris. These tables are read-only in Polaris. - -A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS. - -### Namespace - -You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create -nested namespaces. Iceberg tables belong to namespaces. - -{{< alert important >}} -For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met: - -- The directory only contains the data files that belong to a single table. -- The directory hierarchy matches the namespace hierarchy for the catalog. - -For example, if a catalog includes the following items: - -- Top-level namespace namespace1 -- Nested namespace namespace1a -- A customers table, which is grouped under nested namespace namespace1a -- An orders table, which is grouped under nested namespace namespace1a - -The directory hierarchy for the catalog must follow this structure: - -- /namespace1/namespace1a/customers/ -- /namespace1/namespace1a/orders/ -{{< /alert >}} - -### Storage configuration - -A storage configuration stores a generated identity and access management (IAM) entity for your cloud storage and is created -when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the -catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris -Catalog. - -When you create a catalog, you supply the following information about your cloud storage: - -| Cloud storage provider | Information | -| -----------------------| ----------- | -| Amazon S3 |
• Default base location for your Amazon S3 bucket<br>• Locations for your Amazon S3 bucket<br>• S3 role ARN<br>• External ID (optional) |
-| Google Cloud Storage (GCS) | • Default base location for your GCS bucket<br>• Locations for your GCS bucket |
-| Azure | • Default base location for your Microsoft Azure container<br>• Locations for your Microsoft Azure container<br>• Azure tenant ID
| - -## Example workflow - -In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1. - -1. Bob uses Apache Spark™ to create the Table1 table under the - Namespace1 namespace in the Catalog1 catalog and insert values into - Table1. - - Bob can create Table1 and insert data into it because he is using a - service connection with a service principal that has - the privileges to perform these actions. - -2. Alice uses Snowflake to read data from Table1. - - Alice can read data from Table1 because she is using a service - connection with a service principal with a catalog integration that - has the privileges to perform this action. Alice - creates an unmanaged table in Snowflake to read data from Table1. - -![Diagram that shows an example workflow for Apache Polaris (Incubating)](/img/example-workflow.svg "Example workflow for Apache Polaris (Incubating)") - -## Security and access control - -### Credential vending - -To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query -execution. These credentials allow the query engine to run the query without requiring access to your cloud storage for -Iceberg tables. This process is called credential vending. - -As of now, the following limitation is known regarding Apache Iceberg support: - -- **remove_orphan_files:** Apache Spark can't use credential vending - for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details. - -### Identity and access management (IAM) - -Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg -metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your -storage location. - -### Access control - -Polaris enforces the access control that you configure across all tables registered with the service and governs security for all -queries from query engines in a consistent manner. - -Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs, -namespaces, and tables. - -Polaris RBAC uses two different role types to delegate privileges: - -- **Principal roles:** Granted to Polaris service principals and - analogous to roles in other access control systems that you grant to - service principals. - -- **Catalog roles:** Configured with certain privileges on Polaris - catalog resources and granted to principal roles. - -For more information, see [Access control]({{% ref "managing-security/access-control" %}}). - -## Legal Notices - -Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. - - - diff --git a/1.2.0/admin-tool.md b/1.2.0/admin-tool.md deleted file mode 100644 index accfdcd525..0000000000 --- a/1.2.0/admin-tool.md +++ /dev/null @@ -1,143 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. 
You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Admin Tool
-type: docs
-weight: 300
----
-
-Polaris includes a tool for administrators to manage the metastore.
-
-The tool must be built with the necessary JDBC drivers to access the metastore database. For
-example, to build the tool with support for Postgres, run the following:
-
-```shell
-./gradlew \
-  :polaris-admin:assemble \
-  :polaris-admin:quarkusAppPartsBuild --rerun \
-  -Dquarkus.container-image.build=true
-```
-
-The above command will generate:
-
-- One Fast-JAR in `runtime/admin/build/quarkus-app/quarkus-run.jar`
-- Two Docker images named `apache/polaris-admin-tool:latest` and `apache/polaris-admin-tool:<version>`
-
-## Usage
-
-Please make sure the admin tool and the Polaris server are the same version before using it.
-To run the standalone JAR, use the following command:
-
-```shell
-java -jar runtime/admin/build/quarkus-app/quarkus-run.jar --help
-```
-
-To run the Docker image, use the following command:
-
-```shell
-docker run apache/polaris-admin-tool:latest --help
-```
-
-The basic usage of the Polaris Admin Tool is outlined below:
-
-```
-Usage: polaris-admin-runner.jar [-hV] [COMMAND]
-Polaris Admin Tool
-  -h, --help      Show this help message and exit.
-  -V, --version   Print version information and exit.
-Commands:
-  help       Display help information about the specified command.
-  bootstrap  Bootstraps realms and principal credentials.
-  purge      Purge principal credentials.
-```
-
-## Configuration
-
-The Polaris Admin Tool must be executed with the same configuration as the Polaris server. The
-configuration can be done via environment variables or system properties.
-
-At a minimum, it is necessary to configure the Polaris Admin Tool to connect to the same database
-used by the Polaris server.
-
-See the [metastore documentation]({{% ref "metastores" %}}) for more information on configuring the
-database connection.
-
-Note: Polaris will always create the schema 'polaris_schema' under the configured database during bootstrap.
-
-## Bootstrapping Realms and Principal Credentials
-
-The `bootstrap` command is used to bootstrap realms and create the necessary principal credentials
-for the Polaris server. This command is idempotent and can be run multiple times without causing any
-issues. If a realm is already bootstrapped, running the `bootstrap` command again will not have any
-effect on that realm.
-
-```shell
-java -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap --help
-```
-
-The basic usage of the `bootstrap` command is outlined below:
-
-```
-Usage: polaris-admin-runner.jar bootstrap [-hV] [-c=<realm,clientId,clientSecret>]... -r=<realm> [-r=<realm>]...
-Bootstraps realms and root principal credentials.
-  -c, --credential=<realm,clientId,clientSecret>
-                  Root principal credentials to bootstrap. Must be of the form
-                    'realm,clientId,clientSecret'.
-  -h, --help      Show this help message and exit.
-  -r, --realm=<realm>   The name of a realm to bootstrap.
-  -V, --version   Print version information and exit.
-```
-
-For example, to bootstrap the `realm1` realm and create its root principal credential with the
-client ID `admin` and client secret `admin`, you can run the following command:
-
-```shell
-java -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap -r realm1 -c realm1,admin,admin
-```
-
-## Purging Realms and Principal Credentials
-
-The `purge` command is used to remove realms and principal credentials from the Polaris server.
-
-{{< alert warning >}}
-Running the `purge` command will remove all data associated with the specified realms!
-This includes all entities (catalogs, namespaces, tables, views, roles), all principal
-credentials, grants, and any other data associated with the realms.
-{{< /alert >}}
-
-```shell
-java -jar runtime/admin/build/quarkus-app/quarkus-run.jar purge --help
-```
-
-The basic usage of the `purge` command is outlined below:
-
-```
-Usage: polaris-admin-runner.jar purge [-hV] -r=<realm> [-r=<realm>]...
-Purge realms and all associated entities.
-  -h, --help      Show this help message and exit.
-  -r, --realm=<realm>   The name of a realm to purge.
-  -V, --version   Print version information and exit.
-```
-
-For example, to purge the `realm1` realm, you can run the following command:
-
-```shell
-java -jar runtime/admin/build/quarkus-app/quarkus-run.jar purge -r realm1
-```
diff --git a/1.2.0/command-line-interface.md b/1.2.0/command-line-interface.md
deleted file mode 100644
index c455ec80f2..0000000000
--- a/1.2.0/command-line-interface.md
+++ /dev/null
@@ -1,1477 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Command Line Interface
-type: docs
-weight: 300
----
-
-In order to help administrators quickly set up and manage their Polaris server, Polaris provides a simple command-line interface (CLI) for common tasks.
-
-The basic syntax of the Polaris CLI is outlined below:
-
-```
-polaris [options] COMMAND ...
-
-options:
---host
---port
---base-url
---client-id
---client-secret
---access-token
---realm
---header
---profile
---proxy
-```
-
-`COMMAND` must be one of the following:
-1. catalogs
-2. principals
-3. principal-roles
-4. catalog-roles
-5. namespaces
-6. privileges
-7. profiles
-8. policies
-9. repair
-
-Each _command_ supports several _subcommands_, and some _subcommands_ have _actions_ that come after the subcommand in turn. Finally, _arguments_ follow to form a full invocation. Within a set of named arguments at the end of an invocation, ordering is generally not important. Many invocations also have a required positional argument of the type that the _command_ refers to. Again, the ordering of this positional argument relative to named arguments is not important.
-
-Some example full invocations:
-
-```
-polaris principals list
-polaris catalogs delete some_catalog_name
-polaris catalogs update --property foo=bar some_other_catalog
-polaris catalogs update another_catalog --property k=v
-polaris privileges namespace grant --namespace some.schema --catalog fourth_catalog --catalog-role some_catalog_role TABLE_READ_DATA
-polaris profiles list
-polaris policies list --catalog some_catalog --namespace some.schema
-polaris repair
-```
-
-### Authentication
-
-As outlined above, the Polaris CLI may take credentials using the `--client-id` and `--client-secret` options. For example:
-
-```
-polaris --client-id 4b5ed1ca908c3cc2 --client-secret 07ea8e4edefb9a9e57c247e8d1a4f51c principals ...
-```
-
-If `--client-id` and `--client-secret` are not provided, the Polaris CLI will try to read the client ID and client secret from environment variables called `CLIENT_ID` and `CLIENT_SECRET` respectively. If these flags are not provided and the environment variables are not set, the CLI will fail.
-
-Alternatively, the `--access-token` option can be used instead of `--client-id` and `--client-secret`, but both authentication methods cannot be used simultaneously.
-
-Additionally, the `--profile` option can be used to specify a saved profile instead of providing authentication details directly. If `--profile` is not provided, the CLI will check the `CLIENT_PROFILE` environment variable. Profiles store authentication details and connection settings, simplifying repeated CLI usage.
-
-If the `--host` and `--port` options are not provided, the CLI will default to communicating with `localhost:8181`.
-
-Alternatively, the `--base-url` option can be used instead of `--host` and `--port`, but both options cannot be used simultaneously. This allows specifying arbitrary Polaris URLs, including HTTPS ones, that have additional base prefixes before the `/api/*/v1` subpaths.
-
-If your Polaris server is configured to use a realm other than the default, you can use the `--realm` option to specify a realm. If `--realm` is not provided, the CLI will check the `REALM` environment variable. If neither is provided, the CLI will not send the realm context header.
-Also, if your Polaris server uses a custom realm header name, you can use the `--header` option to specify it. If `--header` is not provided, the CLI will check the `HEADER` environment variable. If neither is provided, the CLI will use the default header name `Polaris-Realm`.
-
-Read more [here]({{% ref "./configuration.md" %}}) about configuring the Polaris server to work with multiple realms.
-
-### PATH
-
-These examples assume the Polaris CLI is on the PATH and so can be invoked just by the command `polaris`. You can add the CLI to your PATH environment variable with a command like the following:
-
-```
-export PATH="~/polaris:$PATH"
-```
-
-Alternatively, you can run the CLI by providing a path to it, such as with the following invocation:
-
-```
-~/polaris principals list
-```
-
-## Commands
-
-Each of the commands `catalogs`, `principals`, `principal-roles`, `catalog-roles`, and `privileges` is used to manage a different type of entity within Polaris.
-
-In addition to these, the `profiles` command is available for managing stored authentication profiles, allowing login credentials to be configured for reuse. This provides an alternative to passing authentication details with every command. By default, profiles are stored in a `.polaris.json` file within the `~/.polaris` directory. The location of this directory can be overridden by setting the `POLARIS_HOME` environment variable.
-
-To find details on the options that can be provided to a particular command or subcommand ad-hoc, you may wish to use the `--help` flag. For example:
-
-```
-polaris catalogs --help
-polaris principals create --help
-polaris profiles --help
-```
-
-### catalogs
-
-The `catalogs` command is used to create, discover, and otherwise manage catalogs within Polaris.
-
-`catalogs` supports the following subcommands:
-
-1. create
-2. delete
-3. get
-4. list
-5. update
-
-#### create
-
-The `create` subcommand is used to create a catalog.
-
-```
-input: polaris catalogs create --help
-options:
-  create
-    Named arguments:
-      --type  The type of catalog to create in [INTERNAL, EXTERNAL]. INTERNAL by default.
-      --storage-type  (Required) The type of storage to use for the catalog
-      --default-base-location  (Required) Default base location of the catalog
-      --endpoint  (Only for S3) The S3 endpoint to use when connecting to S3
-      --endpoint-internal  (Only for S3) The S3 endpoint used by Polaris when connecting to S3, if different from the one that clients use
-      --sts-endpoint  (Only for S3) The STS endpoint to use when connecting to STS
-      --path-style-access  (Only for S3) Whether to use path-style access for S3
-      --allowed-location  An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once.
-      --role-arn  (Only for AWS S3) A role ARN to use when connecting to S3
-      --region  (Only for S3) The region to use when connecting to S3
-      --external-id  (Only for S3) The external ID to use when connecting to S3
-      --tenant-id  (Required for Azure) A tenant ID to use when connecting to Azure Storage
-      --multi-tenant-app-name  (Only for Azure) The app name to use when connecting to Azure Storage
-      --consent-url  (Only for Azure) A consent URL granting permissions for the Azure Storage location
-      --service-account  (Only for GCS) The service account to use when connecting to GCS
-      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
-      --catalog-connection-type  The type of external catalog in [iceberg-rest, hadoop].
-      --iceberg-remote-catalog-name  The remote catalog name when federating to an Iceberg REST catalog
-      --hadoop-warehouse  The warehouse to use when federating to a HADOOP catalog
-      --catalog-authentication-type  The type of authentication in [OAUTH, BEARER, SIGV4, IMPLICIT]
-      --catalog-service-identity-type  The type of service identity in [AWS_IAM]
-      --catalog-service-identity-iam-arn  When using the AWS_IAM service identity type, this is the ARN of the IAM user or IAM role Polaris uses to assume roles and then access external resources.
-      --catalog-uri  The URI of the external catalog
-      --catalog-token-uri  (For authentication type OAUTH) Token server URI
-      --catalog-client-id  (For authentication type OAUTH) oauth client id
-      --catalog-client-secret  (For authentication type OAUTH) oauth client secret (input-only)
-      --catalog-client-scope  (For authentication type OAUTH) oauth scopes to specify when exchanging for a short-lived access token. Multiple can be provided by specifying this option more than once
-      --catalog-bearer-token  (For authentication type BEARER) Bearer token (input-only)
-      --catalog-role-arn  (For authentication type SIGV4) The aws IAM role arn assumed by polaris userArn when signing requests
-      --catalog-role-session-name  (For authentication type SIGV4) The role session name to be used by the SigV4 protocol for signing requests
-      --catalog-external-id  (For authentication type SIGV4) An optional external id used to establish a trust relationship with AWS in the trust policy
-      --catalog-signing-region  (For authentication type SIGV4) Region to be used by the SigV4 protocol for signing requests
-      --catalog-signing-name  (For authentication type SIGV4) The service name to be used by the SigV4 protocol for signing requests; the default signing name is "execute-api" if not provided
-    Positional arguments:
-      catalog
-```
-
-##### Examples
-
-```
-polaris catalogs create \
-  --storage-type s3 \
-  --default-base-location s3://example-bucket/my_data \
-  --role-arn ${ROLE_ARN} \
-  my_catalog
-
-polaris catalogs create \
-  --storage-type s3 \
-  --default-base-location s3://example-bucket/my_other_data \
-  --allowed-location s3://example-bucket/second_location \
-  --allowed-location s3://other-bucket/third_location \
-  --role-arn ${ROLE_ARN} \
-  my_other_catalog
-
-polaris catalogs create \
-  --storage-type file \
-  --default-base-location file:///example/tmp \
-  quickstart_catalog
-```
-
-#### delete
-
-The `delete` subcommand is used to delete a catalog.
-
-```
-input: polaris catalogs delete --help
-options:
-  delete
-    Positional arguments:
-      catalog
-```
-
-##### Examples
-
-```
-polaris catalogs delete some_catalog
-```
-
-#### get
-
-The `get` subcommand is used to retrieve details about a catalog.
-
-```
-input: polaris catalogs get --help
-options:
-  get
-    Positional arguments:
-      catalog
-```
-
-##### Examples
-
-```
-polaris catalogs get some_catalog
-
-polaris catalogs get another_catalog
-```
-
-#### list
-
-The `list` subcommand is used to show details about all catalogs, or those that a certain principal role has access to. The principal used to perform this operation must have the `CATALOG_LIST` privilege.
-
-```
-input: polaris catalogs list --help
-options:
-  list
-    Named arguments:
-      --principal-role  The name of a principal role
-```
-
-##### Examples
-
-```
-polaris catalogs list
-
-polaris catalogs list --principal-role some_user
-```
-
-#### update
-
-The `update` subcommand is used to update a catalog. Currently, this command supports changing the properties of a catalog or updating its storage configuration.
-
-```
-input: polaris catalogs update --help
-options:
-  update
-    Named arguments:
-      --default-base-location  A new default base location for the catalog
-      --allowed-location  An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once.
-      --region  (Only for S3) The region to use when connecting to S3
-      --set-property  A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
-      --remove-property  A key to remove from a properties map. If the key does not exist then no action is taken for the specified key. If properties are also being set in the same update command then the list of removals is applied last. Multiple can be provided by specifying this option more than once
-    Positional arguments:
-      catalog
-```
-
-##### Examples
-
-```
-polaris catalogs update --property tag=new_value my_catalog
-
-polaris catalogs update --default-base-location s3://new-bucket/my_data my_catalog
-```
-
-### Principals
-
-The `principals` command is used to manage principals within Polaris.
-
-`principals` supports the following subcommands:
-
-1. create
-2. delete
-3. get
-4. list
-5. rotate-credentials
-6. update
-7. access
-8. reset
-
-#### create
-
-The `create` subcommand is used to create a new principal.
-
-```
-input: polaris principals create --help
-options:
-  create
-    Named arguments:
-      --type  The type of principal to create in [SERVICE]
-      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
-    Positional arguments:
-      principal
-```
-
-##### Examples
-
-```
-polaris principals create some_user
-
-polaris principals create --client-id ${CLIENT_ID} --property admin=true some_admin_user
-```
-
-#### delete
-
-The `delete` subcommand is used to delete a principal.
-
-```
-input: polaris principals delete --help
-options:
-  delete
-    Positional arguments:
-      principal
-```
-
-##### Examples
-
-```
-polaris principals delete some_user
-
-polaris principals delete some_admin_user
-```
-
-#### get
-
-The `get` subcommand retrieves details about a principal.
-
-```
-input: polaris principals get --help
-options:
-  get
-    Positional arguments:
-      principal
-```
-
-##### Examples
-
-```
-polaris principals get some_user
-
-polaris principals get some_admin_user
-```
-
-#### list
-
-The `list` subcommand shows details about all principals.
-
-##### Examples
-
-```
-polaris principals list
-```
-
-#### rotate-credentials
-
-The `rotate-credentials` subcommand is used to update the credentials used by a principal. After this command runs successfully, the new credentials will be printed to stdout.
-
-```
-input: polaris principals rotate-credentials --help
-options:
-  rotate-credentials
-    Positional arguments:
-      principal
-```
-
-##### Examples
-
-```
-polaris principals rotate-credentials some_user
-
-polaris principals rotate-credentials some_admin_user
-```
-
-#### update
-
-The `update` subcommand is used to update a principal. Currently, this supports rewriting the properties associated with a principal.
-
-```
-input: polaris principals update --help
-options:
-  update
-    Named arguments:
-      --set-property  A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
-      --remove-property  A key to remove from a properties map. If the key does not exist then no action is taken for the specified key. If properties are also being set in the same update command then the list of removals is applied last. Multiple can be provided by specifying this option more than once
-    Positional arguments:
-      principal
-```
-
-##### Examples
-
-```
-polaris principals update --property key=value --property other_key=other_value some_user
-
-polaris principals update --property are_other_keys_removed=yes some_user
-```
-
-#### access
-
-The `access` subcommand retrieves the entities related to a principal.

```
input: polaris principals access --help
options:
  access
    Positional arguments:
      principal
```

##### Examples

```
polaris principals access quickstart_user
```

#### reset

The `reset` subcommand is used to reset principal credentials.

```
input: polaris principals reset --help
options:
  reset
    Named arguments:
      --new-client-id  The new client ID for the principal
      --new-client-secret  The new client secret for the principal
    Positional arguments:
      principal
```

##### Examples

```
polaris principals create some_user

polaris principals reset some_user
polaris principals reset --new-client-id ${NEW_CLIENT_ID} some_user
polaris principals reset --new-client-secret ${NEW_CLIENT_SECRET} some_user
polaris principals reset --new-client-id ${NEW_CLIENT_ID} --new-client-secret ${NEW_CLIENT_SECRET} some_user
```
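
After a reset succeeds, you can confirm that the new credentials work by requesting a token from the OAuth endpoint described in the production configuration docs. A sketch assuming a local server on the default port (a realm header may also be required, depending on your server settings):

```
curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d "grant_type=client_credentials" \
  -d "client_id=${NEW_CLIENT_ID}" \
  -d "client_secret=${NEW_CLIENT_SECRET}" \
  -d "scope=PRINCIPAL_ROLE:ALL"
```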

### Principal Roles

The `principal-roles` command is used to create, discover, and manage principal roles within Polaris. Additionally, this command can identify principals or catalog roles associated with a principal role, and can be used to grant a principal role to a principal.

`principal-roles` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. update
6. grant
7. revoke

#### create

The `create` subcommand is used to create a new principal role.

```
input: polaris principal-roles create --help
options:
  create
    Named arguments:
      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles create data_engineer

polaris principal-roles create --property key=value data_analyst
```

#### delete

The `delete` subcommand is used to delete a principal role.

```
input: polaris principal-roles delete --help
options:
  delete
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles delete data_engineer

polaris principal-roles delete data_analyst
```

#### get

The `get` subcommand retrieves details about a principal role.

```
input: polaris principal-roles get --help
options:
  get
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles get data_engineer

polaris principal-roles get data_analyst
```

#### list

The `list` subcommand is used to print out all principal roles or, alternatively, to list all principal roles associated with a given principal or with a given catalog role.

```
input: polaris principal-roles list --help
options:
  list
    Named arguments:
      --catalog-role  The name of a catalog role. If provided, show only principal roles assigned to this catalog role.
      --principal  The name of a principal. If provided, show only principal roles assigned to this principal.
```

##### Examples

```
polaris principal-roles list

polaris principal-roles list --principal d.knuth

polaris principal-roles list --catalog-role super_secret_data
```

#### update

The `update` subcommand is used to update a principal role. Currently, this supports updating the properties tied to a principal role.

```
input: polaris principal-roles update --help
options:
  update
    Named arguments:
      --set-property  A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
      --remove-property  A key to remove from a properties map. If the key does not exist, no action is taken for the specified key. If properties are also being set in the same update command, the list of removals is applied last. Multiple can be provided by specifying this option more than once
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles update --set-property key=value2 data_engineer

polaris principal-roles update data_analyst --set-property key=value3
```

#### grant

The `grant` subcommand is used to grant a principal role to a principal.

```
input: polaris principal-roles grant --help
options:
  grant
    Named arguments:
      --principal  A principal to grant this principal role to
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles grant --principal d.knuth data_engineer

polaris principal-roles grant data_scientist --principal a.ng
```

#### revoke

The `revoke` subcommand is used to revoke a principal role from a principal.

```
input: polaris principal-roles revoke --help
options:
  revoke
    Named arguments:
      --principal  A principal to revoke this principal role from
    Positional arguments:
      principal_role
```

##### Examples

```
polaris principal-roles revoke --principal former.employee data_engineer

polaris principal-roles revoke data_scientist --principal changed.role
```
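
The `grant` and `revoke` subcommands compose well in shell loops when the same principal role must be granted to many principals. A minimal sketch (principal names are illustrative):

```
for principal in d.knuth a.ng g.hopper; do
  polaris principal-roles grant --principal "${principal}" data_engineer
done
```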

### Catalog Roles

The `catalog-roles` command is used to create, discover, and manage catalog roles within Polaris. Additionally, this command can be used to grant a catalog role to a principal role.

`catalog-roles` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. update
6. grant
7. revoke

#### create

The `create` subcommand is used to create a new catalog role.

```
input: polaris catalog-roles create --help
options:
  create
    Named arguments:
      --catalog  The name of an existing catalog
      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles create --property key=value --catalog some_catalog sales_data

polaris catalog-roles create --catalog other_catalog sales_data
```

#### delete

The `delete` subcommand is used to delete a catalog role.

```
input: polaris catalog-roles delete --help
options:
  delete
    Named arguments:
      --catalog  The name of an existing catalog
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles delete --catalog some_catalog sales_data

polaris catalog-roles delete --catalog other_catalog sales_data
```

#### get

The `get` subcommand retrieves details about a catalog role.

```
input: polaris catalog-roles get --help
options:
  get
    Named arguments:
      --catalog  The name of an existing catalog
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles get --catalog some_catalog inventory_data

polaris catalog-roles get --catalog other_catalog inventory_data
```

#### list

The `list` subcommand is used to print all catalog roles. Alternatively, if a principal role is provided, only catalog roles associated with that principal role are shown.

```
input: polaris catalog-roles list --help
options:
  list
    Named arguments:
      --principal-role  The name of a principal role
    Positional arguments:
      catalog
```

##### Examples

```
polaris catalog-roles list

polaris catalog-roles list --principal-role data_engineer
```

#### update

The `update` subcommand is used to update a catalog role. Currently, only updating properties associated with the catalog role is supported.

```
input: polaris catalog-roles update --help
options:
  update
    Named arguments:
      --catalog  The name of an existing catalog
      --set-property  A key/value pair such as: tag=value. Merges the specified key/value into an existing properties map by updating the value if the key already exists or creating a new entry if not. Multiple can be provided by specifying this option more than once
      --remove-property  A key to remove from a properties map. If the key does not exist, no action is taken for the specified key. If properties are also being set in the same update command, the list of removals is applied last. Multiple can be provided by specifying this option more than once
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles update --set-property contains_pii=true --catalog some_catalog sales_data

polaris catalog-roles update sales_data --catalog some_catalog --set-property key=value
```

#### grant

The `grant` subcommand is used to grant a catalog role to a principal role.

```
input: polaris catalog-roles grant --help
options:
  grant
    Named arguments:
      --catalog  The name of an existing catalog
      --principal-role  The name of a principal role
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles grant sensitive_data --catalog some_catalog --principal-role power_user

polaris catalog-roles grant --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
```

#### revoke

The `revoke` subcommand is used to revoke a catalog role from a principal role.

```
input: polaris catalog-roles revoke --help
options:
  revoke
    Named arguments:
      --catalog  The name of an existing catalog
      --principal-role  The name of a principal role
    Positional arguments:
      catalog_role
```

##### Examples

```
polaris catalog-roles revoke sensitive_data --catalog some_catalog --principal-role power_user

polaris catalog-roles revoke --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
```

### Namespaces

The `namespaces` command is used to manage namespaces within Polaris.

`namespaces` supports the following subcommands:

1. create
2. delete
3. get
4. list

#### create

The `create` subcommand is used to create a new namespace.

When creating a namespace with an explicit location, that location must reside within the parent catalog or namespace.

```
input: polaris namespaces create --help
options:
  create
    Named arguments:
      --catalog  The name of an existing catalog
      --location  If specified, the location at which to store the namespace and entities inside it
      --property  A key/value pair such as: tag=value.
Multiple can be provided by specifying this option more than once
    Positional arguments:
      namespace
```

##### Examples

```
polaris namespaces create --catalog my_catalog outer

polaris namespaces create --catalog my_catalog --location 's3://bucket/outer/inner_SUFFIX' outer.inner
```

#### delete

The `delete` subcommand is used to delete a namespace.

```
input: polaris namespaces delete --help
options:
  delete
    Named arguments:
      --catalog  The name of an existing catalog
    Positional arguments:
      namespace
```

##### Examples

```
polaris namespaces delete outer_namespace.inner_namespace --catalog my_catalog

polaris namespaces delete --catalog my_catalog outer_namespace
```

#### get

The `get` subcommand retrieves details about a namespace.

```
input: polaris namespaces get --help
options:
  get
    Named arguments:
      --catalog  The name of an existing catalog
    Positional arguments:
      namespace
```

##### Examples

```
polaris namespaces get --catalog some_catalog a.b

polaris namespaces get a.b.c --catalog some_catalog
```

#### list

The `list` subcommand shows details about all namespaces directly within a catalog or, optionally, within some parent prefix in that catalog.

```
input: polaris namespaces list --help
options:
  list
    Named arguments:
      --catalog  The name of an existing catalog
      --parent  If specified, list namespaces inside this parent namespace
```

##### Examples

```
polaris namespaces list --catalog my_catalog

polaris namespaces list --catalog my_catalog --parent a

polaris namespaces list --catalog my_catalog --parent a.b
```
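
Because namespaces can be nested, it can be convenient to create a whole hierarchy in one pass. A minimal sketch that creates each level in turn, assuming parent namespaces must exist before their children (names are illustrative):

```
for namespace in a a.b a.b.c; do
  polaris namespaces create --catalog my_catalog "${namespace}"
done
```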

### Privileges

The `privileges` command is used to grant various privileges to a catalog role, or to revoke those privileges. Privileges can be granted at the level of a catalog, a namespace, a table, or a view. For more information on privileges, please refer to the [docs]({{% ref "entities#privilege" %}}).

Note that when using the `privileges` command, the user specifies the relevant catalog and catalog role before selecting a subcommand.

`privileges` supports the following subcommands:

1. list
2. catalog
3. namespace
4. table
5. view

Each of these subcommands, except `list`, supports the `grant` and `revoke` actions and requires an action to be specified.

Note that each subcommand's `revoke` action always accepts the same options that the corresponding `grant` action does, but with the addition of the `--cascade` option. `--cascade` is used to revoke all other privileges that depend on the specified privilege.

#### list

The `list` subcommand shows details about all privileges for a catalog role.

```
input: polaris privileges list --help
options:
  list
    Named arguments:
      --catalog  The name of an existing catalog
      --catalog-role  The name of a catalog role
```

##### Examples

```
polaris privileges list --catalog my_catalog --catalog-role my_role

polaris privileges list --catalog-role my_other_role --catalog my_catalog
```

#### catalog

The `catalog` subcommand manages privileges at the catalog level. `grant` is used to grant catalog privileges to the specified catalog role, and `revoke` is used to revoke them.

```
input: polaris privileges catalog --help
options:
  catalog
    grant
      Named arguments:
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
    revoke
      Named arguments:
        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
```

##### Examples

```
polaris privileges \
  catalog \
  grant \
  --catalog my_catalog \
  --catalog-role catalog_role \
  TABLE_CREATE

polaris privileges \
  catalog \
  revoke \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --cascade \
  TABLE_CREATE
```

#### namespace

The `namespace` subcommand manages privileges at the namespace level.

```
input: polaris privileges namespace --help
options:
  namespace
    grant
      Named arguments:
        --namespace  A period-delimited namespace
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
    revoke
      Named arguments:
        --namespace  A period-delimited namespace
        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
```

##### Examples

```
polaris privileges \
  namespace \
  grant \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b \
  TABLE_LIST

polaris privileges \
  namespace \
  revoke \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b \
  TABLE_LIST
```

#### table

The `table` subcommand manages privileges at the table level.

```
input: polaris privileges table --help
options:
  table
    grant
      Named arguments:
        --namespace  A period-delimited namespace
        --table  The name of a table
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
    revoke
      Named arguments:
        --namespace  A period-delimited namespace
        --table  The name of a table
        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
```

##### Examples

```
polaris privileges \
  table \
  grant \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b \
  --table t \
  TABLE_DROP

polaris privileges \
  table \
  revoke \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b \
  --table t \
  --cascade \
  TABLE_DROP
```
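
When a catalog role needs several table-level privileges at once, the `grant` action can simply be repeated, for example in a shell loop. A sketch using privileges that appear elsewhere on this page (names are illustrative):

```
for privilege in TABLE_LIST TABLE_READ_DATA; do
  polaris privileges \
    table \
    grant \
    --catalog my_catalog \
    --catalog-role catalog_role \
    --namespace a.b \
    --table t \
    "${privilege}"
done
```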

#### view

The `view` subcommand manages privileges at the view level.

```
input: polaris privileges view --help
options:
  view
    grant
      Named arguments:
        --namespace  A period-delimited namespace
        --view  The name of a view
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
    revoke
      Named arguments:
        --namespace  A period-delimited namespace
        --view  The name of a view
        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
        --catalog  The name of an existing catalog
        --catalog-role  The name of a catalog role
      Positional arguments:
        privilege
```

##### Examples

```
polaris privileges \
  view \
  grant \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b.c \
  --view v \
  VIEW_FULL_METADATA

polaris privileges \
  view \
  revoke \
  --catalog my_catalog \
  --catalog-role catalog_role \
  --namespace a.b.c \
  --view v \
  --cascade \
  VIEW_FULL_METADATA
```

### profiles

The `profiles` command is used to manage stored authentication profiles in Polaris. Profiles allow authentication credentials to be saved and reused, eliminating the need to pass credentials with every command.

`profiles` supports the following subcommands:

1. create
2. delete
3. get
4. list
5. update

#### create

The `create` subcommand is used to create a new authentication profile.

```
input: polaris profiles create --help
options:
  create
    Positional arguments:
      profile
```

##### Examples

```
polaris profiles create dev
```

#### delete

The `delete` subcommand removes a stored profile.

```
input: polaris profiles delete --help
options:
  delete
    Positional arguments:
      profile
```

##### Examples

```
polaris profiles delete dev
```

#### get

The `get` subcommand retrieves details about a stored profile.

```
input: polaris profiles get --help
options:
  get
    Positional arguments:
      profile
```

##### Examples

```
polaris profiles get dev
```

#### list

The `list` subcommand displays all stored profiles.

```
input: polaris profiles list --help
options:
  list
```

##### Examples

```
polaris profiles list
```

#### update

The `update` subcommand modifies an existing profile.

```
input: polaris profiles update --help
options:
  update
    Positional arguments:
      profile
```

##### Examples

```
polaris profiles update dev
```

### Policies

The `policies` command is used to manage policies within Polaris.

`policies` supports the following subcommands:

1. attach
2. create
3. delete
4. detach
5. get
6. list
7. update

#### attach

The `attach` subcommand is used to create a mapping between a policy and a resource entity.

```
input: polaris policies attach --help
options:
  attach
    Named arguments:
      --catalog  The name of an existing catalog
      --namespace  A period-delimited namespace
      --attachment-type  The type of entity to attach the policy to, e.g., 'catalog', 'namespace', or table-like.
      --attachment-path  The path of the entity to attach the policy to, e.g., 'ns1.tb1'. Not required for catalog-level attachment.
      --parameters  Optional key-value pairs for the attachment/detachment, e.g., key=value. Can be specified multiple times.
    Positional arguments:
      policy
```

##### Examples

```
polaris policies attach --catalog some_catalog --namespace some.schema --attachment-type namespace --attachment-path some.schema some_policy

polaris policies attach --catalog some_catalog --namespace some.schema --attachment-type table-like --attachment-path some.schema.t some_table_policy
```

#### create

The `create` subcommand is used to create a policy.

```
input: polaris policies create --help
options:
  create
    Named arguments:
      --catalog  The name of an existing catalog
      --namespace  A period-delimited namespace
      --policy-file  The path to a JSON file containing the policy definition
      --policy-type  The type of the policy, e.g., 'system.data-compaction'
      --policy-description  An optional description for the policy.
    Positional arguments:
      policy
```

##### Examples

```
polaris policies create --catalog some_catalog --namespace some.schema --policy-file some_policy.json --policy-type system.data-compaction some_policy

polaris policies create --catalog some_catalog --namespace some.schema --policy-file some_snapshot_expiry_policy.json --policy-type system.snapshot-expiry some_snapshot_expiry_policy
```
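
The file passed via `--policy-file` contains the policy definition as JSON. The exact schema is determined by the policy type; the following is a minimal illustrative sketch for `some_policy.json` above, assuming the data-compaction policy accepts a simple `enable` flag:

```
{
  "enable": true
}
```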

#### delete

The `delete` subcommand is used to delete a policy.

```
input: polaris policies delete --help
options:
  delete
    Named arguments:
      --catalog  The name of an existing catalog
      --namespace  A period-delimited namespace
      --detach-all  When set to true, the policy will be deleted along with all its attached mappings.
    Positional arguments:
      policy
```

##### Examples

```
polaris policies delete --catalog some_catalog --namespace some.schema some_policy

polaris policies delete --catalog some_catalog --namespace some.schema --detach-all some_policy
```

#### detach

The `detach` subcommand is used to remove a mapping between a policy and a target entity.

```
input: polaris policies detach --help
options:
  detach
    Named arguments:
      --catalog  The name of an existing catalog
      --namespace  A period-delimited namespace
      --attachment-type  The type of entity to attach the policy to, e.g., 'catalog', 'namespace', or table-like.
      --attachment-path  The path of the entity to attach the policy to, e.g., 'ns1.tb1'. Not required for catalog-level attachment.
      --parameters  Optional key-value pairs for the attachment/detachment, e.g., key=value. Can be specified multiple times.
    Positional arguments:
      policy
```

##### Examples

```
polaris policies detach --catalog some_catalog --namespace some.schema --attachment-type namespace --attachment-path some.schema some_policy

polaris policies detach --catalog some_catalog --namespace some.schema --attachment-type catalog --attachment-path some_catalog some_policy
```

#### get

The `get` subcommand is used to load a policy from the catalog.

```
input: polaris policies get --help
options:
  get
    Named arguments:
      --catalog  The name of an existing catalog
      --namespace  A period-delimited namespace
    Positional arguments:
      policy
```

##### Examples

```
polaris policies get --catalog some_catalog --namespace some.schema some_policy
```

#### list

The `list` subcommand lists all policy identifiers under a namespace or, alternatively, all policies applicable to a specified entity.

```
input: polaris policies list --help
options:
  list
    Named arguments:
      --catalog  The name of an existing catalog
      --namespace  A period-delimited namespace
      --target-name  The name of the target entity (e.g., table name, namespace name).
      --applicable  When set, lists policies applicable to the target entity (considering inheritance) instead of policies defined directly in the target.
      --policy-type  The type of the policy, e.g., 'system.data-compaction'
```

##### Examples

```
polaris policies list --catalog some_catalog

polaris policies list --catalog some_catalog --applicable
```

#### update

The `update` subcommand is used to update a policy.

```
input: polaris policies update --help
options:
  update
    Named arguments:
      --catalog  The name of an existing catalog
      --namespace  A period-delimited namespace
      --policy-file  The path to a JSON file containing the policy definition
      --policy-description  An optional description for the policy.
    Positional arguments:
      policy
```

##### Examples

```
polaris policies update --catalog some_catalog --namespace some.schema --policy-file my_updated_policy.json my_policy

polaris policies update --catalog some_catalog --namespace some.schema --policy-file my_updated_policy.json --policy-description "Updated policy description" my_policy
```

### repair

The `repair` command is a bash script wrapper used to regenerate Python client code and update necessary dependencies, ensuring the Polaris client remains up-to-date and functional. **Please note that this command does not support any options and its usage information is not available via a `--help` flag.**

## Examples

This section outlines example code for a few common operations as well as for some more complex ones.

For especially complex operations, you may wish to use the Python API directly.
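
As an illustration of that route, here is a hedged sketch that drives the REST APIs directly from Python with `requests`: the token endpoint is the one documented in the configuration pages, while the management path and response shapes are assumptions to adapt to your deployment.

```
import requests

POLARIS = "http://localhost:8181"  # assumed local server on the default port

# Obtain an access token from the documented OAuth endpoint.
resp = requests.post(
    f"{POLARIS}/api/catalog/v1/oauth/tokens",
    data={
        "grant_type": "client_credentials",
        "client_id": "my-client-id",
        "client_secret": "my-client-secret",
        "scope": "PRINCIPAL_ROLE:ALL",
    },
)
resp.raise_for_status()
token = resp.json()["access_token"]

# List principals via the management API (path assumed; adjust to your deployment).
principals = requests.get(
    f"{POLARIS}/api/management/v1/principals",
    headers={"Authorization": f"Bearer {token}"},
)
principals.raise_for_status()
print(principals.json())
```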

### Creating a principal and a catalog

```
polaris principals create my_user

polaris catalogs create \
  --type internal \
  --storage-type s3 \
  --default-base-location s3://iceberg-bucket/polaris-base \
  --role-arn arn:aws:iam::111122223333:role/ExampleCorpRole \
  --allowed-location s3://iceberg-bucket/polaris-alt-location-1 \
  --allowed-location s3://iceberg-bucket/polaris-alt-location-2 \
  my_catalog
```

### Granting a principal the ability to manage the content of a catalog

```
polaris principal-roles create power_user
polaris principal-roles grant --principal my_user power_user

polaris catalog-roles create --catalog my_catalog my_catalog_role
polaris catalog-roles grant \
  --catalog my_catalog \
  --principal-role power_user \
  my_catalog_role

polaris privileges \
  catalog \
  --catalog my_catalog \
  --catalog-role my_catalog_role \
  grant \
  CATALOG_MANAGE_CONTENT
```

### Identifying the tables a given principal has been granted explicit access to read

_Note that some other privileges, such as `CATALOG_MANAGE_CONTENT`, subsume `TABLE_READ_DATA` and would not be discovered here._

```
catalog="my_catalog" # the catalog to inspect; set as appropriate
principal_roles=$(polaris principal-roles list --principal my_principal)
for principal_role in ${principal_roles}; do
  catalog_roles=$(polaris catalog-roles list --principal-role "${principal_role}")
  for catalog_role in ${catalog_roles}; do
    grants=$(polaris privileges list --catalog-role "${catalog_role}" --catalog "${catalog}")
    for grant in $(echo "${grants}" | jq -c '.[] | select(.privilege == "TABLE_READ_DATA")'); do
      echo "${grant}"
    done
  done
done
```
diff --git a/1.2.0/configuration.md b/1.2.0/configuration.md
deleted file mode 100644
index 78fd9cedbc..0000000000
--- a/1.2.0/configuration.md
+++ /dev/null
@@ -1,201 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Configuring Polaris
type: docs
weight: 550
---

## Overview

This page provides information on how to configure Apache Polaris (Incubating). Unless stated
otherwise, this information is valid both for Polaris Docker images (and Kubernetes deployments) as
well as for Polaris binary distributions.

{{< alert note >}}
For production tips and best practices, refer to [Configuring Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}).
{{< /alert >}}

First off, the Polaris server runs on Quarkus and uses its configuration mechanisms. Read the Quarkus
[configuration guide](https://quarkus.io/guides/config) to get familiar with the basics.

Quarkus aggregates configuration properties from multiple sources, applying them in a specific order
of precedence. When a property is defined in multiple sources, the value from the source with the
higher priority overrides those from lower-priority sources.

The sources are listed below, from highest to lowest priority:

1. System properties: properties set via the Java command line using `-Dproperty.name=value`.
2. Environment variables (see below for important details).
3. Settings in the `$PWD/config/application.properties` file.
4. The `application.properties` files packaged in Polaris.
5. Default values: hardcoded defaults within the application.

When using environment variables, there are two naming conventions:

1. If possible, just use the property name as the environment variable name. This works fine in most
   cases, e.g. in Kubernetes deployments. For example, `polaris.realm-context.realms` can be
   included as is in a container YAML definition:
   ```yaml
   env:
     - name: "polaris.realm-context.realms"
       value: "realm1,realm2"
   ```

2. If running from a script or shell prompt, however, stricter naming rules apply: variable names
   can consist solely of uppercase letters, digits, and the `_` (underscore) sign. In such
   situations, the environment variable name must be derived from the property name, by using
   uppercase letters, and replacing all dots, dashes and quotes by underscores. For example,
   `polaris.realm-context.realms` becomes `POLARIS_REALM_CONTEXT_REALMS`. See
   [here](https://smallrye.io/smallrye-config/Main/config/environment-variables/) for more details.

{{< alert important >}}
While convenient, uppercase-only environment variables can be problematic for complex property
names. In these situations, it's preferable to use system properties or a configuration file.
{{< /alert >}}

As stated above, a configuration file can also be provided at runtime; it should be available
(mounted) at `$PWD/config/application.properties` for the Polaris server to recognize it. In the
official Polaris Docker images, this location is `/deployment/config/application.properties`.

For Kubernetes deployments, the configuration file is typically defined as a `ConfigMap`, then
mounted in the container at `/deployment/config/application.properties`. It can be mounted in
read-only mode, as Polaris only reads the configuration file once, at startup.
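
For illustration, here is a minimal sketch of such a file, using property names from the reference table below (the values are examples only):

```properties
polaris.persistence.type=relational-jdbc
polaris.realm-context.realms=POLARIS
polaris.realm-context.header-name=Polaris-Realm
quarkus.http.port=8181
```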

## Polaris Configuration Options Reference

| Configuration Property | Default Value | Description |
|------------------------|---------------|-------------|
| `polaris.persistence.type` | `relational-jdbc` | Define the persistence backend used by Polaris (`in-memory`, `relational-jdbc`, `eclipse-link` (deprecated)). See [Configuring Apache Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}). |
| `polaris.persistence.relational.jdbc.max-retries` | `1` | Total number of retries JDBC persistence will attempt on connection resets or serialization failures before giving up. |
| `polaris.persistence.relational.jdbc.max_duaration_in_ms` | `5000 ms` | Max time interval (ms) since the start of a transaction when retries can be attempted. |
| `polaris.persistence.relational.jdbc.initial_delay_in_ms` | `100 ms` | Initial delay before retrying. The delay is doubled after each retry. |
| `polaris.persistence.eclipselink.configurationFile` | | Define the location of the `persistence.xml`. By default, it's the built-in `persistence.xml` in use. |
| `polaris.persistence.eclipselink.persistenceUnit` | `polaris` | Define the name of the persistence unit to use, as defined in the `persistence.xml`. |
| `polaris.realm-context.type` | `default` | Define the type of the Polaris realm to use. |
| `polaris.realm-context.realms` | `POLARIS` | Define the list of realms to use. |
| `polaris.realm-context.header-name` | `Polaris-Realm` | Define the header name defining the realm context. |
| `polaris.features."ENFORCE_PRINCIPAL_CREDENTIAL_ROTATION_REQUIRED_CHECKING"` | `false` | Flag to enforce the check of whether credential rotation is required for a principal. |
| `polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"` | `S3`, `GCS`, `AZURE` | Define the supported catalog storage types. Supported values are `S3`, `GCS`, `AZURE`, `FILE`. |
| `polaris.features.realm-overrides."my-realm"."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"` | `true` | "Override" realm features, here the skip credential subscoping indirection flag. |
| `polaris.authentication.authenticator.type` | `default` | Define the Polaris authenticator type. |
| `polaris.authentication.token-service.type` | `default` | Define the Polaris token service type. |
| `polaris.authentication.token-broker.type` | `rsa-key-pair` | Define the Polaris token broker type. Also configure the location of the key files. For RSA: if the locations of the key files are not configured, an ephemeral key-pair will be created on each Polaris server instance startup, which breaks existing tokens after server restarts and is also incompatible with running multiple Polaris server instances. |
| `polaris.authentication.token-broker.max-token-generation` | `PT1H` | Define the max token generation policy on the token broker. |
| `polaris.authentication.token-broker.rsa-key-pair.private-key-file` | | Define the location of the RSA-256 private key file; if present, the `public-key` file must be specified, too. |
| `polaris.authentication.token-broker.rsa-key-pair.public-key-file` | | Define the location of the RSA-256 public key file; if present, the `private-key` file must be specified, too. |
| `polaris.authentication.token-broker.symmetric-key.secret` | `secret` | Define the secret of the symmetric key. |
| `polaris.authentication.token-broker.symmetric-key.file` | `/tmp/symmetric.key` | Define the location of the symmetric key file. |
| `polaris.storage.aws.access-key` | `accessKey` | Define the AWS S3 access key. If unset, the default credential provider chain will be used. |
| `polaris.storage.aws.secret-key` | `secretKey` | Define the AWS S3 secret key. If unset, the default credential provider chain will be used. |
| `polaris.storage.gcp.token` | `token` | Define the Google Cloud Storage token. If unset, the default credential provider chain will be used. |
| `polaris.storage.gcp.lifespan` | `PT1H` | Define the Google Cloud Storage lifespan type. If unset, the default credential provider chain will be used. |
| `polaris.log.request-id-header-name` | `Polaris-Request-Id` | Define the header name to match request ID in the log. |
| `polaris.log.mdc.aid` | `polaris` | Define the log context (e.g. MDC) AID. |
| `polaris.log.mdc.sid` | `polaris-service` | Define the log context (e.g. MDC) SID. |
| `polaris.rate-limiter.filter.type` | `no-op` | Define the Polaris rate limiter. Supported values are `no-op`, `token-bucket`. |
| `polaris.rate-limiter.token-bucket.type` | `default` | Define the token bucket rate limiter. |
| `polaris.rate-limiter.token-bucket.requests-per-second` | `9999` | Define the number of requests per second for the token bucket rate limiter. |
| `polaris.rate-limiter.token-bucket.window` | `PT10S` | Define the window type for the token bucket rate limiter. |
| `polaris.metrics.tags.<tag-name>=<tag-value>` | `application=Polaris` | Define arbitrary metric tags to include in every request. |
| `polaris.metrics.realm-id-tag.api-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in API metrics. |
| `polaris.metrics.realm-id-tag.http-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in HTTP request metrics. |
| `polaris.metrics.realm-id-tag.http-metrics-max-cardinality` | `100` | The maximum cardinality for the `realm_id` tag in HTTP request metrics. |
| `polaris.tasks.max-concurrent-tasks` | `100` | Define the max number of concurrent tasks. |
| `polaris.tasks.max-queued-tasks` | `1000` | Define the max number of tasks in queue. |
| `polaris.config.rollback.compaction.on-conflicts.enabled` | `false` | When set to `true`, Polaris resolves conflicts at the server end by rolling back REPLACE-operation snapshots whose snapshot summary sets the property `polaris.internal.rollback.compaction.on-conflict` to `rollback`. |
| `polaris.event-listener.type` | `no-op` | Define the Polaris event listener type. Supported values are `no-op`, `aws-cloudwatch`. |
| `polaris.event-listener.aws-cloudwatch.log-group` | `polaris-cloudwatch-default-group` | Define the AWS CloudWatch log group name for the event listener. |
| `polaris.event-listener.aws-cloudwatch.log-stream` | `polaris-cloudwatch-default-stream` | Define the AWS CloudWatch log stream name for the event listener. Ensure that Polaris' IAM credentials have the following actions: "PutLogEvents", "DescribeLogStreams", and "DescribeLogGroups" on the specified log stream/group. If the specified log stream/group does not exist, then "CreateLogStream" and "CreateLogGroup" will also be required. |
| `polaris.event-listener.aws-cloudwatch.region` | `us-east-1` | Define the AWS region for the CloudWatch event listener. |
| `polaris.event-listener.aws-cloudwatch.synchronous-mode` | `false` | Define whether log events are sent to CloudWatch synchronously. When set to true, events are sent synchronously, which may impact performance but ensures immediate delivery. When false (default), events are sent asynchronously for better performance. |

There are also non-Polaris configuration properties that can be useful:

| Configuration Property | Default Value | Description |
|------------------------|---------------|-------------|
| `quarkus.log.level` | `INFO` | Define the root log level. |
| `quarkus.log.category."org.apache.polaris".level` | | Define the log level for a specific category. |
| `quarkus.default-locale` | System locale | Force the use of a specific locale, for instance `en_US`. |
| `quarkus.http.port` | `8181` | Define the HTTP port number. |
| `quarkus.http.auth.basic` | `false` | Enable the HTTP basic authentication. |
| `quarkus.http.limits.max-body-size` | `10240K` | Define the HTTP max body size limit. |
| `quarkus.http.cors.origins` | | Define the HTTP CORS origins. |
| `quarkus.http.cors.methods` | `PATCH, POST, DELETE, GET, PUT` | Define the HTTP CORS covered methods. |
| `quarkus.http.cors.headers` | `*` | Define the HTTP CORS covered headers. |
| `quarkus.http.cors.exposed-headers` | `*` | Define the HTTP CORS covered exposed headers. |
| `quarkus.http.cors.access-control-max-age` | `PT10M` | Define the HTTP CORS access control max age. |
| `quarkus.http.cors.access-control-allow-credentials` | `true` | Define the HTTP CORS access control allow credentials flag. |
| `quarkus.management.enabled` | `true` | Enable the management server. |
| `quarkus.management.port` | `8182` | Define the port number of the Polaris management server. |
| `quarkus.management.root-path` | | Define the root path on which the `/metrics` and `/health` endpoints are based. |
| `quarkus.otel.sdk.disabled` | `true` | Disable the OpenTelemetry layer; set to `false` to enable it. |

{{< alert note >}}
This section is only relevant for Polaris Docker images and Kubernetes deployments.
{{< /alert >}}

There are many other actionable environment variables available in the official Polaris Docker
image; they come from the base image used by Polaris, [ubi9/openjdk-21-runtime]. They should be used
to fine-tune the Java runtime directly, e.g. to enable debugging or to set the heap size. These
variables are not specific to Polaris, but are inherited from the base image. If in doubt, leave
everything at its default!

[ubi9/openjdk-21-runtime]: https://catalog.redhat.com/software/containers/ubi9/openjdk-21-runtime/6501ce769a0d86945c422d5f

| Environment variable | Description |
|----------------------|-------------|
| `JAVA_OPTS` or `JAVA_OPTIONS` | **NOT RECOMMENDED**. JVM options passed to the `java` command (example: "-verbose:class"). Setting this variable will override all options set by any of the other variables in this table. To pass extra settings, use `JAVA_OPTS_APPEND` instead. |
| `JAVA_OPTS_APPEND` | User-specified Java options to be appended to generated options in `JAVA_OPTS` (example: "-Dsome.property=foo"). |
| `JAVA_TOOL_OPTIONS` | This variable is defined and honored by all OpenJDK distros, see [here](https://bugs.openjdk.org/browse/JDK-4971166). Options defined here take precedence over all else; using this variable is generally not necessary, but can be useful e.g. to enforce JVM startup parameters, to set up remote debug, or to define JVM agents. |
| `JAVA_MAX_MEM_RATIO` | Is used to calculate a default maximal heap memory based on a container's restriction. If used in a container without any memory constraints for the container, then this option has no effect. If there is a memory constraint, then `-XX:MaxRAMPercentage` is set to a ratio of the container's available memory as set here. The default is `80`, which means 80% of the available memory is used as an upper boundary. You can skip this mechanism by setting this value to `0`, in which case no `-XX:MaxRAMPercentage` option is added. |
| `JAVA_DEBUG` | If set, remote debugging will be switched on. Disabled by default (example: "true"). |
| `JAVA_DEBUG_PORT` | Port used for remote debugging. Defaults to "5005" (tip: use "*:5005" to enable debugging on all network interfaces). |
| `GC_MIN_HEAP_FREE_RATIO` | Minimum percentage of heap free after GC to avoid expansion. Default is 10. |
| `GC_MAX_HEAP_FREE_RATIO` | Maximum percentage of heap free after GC to avoid shrinking. Default is 20. |
| `GC_TIME_RATIO` | Specifies the ratio of the time spent outside the garbage collection. Default is 4. |
| `GC_ADAPTIVE_SIZE_POLICY_WEIGHT` | The weighting given to the current GC time versus previous GC times. Default is 90. |
| `GC_METASPACE_SIZE` | The initial metaspace size. There is no default (example: "20"). |
| `GC_MAX_METASPACE_SIZE` | The maximum metaspace size. There is no default (example: "100"). |
| `GC_CONTAINER_OPTIONS` | Specify Java GC to use. The value of this variable should contain the necessary JRE command-line options to specify the required GC, which will override the default of `-XX:+UseParallelGC` (example: `-XX:+UseG1GC`). |

Here are some examples:

| Example | `docker run` option |
|---------|---------------------|
| Using another GC | `-e GC_CONTAINER_OPTIONS="-XX:+UseShenandoahGC"` lets Polaris use Shenandoah GC instead of the default parallel GC. |
| Set the Java heap size to a _fixed_ amount | `-e JAVA_OPTS_APPEND="-Xms8g -Xmx8g"` lets Polaris use a Java heap of 8g. |
| Set the maximum heap percentage | `-e JAVA_MAX_MEM_RATIO="70"` lets Polaris use 70% of the available memory. |

## Troubleshooting Configuration Issues

If you encounter issues with the configuration, you can ask Polaris to print out the configuration it
is using. To do this, set the log level for the `io.smallrye.config` category to `DEBUG`, and also
set the console appender level to `DEBUG`:

```properties
quarkus.log.console.level=DEBUG
quarkus.log.category."io.smallrye.config".level=DEBUG
```

{{< alert important >}}
This will print out all configuration values, including sensitive ones like
passwords. Don't do this in production, and don't share this output with anyone you don't trust!
{{< /alert >}}
diff --git a/1.2.0/configuring-polaris-for-production.md b/1.2.0/configuring-polaris-for-production.md
deleted file mode 100644
index 928d8115f1..0000000000
--- a/1.2.0/configuring-polaris-for-production.md
+++ /dev/null
@@ -1,223 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Configuring Polaris for Production
linkTitle: Production Configuration
type: docs
weight: 600
---

The default server configuration is intended for development and testing. When you deploy Polaris in production,
review and apply the following checklist:
- [ ] Configure OAuth2 keys
- [ ] Enforce realm header validation (`require-header=true`)
- [ ] Use a durable metastore (JDBC + PostgreSQL)
- [ ] Bootstrap valid realms in the metastore
- [ ] Disable local FILE storage

### Configure OAuth2

Polaris authentication requires specifying a token broker factory type. Two implementations are
supported out of the box:

- [rsa-key-pair] uses a pair of public and private keys;
- [symmetric-key] uses a shared secret.

[rsa-key-pair]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java
[symmetric-key]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTSymmetricKeyFactory.java

By default, Polaris uses `rsa-key-pair`, with randomly generated keys.

{{< alert important >}}
The default `rsa-key-pair` configuration is not suitable when deploying many replicas of Polaris,
as each replica will have its own set of keys. This will cause token validation to fail when a
request is routed to a different replica than the one that issued the token.
{{< /alert >}}

It is highly recommended to configure Polaris with previously-generated RSA keys. This can be done
by setting the following properties:

```properties
polaris.authentication.token-broker.type=rsa-key-pair
polaris.authentication.token-broker.rsa-key-pair.public-key-file=/tmp/public.key
polaris.authentication.token-broker.rsa-key-pair.private-key-file=/tmp/private.key
```

To generate an RSA key pair in PKCS#8 format, you can use the following commands:

```shell
openssl genpkey -algorithm RSA -out private.key -pkeyopt rsa_keygen_bits:2048
openssl rsa -in private.key -pubout -out public.key
```

Alternatively, you can use a symmetric key by setting the following properties:

```properties
polaris.authentication.token-broker.type=symmetric-key
polaris.authentication.token-broker.symmetric-key.file=/tmp/symmetric.key
```

Note: it is also possible to set the symmetric key secret directly in the configuration file. If
possible, pass the secret as an environment variable to avoid storing sensitive information in the
configuration file:

```properties
polaris.authentication.token-broker.symmetric-key.secret=${POLARIS_SYMMETRIC_KEY_SECRET}
```

Finally, you can also configure the token broker to use a maximum lifespan by setting the following
property:

```properties
polaris.authentication.token-broker.max-token-generation=PT1H
```

Typically, in Kubernetes, you would define the keys as a `Secret` and mount them as files in the
container.
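
A minimal sketch of that approach (resource name and key material are illustrative; point the
`*-key-file` properties above at wherever the secret is mounted):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: polaris-token-broker-keys
type: Opaque
stringData:
  public.key: "<contents of public.key>"
  private.key: "<contents of private.key>"
```

The pod spec then mounts this secret as a read-only volume whose files back the
`public-key-file` and `private-key-file` locations configured above.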

### Realm Context Resolver

By default, Polaris resolves realms based on incoming request headers. You can configure the realm
context resolver by setting the following properties in `application.properties`:

```properties
polaris.realm-context.realms=POLARIS,MY-REALM
polaris.realm-context.header-name=Polaris-Realm
```

Where:

- `realms` is a comma-separated list of allowed realms. This setting _must_ be correctly configured.
  At least one realm must be specified.
- `header-name` is the name of the header used to resolve the realm; by default, it is
  `Polaris-Realm`.

If a request contains the specified header, Polaris will use the realm specified in the header. If
the realm is not in the list of allowed realms, Polaris will return a `404 Not Found` response.

If a request _does not_ contain the specified header, however, by default Polaris will use the first
realm in the list as the default realm. In the above example, `POLARIS` is the default realm and
would be used if the `Polaris-Realm` header is not present in the request.

This is not recommended for production use, as it may lead to security vulnerabilities. To avoid
this, set the following property to `true`:

```properties
polaris.realm-context.require-header=true
```

This will cause Polaris to also return a `404 Not Found` response if the realm header is not present
in the request.

### Metastore Configuration

A metastore should be configured with an implementation that durably persists Polaris entities. By
default, Polaris uses an in-memory metastore.

{{< alert important >}}
The default in-memory metastore is not suitable for production use, as it will lose all data
when the server is restarted; it is also unusable when multiple Polaris replicas are used.
{{< /alert >}}

To enable a durable metastore, configure your system to use the Relational JDBC-backed metastore.
This implementation leverages Quarkus for datasource management and supports configuration through
environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file).

Configure the metastore by setting the following environment variables:

```
POLARIS_PERSISTENCE_TYPE=relational-jdbc

QUARKUS_DATASOURCE_USERNAME=
QUARKUS_DATASOURCE_PASSWORD=
QUARKUS_DATASOURCE_JDBC_URL=
```

The relational JDBC metastore is a Quarkus-managed datasource and only supports Postgres and H2 as of now.
Please refer to the documentation here:
[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)

{{< alert important >}}
Be sure to secure your metastore backend, since it will be storing sensitive data and catalog
metadata.
{{< /alert >}}

Note: during bootstrap, Polaris always creates a schema named `polaris_schema` in the configured database.

### Bootstrapping

Before using Polaris, you must **bootstrap** the metastore. This is a manual operation that must be
performed **only once** for each realm in order to prepare the metastore to integrate with Polaris.

By default, when bootstrapping a new realm, Polaris will create randomized `CLIENT_ID` and
`CLIENT_SECRET` for the `root` principal and store their hashes in the metastore backend. Because
only the hashes are stored, the generated credentials cannot be retrieved from the database later,
which may not be convenient.

To provide your own credentials for the `root` principal (so you can request tokens via
`api/catalog/v1/oauth/tokens`), use the [Polaris Admin Tool]({{% ref "admin-tool" %}}).

You can verify the setup by attempting to issue a token for the `root` principal:

```bash
curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d "grant_type=client_credentials" \
  -d "client_id=my-client-id" \
  -d "client_secret=my-client-secret" \
  -d "scope=PRINCIPAL_ROLE:ALL"
```

This should return an access token:

```json
{
  "access_token": "...",
  "token_type": "bearer",
  "issued_token_type": "urn:ietf:params:oauth:token-type:access_token",
  "expires_in": 3600
}
```

If you used a non-default realm name, add the appropriate request header to the `curl` command;
otherwise Polaris will resolve the realm to the first one in the `polaris.realm-context.realms`
configuration. Here is an example that sets the realm header:

```bash
curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -H "Polaris-Realm: my-realm" \
  -d "grant_type=client_credentials" \
  -d "client_id=my-client-id" \
  -d "client_secret=my-client-secret" \
  -d "scope=PRINCIPAL_ROLE:ALL"
```

### Disable FILE Storage Type

By default, Polaris allows using the local file system (`FILE`) for catalog storage. This is fine for testing,
but **not recommended for production**. To disable it, set the supported storage types like this:

```hocon
polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES" = [ "S3", "Azure" ]
```

Leave out `FILE` to prevent its use. Only include the storage types your setup needs.

### Upgrade Considerations

The [Polaris Evolution](../evolution) page discusses backward compatibility and
upgrade concerns.
diff --git a/1.2.0/entities.md b/1.2.0/entities.md
deleted file mode 100644
index df53a0787f..0000000000
--- a/1.2.0/entities.md
+++ /dev/null
@@ -1,91 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Entities
type: docs
weight: 400
---

This page documents various entities that can be managed in Apache Polaris (Incubating).

## Catalog

A catalog is a top-level entity in Polaris that may contain other entities like [namespaces](#namespace) and [tables](#table). These map directly to [Apache Iceberg catalogs](https://iceberg.apache.org/terms/#catalog).

For information on managing catalogs with the REST API or for more information on what data can be associated with a catalog, see [the CreateCatalogRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).

### Storage Type

All catalogs in Polaris are associated with a _storage type_. Valid Storage Types are `S3`, `Azure`, and `GCS`. The `FILE` type is additionally available for testing.
Each of these types relates to a different storage provider where data within the catalog may reside. Depending on the storage type, various other configurations may be set for a catalog, including credentials to be used when accessing data inside the catalog.

For details on how to use Storage Types in the REST API, see [the StorageConfigInfo OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).

For usage examples of storage types, see [docs]({{% ref "command-line-interface" %}}).

## Namespace

A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_.

In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on.

For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the CreateNamespaceRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).

## Table

Polaris tables are entities that map to [Apache Iceberg tables](https://iceberg.apache.org/docs/nightly/configuration/), [Delta tables](https://docs.databricks.com/aws/en/delta/table-properties), or [Hudi tables](https://hudi.apache.org/docs/next/configurations#TABLE_CONFIG).

For information on managing tables with the REST API or for more information on what data can be associated with a table, see [the CreateTableRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).

## View

Polaris views are entities that map to [Apache Iceberg views](https://iceberg.apache.org/view-spec/).

For information on managing views with the REST API or for more information on what data can be associated with a view, see [the CreateViewRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).

## Principal

Polaris principals are unique identities that can be used to represent users or services. Each principal may have one or more [principal roles](#principal-role) assigned to it for the purpose of accessing catalogs and the entities within them.

For information on managing principals with the REST API or for more information on what data can be associated with a principal, see [the CreatePrincipalRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).

## Principal Role

Polaris principal roles are labels that may be granted to [principals](#principal). Each principal may have one or more principal roles, and the same principal role may be granted to multiple principals. Principal roles may be assigned based on the persona or responsibilities of a given principal, or on how that principal will need to access different entities within Polaris.

For information on managing principal roles with the REST API or for more information on what data can be associated with a principal role, see [the CreatePrincipalRoleRequest OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml).

## Catalog Role

Polaris catalog roles are labels that may be granted to [catalogs](#catalog). Each catalog may have one or more catalog roles, and the same catalog role may be granted to multiple catalogs.
Catalog roles may be assigned based on the nature of data that will reside in a catalog, or by the groups of users and services that might need to access that data.
-
-Each catalog role may have multiple [privileges](#privilege) granted to it, and each catalog role can be granted to one or more [principal roles](#principal-role). This is the mechanism by which principals are granted access to entities inside a catalog such as namespaces and tables.
-
-## Policy
-
-A Polaris policy is a set of rules governing actions on specified resources under predefined conditions. Polaris supports policies for Iceberg table compaction, snapshot expiry, row-level access control, and custom policy definitions.
-
-Policies can be applied at the catalog, namespace, or table level. Policy inheritance can be achieved by attaching a policy to a higher-level scope, such as a namespace or catalog. As a result, tables registered under those entities do not need to have the same policy declared individually. If a table or a namespace requires a different policy, a user can assign a different policy, thereby overriding the policy of the same type declared at the higher-level entities.
-
-## Privilege
-
-Polaris privileges are granted to [catalog roles](#catalog-role) in order to grant principals with a given principal role some degree of access to catalogs with a given catalog role. When a privilege is granted to a catalog role, any principal roles granted that catalog role receive the privilege. In turn, any principals who are granted that principal role receive it.
-
-A privilege can be scoped to any entity inside a catalog, including the catalog itself.
-
-For a list of supported privileges for each privilege class, see [the OpenAPI](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml) (TablePrivilege, ViewPrivilege, NamespacePrivilege, CatalogPrivilege).
diff --git a/1.2.0/evolution.md b/1.2.0/evolution.md
deleted file mode 100644
index b3a57c7525..0000000000
--- a/1.2.0/evolution.md
+++ /dev/null
@@ -1,115 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Polaris Evolution
-type: docs
-weight: 1000
----
-
-This page discusses what can be expected from Apache Polaris as the project evolves.
-
-## Using Polaris as a Catalog
-
-Polaris is primarily intended to be used as a Catalog of Tables and Views. As such,
-it implements the Iceberg REST Catalog API and its own REST APIs.
-
-Revisions of the Iceberg REST Catalog API are controlled by the [Apache Iceberg](https://iceberg.apache.org/)
-community. Polaris attempts to accurately implement this specification. Nonetheless,
-optional REST Catalog features may or may not be supported immediately.
In general,
-there is no guarantee that Polaris releases always implement the latest version of
-the Iceberg REST Catalog API.
-
-Any API under Polaris control that is not in an "experimental" or "beta" state
-(e.g. the Management API) is maintained as a versioned REST API. New releases of Polaris
-may include changes to the current version of the API. When that happens, those changes
-are intended to be compatible with prior versions of Polaris clients. Certain endpoints
-and parameters may be deprecated.
-
-In case a major change is required to an API that cannot be implemented in a
-backward-compatible way, new endpoints (URI paths) may be introduced. New URI "roots" may
-be introduced too (e.g. `api/catalog/v2`).
-
-Note that those "v1", "v2", etc. URI path segments are not meant to be 1:1 with Polaris
-releases or Polaris project version numbers (e.g. a "v2" path segment does not mean that
-it is added in Polaris 2.0).
-
-Polaris servers will support deprecated API endpoints / parameters / versions / etc.
-for some transition period to allow clients to migrate.
-
-### Managing Polaris Database
-
-Polaris stores its data in a database, which is sometimes referred to as "Metastore" or
-"Persistence" in other docs.
-
-Each Polaris release may support multiple Persistence [implementations](../metastores),
-for example, "EclipseLink" (deprecated) and "JDBC" (current).
-
-Each type of Persistence evolves individually. Within each Persistence type, Polaris
-attempts to support rolling upgrades (both version X and X + 1 servers running at the
-same time).
-
-However, migrating between different Persistence types is not supported in a rolling
-upgrade manner (for example, migrating from "EclipseLink" to "JDBC"). Polaris provides
-[tools](https://github.com/apache/polaris-tools/) for migrating between different
-catalogs and those tools may be used to migrate between different Persistence types
-as well. Service interruption (downtime) should be expected in those cases.
-
-## Using Polaris as a Build-Time Dependency
-
-Polaris produces several jars. These jars or custom builds of Polaris code may be used in
-downstream projects according to the terms of the license included in Polaris distributions.
-
-The minimal version of the JRE required by Polaris code (compilation target) may be updated in
-any release. Different Polaris jars may have different minimal JRE version requirements.
-
-Changes to Java classes should be expected at any time regardless of the module name or
-whether the class / method is `public` or not.
-
-This approach is not meant to discourage the use of Polaris code in downstream projects, but
-to allow more flexibility in evolving the codebase to support new catalog-level features
-and improve code efficiency. Maintainers of downstream projects are encouraged to join Polaris
-mailing lists to monitor project changes, suggest improvements, and engage with the Polaris
-community in case of specific compatibility concerns.
-
-## Semantic Versioning
-
-Polaris strives to follow [Semantic Versioning](https://semver.org/) conventions with
-respect to REST APIs (beta and experimental APIs excepted), [Polaris Policies](../policy/),
-and user-facing [configuration](../configuration/).
-
-The following are some examples of the Polaris approach to SemVer in REST APIs / configuration.
-These examples are for illustration purposes and should not be considered
-exhaustive.
- -* Polaris implementing an optional Iceberg REST Catalog feature that was unimplemented -in the previous release is not considered a major change. - -* Supporting a new revision of the Iceberg REST Catalog spec in a backward-compatible way -is not considered a major change. Specifically, supporting new REST API prefixes (e.g. `v2`) -is not a major change because it does not affect older clients. - -* Changing the implementation of an Iceberg REST Catalog feature / endpoint in a non-backward -compatible way (e.g. removing or renaming a request parameter) is a major change. - -* Dropping support for a configuration property with the `polaris.` name prefix is a major change. - -* Dropping support for any previously defined [Policy](../policy/) type or property is a major change. - -* Upgrading Quarkus Runtime to its next major version is a major change (because -Quarkus-managed configuration may change). diff --git a/1.2.0/federation/_index.md b/1.2.0/federation/_index.md deleted file mode 100644 index e4fbe261a0..0000000000 --- a/1.2.0/federation/_index.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Federation -type: docs -weight: 703 ---- - -Guides for federating Polaris with existing metadata services. Expand this section to select a -specific integration. diff --git a/1.2.0/federation/hive-metastore-federation.md b/1.2.0/federation/hive-metastore-federation.md deleted file mode 100644 index 0d39a5e4a0..0000000000 --- a/1.2.0/federation/hive-metastore-federation.md +++ /dev/null @@ -1,125 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Hive Metastore Federation -type: docs -weight: 705 ---- - -Polaris can federate catalog operations to an existing Hive Metastore (HMS). This lets an external -HMS remain the source of truth for table metadata while Polaris brokers access, policies, and -multi-engine connectivity. - -## Build-time enablement - -The Hive factory is packaged as an optional extension and is not baked into default server builds. 
-Include it when assembling the runtime or container images by setting the `NonRESTCatalogs` Gradle -property to include `HIVE` (and any other non-REST backends you need): - -```bash -./gradlew :polaris-server:assemble :polaris-server:quarkusAppPartsBuild --rerun \ - -DNonRESTCatalogs=HIVE -Dquarkus.container-image.build=true -``` - -`runtime/server/build.gradle.kts` wires the extension in only when this flag is present, so binaries -built without it will reject Hive federation requests. - -## Runtime requirements - -- **Metastore connectivity:** Expose the HMS Thrift endpoint (`thrift://host:port`) to the Polaris - deployment. -- **Configuration discovery:** Iceberg’s `HiveCatalog` loads Hadoop/Hive client settings from the - classpath. Provide `hive-site.xml` (and `core-site.xml` if needed) via - `HADOOP_CONF_DIR`/`HIVE_CONF_DIR` or an image layer. -- **Authentication:** Hive federation only supports `IMPLICIT` authentication, meaning Polaris uses - the operating-system or Kerberos identity of the running process (no stored secrets). Ensure the - service principal is logged in or holds a valid keytab/TGT before starting Polaris. -- **Object storage role:** Configure `polaris.service-identity..aws-iam.*` (or the default - realm) so the server can assume the AWS role referenced by the catalog. The IAM role must allow - STS access from the Polaris service identity and grant permissions to the table locations. - -### Kerberos setup example - -If your Hive Metastore enforces Kerberos, stage the necessary configuration alongside Polaris: - -```bash -export KRB5_CONFIG=/etc/polaris/krb5.conf -export HADOOP_CONF_DIR=/etc/polaris/hadoop-conf # contains hive-site.xml with HMS principal -export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/polaris/jaas.conf" -kinit -kt /etc/polaris/keytabs/polaris.keytab polaris/service@EXAMPLE.COM -``` - -- `hive-site.xml` must define `hive.metastore.sasl.enabled=true`, the metastore principal, and - client principal pattern (for example `hive.metastore.client.kerberos.principal=polaris/_HOST@REALM`). -- The JAAS entry (referenced by `java.security.auth.login.config`) should use `useKeyTab=true` and - point to the same keytab shown above so the Polaris JVM can refresh credentials automatically. -- Keep the keytab readable solely by the Polaris service user; the implicit authenticator consumes - the TGT at startup and for periodic renewal. - -## Creating a federated catalog - -Use the Management API (or the Python CLI) to create an external catalog whose connection type is -`HIVE`. The following request registers a catalog that proxies to an HMS running on -`thrift://hms.example.internal:9083`: - -```bash -curl -X POST https:///management/v1/catalogs \ - -H "Authorization: Bearer $TOKEN" \ - -H "Content-Type: application/json" \ - -d '{ - "type": "EXTERNAL", - "name": "analytics_hms", - "storageConfigInfo": { - "storageType": "S3", - "roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access", - "region": "us-east-1" - }, - "properties": { "default-base-location": "s3://analytics-bucket/warehouse/" }, - "connectionConfigInfo": { - "connectionType": "HIVE", - "uri": "thrift://hms.example.internal:9083", - "warehouse": "s3://analytics-bucket/warehouse/", - "authenticationParameters": { "authenticationType": "IMPLICIT" } - } - }' -``` - -Grant catalog roles to principal roles exactly as you would for internal catalogs so engines can -obtain tokens that authorize against the federated metadata. 
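-
-As an illustrative sketch, the grant sequence with the Polaris CLI might look like the following (the catalog role and principal role names here are hypothetical, and `CATALOG_MANAGE_CONTENT` is used as an example privilege):
-
-```bash
-# Create a catalog role scoped to the federated catalog and grant it content privileges
-polaris catalog-roles create --catalog analytics_hms analytics_hms_role
-polaris privileges catalog grant \
-  --catalog analytics_hms \
-  --catalog-role analytics_hms_role \
-  CATALOG_MANAGE_CONTENT
-
-# Expose the catalog role through an existing principal role
-polaris catalog-roles grant \
-  --catalog analytics_hms \
-  --principal-role data_engineer \
-  analytics_hms_role
-```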
- -`default-base-location` is required; it tells Polaris and Iceberg where to place new metadata files. -`allowedLocations` is optional—supply it only when you want to restrict writers to a specific set of -prefixes. If your IAM trust policy requires an `externalId` or explicit `userArn`, include those -optional fields in `storageConfigInfo`. Polaris persists them and supplies them when assuming the -role cited by `roleArn` during metadata commits. - -## Limitations and operational notes - -- **Single identity:** Because only `IMPLICIT` authentication is permitted, Polaris cannot mix - multiple Hive identities in a single deployment (`HiveFederatedCatalogFactory` rejects other auth - types). Plan a deployment topology that aligns the Polaris process identity with the target HMS. -- **Generic tables:** The Hive extension exposes Iceberg tables registered in HMS. Generic table - federation is not implemented (`HiveFederatedCatalogFactory#createGenericCatalog` throws - `UnsupportedOperationException`). -- **Configuration caching:** Atlas-style catalog failover and multi-HMS routing are not yet handled; - Polaris initializes one `HiveCatalog` per connection and relies on the underlying Iceberg client - for retries. - -With these constraints satisfied, Polaris can sit in front of an HMS so that Iceberg tables managed -there gain OAuth-protected, multi-engine access through the Polaris REST APIs. diff --git a/1.2.0/federation/iceberg-rest-federation.md b/1.2.0/federation/iceberg-rest-federation.md deleted file mode 100644 index 8318f45095..0000000000 --- a/1.2.0/federation/iceberg-rest-federation.md +++ /dev/null @@ -1,71 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Iceberg REST Federation -type: docs -weight: 704 ---- - -Polaris can federate an external Iceberg REST catalog (e.g., another Polaris deployment, AWS Glue, or a custom Iceberg -REST implementation), enabling a Polaris service to access table and view entities managed by remote Iceberg REST Catalogs. - -## Runtime requirements - -- **REST endpoint:** The remote service must expose the Iceberg REST specification. Configure - firewalls so Polaris can reach the base URI you provide in the connection config. -- **Authentication:** Polaris forwards requests using the credentials defined in - `ConnectionConfigInfo.AuthenticationParameters`. OAuth2 client credentials, bearer tokens, and AWS - SigV4 are supported; choose the scheme the remote service expects. - -## Creating a federated REST catalog - -The snippet below registers an external catalog that forwards to a remote Polaris server using OAuth2 -client credentials. `iceberg-remote-catalog-name` is optional; supply it when the remote server multiplexes -multiple logical catalogs under one URI. 
-
-```bash
-polaris catalogs create \
-  --type EXTERNAL \
-  --storage-type s3 \
-  --role-arn "arn:aws:iam::123456789012:role/polaris-warehouse-access" \
-  --default-base-location "s3://analytics-bucket/warehouse/" \
-  --catalog-connection-type iceberg-rest \
-  --iceberg-remote-catalog-name analytics \
-  --catalog-uri "https://remote-polaris.example.com/catalog/v1" \
-  --catalog-authentication-type OAUTH \
-  --catalog-token-uri "https://remote-polaris.example.com/catalog/v1/oauth/tokens" \
-  --catalog-client-id "" \
-  --catalog-client-secret "" \
-  --catalog-client-scopes "PRINCIPAL_ROLE:ALL" \
-  analytics_rest
-```
-
-Refer to the [CLI documentation](../command-line-interface.md#catalogs) for details on alternative authentication types such as BEARER or SIGV4.
-
-Grant catalog roles to principal roles the same way you do for internal catalogs so compute engines
-receive tokens with access to the federated namespace.
-
-## Operational notes
-
-- **Connectivity checks:** Polaris does not lazily probe the remote service; catalog creation fails if
-  the REST endpoint is unreachable or authentication is rejected.
-- **Feature parity:** Federation exposes whatever table/namespace operations the remote service
-  implements. Unsupported features return the remote error directly to callers.
-- **Generic tables:** The REST federation path currently surfaces Iceberg tables only; generic table
-  federation is not implemented.
diff --git a/1.2.0/generic-table.md b/1.2.0/generic-table.md
deleted file mode 100644
index 63ef38a1da..0000000000
--- a/1.2.0/generic-table.md
+++ /dev/null
@@ -1,169 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Generic Table (Beta)
-type: docs
-weight: 435
----
-
-The Generic Table feature in Apache Polaris is designed to provide support for non-Iceberg tables across different table formats, such as delta and csv. It currently provides the following capabilities:
-- Create a generic table under a namespace
-- Load a generic table
-- Drop a generic table
-- List all generic tables under a namespace
-
-**NOTE** Generic Table support is currently in beta. Please use it with caution and report any issues you encounter.
-
-## What is a Generic Table?
-
-A generic table in Polaris is an entity that defines the following fields:
-
-- **name** (required): A unique identifier for the table within a namespace
-- **format** (required): The format for the generic table, e.g. "delta", "csv"
-- **base-location** (optional): Table base location in URI format. For example: s3:///path/to/table
-  - The table base location is a location that includes all files for the table
-  - A table with multiple disjoint locations (i.e.
containing files that are outside the configured base location) is not compliant with the current generic table support in Polaris.
-  - If no location is provided, clients or users are responsible for managing the location.
-- **properties** (optional): Properties for the generic table passed on creation.
-  - Currently, there is no reserved property key defined.
-  - The property definition and interpretation are delegated to client or engine implementations.
-- **doc** (optional): Comment or description for the table
-
-## Generic Table API Vs. Iceberg Table API
-
-Generic Table provides a different set of APIs to operate on generic table entities, while the Iceberg APIs operate on Iceberg table entities.
-
-| Operations   | **Iceberg Table API** | **Generic Table API** |
-|--------------|-----------------------|-----------------------|
-| Create Table | Create an Iceberg table | Create a generic table |
-| Load Table   | Load an Iceberg table. If the table to load is a generic table, a TableNotFoundException will be thrown; call the Generic Table load API instead | Load a generic table. Similarly, trying to load an Iceberg table through the Generic Table API will throw a TableNotFoundException |
-| Drop Table   | Drop an Iceberg table. As with loading, if the table to drop is a generic table, a TableNotFoundException will be thrown | Drop a generic table. Dropping an Iceberg table through the Generic Table endpoint will throw a TableNotFoundException |
-| List Table   | List all Iceberg tables | List all generic tables |
-
-Note that generic tables share the same namespace as Iceberg tables, so a table name has to be unique within a namespace. Furthermore, since
-there is currently no support for Update Generic Table, any update to an existing table requires a drop and re-create.
-
-## Working with Generic Table
-
-There are two ways to work with Polaris Generic Tables today:
-1) Directly communicate with Polaris through REST API calls using tools such as `curl`. Details are described in the sections below.
-2) Use the provided Spark client if you are working with Spark. Please refer to [Polaris Spark Client]({{% ref "polaris-spark-client" %}}) for detailed instructions.
-
-### Create a Generic Table
-
-To create a generic table, you need to provide the corresponding fields as described in [What is a Generic Table](#what-is-a-generic-table).
-
-The REST API for creating a generic table is `POST /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables`, and the
-request body looks like the following:
-
-```json
-{
-  "name": "",
-  "format": "",
-  "base-location": "",
-  "doc": "",
-  "properties": {
-    "": ""
-  }
-}
-```
-
-Here is an example that uses curl to create a generic table named `delta_table` with format `delta` under the namespace `delta_ns`
-in the catalog `delta_catalog`:
-
-```shell
-curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \
-  -H "Content-Type: application/json" \
-  -d '{
-    "name": "delta_table",
-    "format": "delta",
-    "base-location": "s3:///path/to/table",
-    "doc": "delta table example",
-    "properties": {
-      "key1": "value1"
-    }
-  }'
-```
-
-### Load a Generic Table
-The REST endpoint for loading a generic table is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.
-
-Here is an example that loads the table `delta_table` using curl:
-```shell
-curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
-```
-And the response looks like the following:
-```json
-{
-  "table": {
-    "name": "delta_table",
-    "format": "delta",
-    "base-location": "s3:///path/to/table",
-    "doc": "delta table example",
-    "properties": {
-      "key1": "value1"
-    }
-  }
-}
-```
-
-### List Generic Tables
-The REST endpoint for listing the generic tables under a given
-namespace is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/`.
-
-The following curl command lists all tables under the namespace `delta_ns`:
-```shell
-curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/
-```
-Example Response:
-```json
-{
-  "identifiers": [
-    {
-      "namespace": ["delta_ns"],
-      "name": "delta_table"
-    }
-  ],
-  "next-page-token": null
-}
-```
-
-### Drop a Generic Table
-The drop generic table REST endpoint is `DELETE /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.
-
-The following curl call drops the table `delta_table`:
-```shell
-curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
-```
-
-### API Reference
-
-For the complete and up-to-date API specification, see the [Catalog API Spec](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/polaris/refs/heads/main/spec/generated/bundled-polaris-catalog-service.yaml).
-
-## Limitations
-
-Current limitations of Generic Table support:
-1) Limited spec information. Currently, there is no spec for information like schema, partitioning, etc.
-2) No commit coordination or update capability provided at the catalog service level.
-
-Therefore, the catalog itself is unaware of anything about the underlying table except some loosely defined metadata.
-It is the responsibility of the engine (and the plugins used by the engine) to determine exactly what loading or committing data
-should look like based on the metadata. For example, with the delta support, the delta log serialization, deserialization,
-and updates all happen on the client side.
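-
-As a concrete illustration, once a Spark session has been configured with the Polaris Spark client (see [Polaris Spark Client]({{% ref "polaris-spark-client" %}})), creating the Delta table from the examples above is plain Spark SQL; Polaris records only the generic table entry while the engine reads and writes the Delta log itself. A sketch, in which the session configuration and the storage location are placeholders:
-
-```shell
-spark-sql --conf spark.sql.defaultCatalog=delta_catalog \
-  -e "CREATE TABLE delta_ns.delta_table (id INT, name STRING) USING DELTA LOCATION 's3://example-bucket/path/to/table'"
-```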
diff --git a/1.2.0/getting-started/creating-a-catalog/_index.md b/1.2.0/getting-started/creating-a-catalog/_index.md
deleted file mode 100644
index eeaf431733..0000000000
--- a/1.2.0/getting-started/creating-a-catalog/_index.md
+++ /dev/null
@@ -1,54 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Creating a Catalog
-linkTitle: Creating a Catalog
-type: docs
-weight: 300
----
-
-The following Object Storage providers can be configured as storage backends for your Polaris catalog:
-
-- [S3 compatible object stores]({{< ref "s3.md" >}})
-- [Google Cloud Storage]({{< ref "catalog-gcs.md" >}})
-- [Azure Blob Storage]({{< ref "catalog-azure.md" >}})
-- Local file system (enabled by default; for testing only)
-
-
-## Create a catalog using the polaris CLI
-
-Check the full list of options for the `polaris catalogs create` command [here]({{% ref "../../command-line-interface#create" %}}).
-
-### Example
-
-```shell
-CLIENT_ID=root \
-CLIENT_SECRET=s3cr3t \
-DEFAULT_BASE_LOCATION=s3://example-bucket/my_data \
-ROLE_ARN=arn:aws:iam::111122223333:role/ExampleCorpRole \
-./polaris \
-  --client-id ${CLIENT_ID} \
-  --client-secret ${CLIENT_SECRET} \
-  catalogs \
-  create \
-  --storage-type s3 \
-  --default-base-location ${DEFAULT_BASE_LOCATION} \
-  --role-arn ${ROLE_ARN} \
-  my_catalog
-```
diff --git a/1.2.0/getting-started/creating-a-catalog/catalog-azure.md b/1.2.0/getting-started/creating-a-catalog/catalog-azure.md
deleted file mode 100644
index 8666f28876..0000000000
--- a/1.2.0/getting-started/creating-a-catalog/catalog-azure.md
+++ /dev/null
@@ -1,55 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Creating a catalog on Azure
-linkTitle: Azure
-type: docs
-weight: 300
----
-
-For the `polaris catalogs create` [command]({{% ref "../../command-line-interface#create" %}}) there are a few `azure`-only options:
-
-```text
---storage-type azure
---tenant-id (Required for Azure) A tenant ID to use when connecting to Azure Storage
---multi-tenant-app-name (Only for Azure) The app name to use when connecting to Azure Storage
---consent-url (Only for Azure) A consent URL granting permissions for the Azure Storage location
-```
-
-### example
-
-```shell
-CLIENT_ID=root \
-CLIENT_SECRET=s3cr3t \
-DEFAULT_BASE_LOCATION=abfss://tenant123@blob.core.windows.net \
-TENANT_ID=tenant123.onmicrosoft.com \
-MULTI_TENANT_APP_NAME=myapp \
-CONSENT_URL=https://myapp.com/consent \
-./polaris \
-  --client-id ${CLIENT_ID} \
-  --client-secret ${CLIENT_SECRET} \
-  catalogs \
-  create \
-  --storage-type azure \
-  --tenant-id ${TENANT_ID} \
-  --multi-tenant-app-name ${MULTI_TENANT_APP_NAME} \
-  --consent-url ${CONSENT_URL} \
-  --default-base-location ${DEFAULT_BASE_LOCATION} \
-  my_azure_catalog
-```
\ No newline at end of file
diff --git a/1.2.0/getting-started/creating-a-catalog/catalog-gcs.md b/1.2.0/getting-started/creating-a-catalog/catalog-gcs.md
deleted file mode 100644
index db6214e38c..0000000000
--- a/1.2.0/getting-started/creating-a-catalog/catalog-gcs.md
+++ /dev/null
@@ -1,49 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Creating a catalog on Google Cloud Storage (GCS)
-linkTitle: GCS
-type: docs
-weight: 200
----
-
-For the `polaris catalogs create` [command]({{% ref "../../command-line-interface#create" %}}) there are a few `gcs`-only options:
-
-```text
---storage-type gcs
---service-account (Only for GCS) The service account to use when connecting to GCS
-```
-
-### example
-
-```shell
-CLIENT_ID=root \
-CLIENT_SECRET=s3cr3t \
-DEFAULT_BASE_LOCATION=gs://my-ml-bucket/predictions/ \
-SERVICE_ACCOUNT=serviceAccount:my-service-account@my-project.iam.gserviceaccount.com \
-./polaris \
-  --client-id ${CLIENT_ID} \
-  --client-secret ${CLIENT_SECRET} \
-  catalogs \
-  create \
-  --storage-type gcs \
-  --service-account ${SERVICE_ACCOUNT} \
-  --default-base-location ${DEFAULT_BASE_LOCATION} \
-  my_gcs_catalog
-```
\ No newline at end of file
diff --git a/1.2.0/getting-started/creating-a-catalog/s3/_index.md b/1.2.0/getting-started/creating-a-catalog/s3/_index.md
deleted file mode 100644
index 538bca17a9..0000000000
--- a/1.2.0/getting-started/creating-a-catalog/s3/_index.md
+++ /dev/null
@@ -1,38 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.
The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Creating a catalog on S3 compatible cloud providers
-linkTitle: S3
-type: docs
-weight: 100
----
-
-The following S3 compatible cloud providers can be configured as storage backends for your Polaris catalog:
-
-- [AWS S3]({{< ref "catalog-aws.md" >}})
-- [MinIO]({{< ref "catalog-minio.md" >}})
-
-For the `polaris catalogs create` [command]({{% ref "../../../command-line-interface#create" %}}) there are a few `s3`-only options:
-
-```text
---storage-type s3
---role-arn (Only for AWS S3) A role ARN to use when connecting to S3
---region (Only for S3) The region to use when connecting to S3
---external-id (Only for S3) The external ID to use when connecting to S3
-```
diff --git a/1.2.0/getting-started/creating-a-catalog/s3/catalog-aws.md b/1.2.0/getting-started/creating-a-catalog/s3/catalog-aws.md
deleted file mode 100644
index b86ac874f8..0000000000
--- a/1.2.0/getting-started/creating-a-catalog/s3/catalog-aws.md
+++ /dev/null
@@ -1,52 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Creating a catalog on AWS S3
-linkTitle: AWS
-type: docs
-weight: 100
----
-
-When creating a catalog based on AWS S3 storage, only the `role-arn` parameter is required. However, usually
-one also provides the `region` and
-[external-id](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_third-party.html).
-
-Note: the name `quickstart_catalog` from the example below is referenced in other Getting Started examples,
-but of course, it can be any valid catalog name.
-
-```shell
-CLIENT_ID=root
-CLIENT_SECRET=s3cr3t
-DEFAULT_BASE_LOCATION=s3://example-bucket/my_data
-ROLE_ARN=arn:aws:iam::111122223333:role/ExampleCorpRole
-REGION=us-west-2
-EXTERNAL_ID=12345678901234567890
-
-./polaris \
-  --client-id ${CLIENT_ID} \
-  --client-secret ${CLIENT_SECRET} \
-  catalogs \
-  create \
-  --storage-type s3 \
-  --default-base-location ${DEFAULT_BASE_LOCATION} \
-  --role-arn ${ROLE_ARN} \
-  --region ${REGION} \
-  --external-id ${EXTERNAL_ID} \
-  quickstart_catalog
-```
\ No newline at end of file
diff --git a/1.2.0/getting-started/creating-a-catalog/s3/catalog-minio.md b/1.2.0/getting-started/creating-a-catalog/s3/catalog-minio.md
deleted file mode 100644
index cdeeb12775..0000000000
--- a/1.2.0/getting-started/creating-a-catalog/s3/catalog-minio.md
+++ /dev/null
@@ -1,63 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Creating a catalog on MinIO
-linkTitle: MinIO
-type: docs
-weight: 200
----
-
-When creating a catalog based on MinIO storage, it is important to configure the `endpoint` property to point
-to your own MinIO cluster. If the `endpoint` property is not set, Polaris will attempt to contact AWS
-storage services (which is certain to fail in this case).
-
-Note: the region setting is not required by MinIO, but it is set in this example for the sake of
-simplicity as it is usually required by the AWS SDK (used internally by Polaris). One can also
-set the `AWS_REGION` environment variable in the Polaris server process and avoid setting region
-as a catalog property.
-
-Note: the name `quickstart_catalog` from the example below is referenced in other Getting Started examples,
-but of course, it can be any valid catalog name.
-
-```shell
-CLIENT_ID=root
-CLIENT_SECRET=s3cr3t
-DEFAULT_BASE_LOCATION=s3://example-bucket/my_data
-REGION=us-west-2
-
-./polaris \
-  --client-id ${CLIENT_ID} \
-  --client-secret ${CLIENT_SECRET} \
-  catalogs \
-  create \
-  --storage-type s3 \
-  --endpoint http://127.0.0.1:9100 \
-  --default-base-location ${DEFAULT_BASE_LOCATION} \
-  --region ${REGION} \
-  quickstart_catalog
-```
-
-In more complex deployments it may be necessary to configure different endpoints for S3 requests
-and for STS (AssumeRole) requests. This can be achieved via the `--sts-endpoint` CLI option.
-
-Additionally, the `--endpoint-internal` CLI option can be used to set the S3 endpoint for use by
-the Polaris Server itself, if it needs to be different from the endpoint used by clients / engines
-(see the sketch at the end of this page).
-
-A usable MinIO example for `docker-compose` is available in the Polaris source code under the
-[getting-started/minio](https://github.com/apache/polaris/tree/main/getting-started/minio) module.
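-
-Building on the endpoint options described above, a deployment that separates client-facing and internal endpoints might create the catalog as in the following sketch (all endpoint URLs are illustrative placeholders):
-
-```shell
-./polaris \
-  --client-id ${CLIENT_ID} \
-  --client-secret ${CLIENT_SECRET} \
-  catalogs \
-  create \
-  --storage-type s3 \
-  --endpoint http://minio.example.internal:9100 \
-  --sts-endpoint http://minio.example.internal:9100 \
-  --endpoint-internal http://minio-backend.example.internal:9100 \
-  --default-base-location ${DEFAULT_BASE_LOCATION} \
-  --region ${REGION} \
-  quickstart_catalog
-```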
diff --git a/1.2.0/getting-started/deploying-polaris/_index.md b/1.2.0/getting-started/deploying-polaris/_index.md
deleted file mode 100644
index e975f69274..0000000000
--- a/1.2.0/getting-started/deploying-polaris/_index.md
+++ /dev/null
@@ -1,29 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Deploying Polaris
-linkTitle: Deploying Polaris
-type: docs
-weight: 200
----
-
-Here you can find guides on how to deploy Polaris locally, as well as on all supported Cloud Providers: Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).
-
-Locally, Polaris can be deployed using either Docker or a local build.
-On the cloud, the following tutorials deploy Polaris using a Docker environment.
diff --git a/1.2.0/getting-started/deploying-polaris/cloud-deploy/_index.md b/1.2.0/getting-started/deploying-polaris/cloud-deploy/_index.md
deleted file mode 100644
index 56626bc2c3..0000000000
--- a/1.2.0/getting-started/deploying-polaris/cloud-deploy/_index.md
+++ /dev/null
@@ -1,27 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Deploying Polaris on Cloud Providers
-linkTitle: Cloud Providers
-type: docs
-weight: 300
----
-
-Polaris can be deployed on various cloud providers, including Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).
-In the following guides, we will walk you through the process of deploying Polaris on each of these cloud providers.
diff --git a/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md b/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md
deleted file mode 100644
index 485365d59b..0000000000
--- a/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md
+++ /dev/null
@@ -1,59 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.
The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Deploying Polaris on Amazon Web Services (AWS)
-linkTitle: AWS
-type: docs
-weight: 310
----
-
-Build and launch Polaris using the AWS Startup Script at the location provided in the command below. This script will start an [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) instance, which will be used as the backend Postgres instance holding all Polaris data.
-Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino.
-
-The requirements to run the script below are:
-* There must be at least two subnets created in the VPC and region in which your EC2 instance resides. The span of subnets MUST include at least 2 availability zones (AZs) within the same region.
-* Your EC2 instance must be enabled with [IMDSv1 or IMDSv2 with 2+ hop limit](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-IMDS-new-instances.html#configure-IMDS-new-instances-instance-settings).
-* The AWS identity that you will use to run this script must have the following AWS permissions:
-  * "ec2:DescribeInstances"
-  * "rds:CreateDBInstance"
-  * "rds:DescribeDBInstances"
-  * "rds:CreateDBSubnetGroup"
-  * "sts:AssumeRole" on the same role as the Instance Profile role of the EC2 instance on which you are running this script. Additionally, you should ensure that the Instance Profile contains a trust policy that allows the role to trust itself to be assumed.
-
-```shell
-export ASSETS_PATH=$(pwd)/getting-started/assets/
-export CLIENT_ID=root
-export CLIENT_SECRET=s3cr3t
-./getting-started/assets/cloud_providers/deploy-aws.sh
-```
-
-## Next Steps
-Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris,
-check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and
-[Using Polaris]({{% relref "../../using-polaris" %}}) pages.
-
-## Cleanup Instructions
-To shut down the Polaris server, run the following commands:
-
-```shell
-export ASSETS_PATH=$(pwd)/getting-started/assets/
-docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down
-```
-
-To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../../configuring-polaris-for-production" %}}) page.
diff --git a/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md b/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md
deleted file mode 100644
index ce265391fd..0000000000
--- a/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md
+++ /dev/null
@@ -1,54 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.
The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Deploying Polaris on Azure -linkTitle: Azure -type: docs -weight: 320 ---- - -Build and launch Polaris using the Azure Startup Script at the location provided in the command below. This script will start an [Azure Database for PostgreSQL - Flexible Server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview) instance, which will be used as the backend Postgres instance holding all Polaris data. -Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. - -The requirements to run the script below are: -* Install the AZ CLI, if it is not already installed on the Azure VM. Instructions to download the AZ CLI can be found [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli). -* You must be logged into the AZ CLI. Please run `az account show` to ensure that you are logged in prior to running this script. -* Assign a System-Assigned Managed Identity to the Azure VM. - -```shell -export ASSETS_PATH=$(pwd)/getting-started/assets/ -export CLIENT_ID=root -export CLIENT_SECRET=s3cr3t -./getting-started/assets/cloud_providers/deploy-azure.sh -``` - -## Next Steps -Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, -check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and -[Using Polaris]({{% relref "../../using-polaris" %}}) pages. - -## Cleanup Instructions -To shut down the Polaris server, run the following commands: - -```shell -export ASSETS_PATH=$(pwd)/getting-started/assets/ -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down -``` - -To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../../configuring-polaris-for-production" %}}) page. diff --git a/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md b/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md deleted file mode 100644 index dfb47bc78d..0000000000 --- a/1.2.0/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. 
See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Deploying Polaris on Google Cloud Platform (GCP) -linkTitle: GCP -type: docs -weight: 330 ---- - -Build and launch Polaris using the GCP Startup Script at the location provided in the command below. This script will start a [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) instance, which will be used as the backend Postgres instance holding all Polaris data. -Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. - -The requirements to run the script below are: -* Install the `gcloud` CLI, if it is not already installed on the GCP VM. Instructions to download the `gcloud` CLI can be found [here](https://cloud.google.com/sdk/docs/install). -* Ensure the `Cloud SQL Admin API` has been enabled in your project and that your VM's Principal has access to the correct role: `roles/cloudsql.admin`. -* Ensure the VM's Principal has access to at least Read-only scope on Compute Engine: `compute.readonly`. - -```shell -export ASSETS_PATH=$(pwd)/getting-started/assets/ -export CLIENT_ID=root -export CLIENT_SECRET=s3cr3t -./getting-started/assets/cloud_providers/deploy-gcp.sh -``` - -## Next Steps -Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, -check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and -[Using Polaris]({{% relref "../../using-polaris" %}}) pages. - -## Cleanup Instructions -To shut down the Polaris server, run the following commands: - -```shell -export ASSETS_PATH=$(pwd)/getting-started/assets/ -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down -``` - -To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../../configuring-polaris-for-production" %}}) page. diff --git a/1.2.0/getting-started/deploying-polaris/local-deploy.md b/1.2.0/getting-started/deploying-polaris/local-deploy.md deleted file mode 100644 index c2b7b41743..0000000000 --- a/1.2.0/getting-started/deploying-polaris/local-deploy.md +++ /dev/null @@ -1,119 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Deploying Polaris locally -linkTitle: Local deployment -type: docs -weight: 200 ---- - -Polaris can be deployed via a docker image or as a standalone process. Before starting, be sure that you've satisfied the relevant prerequisites detailed in the previous page. - -## Common Setup -Before running Polaris, ensure you have completed the following setup steps: - -1. 
**Build Polaris**
-```shell
-cd ~/polaris
-./gradlew \
-  :polaris-server:assemble \
-  :polaris-server:quarkusAppPartsBuild --rerun \
-  :polaris-admin:assemble \
-  :polaris-admin:quarkusAppPartsBuild --rerun \
-  -Dquarkus.container-image.build=true
-```
-- **For standalone**: Omit the `-Dquarkus.container-image.tag` and `-Dquarkus.container-image.build` options if you do not need to build a Docker image.
-
-## Running Polaris with Docker
-
-To start using Polaris in Docker, launch Polaris together with the Postgres instance, Apache Spark, and Trino it is packaged with:
-
-```shell
-export ASSETS_PATH=$(pwd)/getting-started/assets/
-export QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://postgres:5432/POLARIS
-export QUARKUS_DATASOURCE_USERNAME=postgres
-export QUARKUS_DATASOURCE_PASSWORD=postgres
-export CLIENT_ID=root
-export CLIENT_SECRET=s3cr3t
-docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml \
-  -f getting-started/jdbc/docker-compose-bootstrap-db.yml \
-  -f getting-started/jdbc/docker-compose.yml up -d
-```
-
-You should see output for some time as Polaris, Spark, and Trino build and start up. Eventually, the output will settle into a few Spark-related log lines, resembling the following:
-
-```
-spark-sql-1  | Spark Web UI available at http://8bc4de8ed854:4040
-spark-sql-1  | Spark master: local[*], Application Id: local-1743745174604
-spark-sql-1  | 25/04/04 05:39:38 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session.
-spark-sql-1  | 25/04/04 05:39:39 WARN RESTSessionCatalog: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://polaris:8181/api/catalogv1/oauth/tokens. This automatic fallback will be removed in a future Iceberg release.It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537
-```
-
-The Docker image pre-configures a sample catalog called `quickstart_catalog` that uses a local file system.
-
-## Running Polaris as a Standalone Process
-
-You can also start Polaris through Gradle (packaged within the Polaris repository):
-
-1. **Start the Server**
-
-Run the following command to start Polaris:
-
-```shell
-./gradlew run
-```
-
-You should see output for some time as Polaris builds and starts up. Eventually, the output will end with messages that resemble the following:
-
-```
-INFO  [io.quarkus] [,] [,,,] (main) Apache Polaris Server (incubating) on JVM (powered by Quarkus ) started in 1.911s. Listening on: http://0.0.0.0:8181. Management interface listening on http://0.0.0.0:8182.
-INFO  [io.quarkus] [,] [,,,] (main) Profile prod activated.
-INFO  [io.quarkus] [,] [,,,] (main) Installed features: [...]
-```
-
-At this point, Polaris is running.
-
-When launched through Gradle in this tutorial, Polaris stores entities only in memory. This means that any entities that you define will be destroyed when Polaris is shut down.
-For more information on how to configure Polaris for production usage, see the [docs]({{% relref "../../configuring-polaris-for-production" %}}).
-
-When Polaris is run using the `./gradlew run` command, the root principal credentials are `root` and `s3cr3t` for the `CLIENT_ID` and `CLIENT_SECRET`, respectively.
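-
-You can verify that the server is up by requesting a token for the `root` principal with those credentials, for example:
-
-```shell
-curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
-  -d "grant_type=client_credentials" \
-  -d "client_id=root" \
-  -d "client_secret=s3cr3t" \
-  -d "scope=PRINCIPAL_ROLE:ALL"
-```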

### Installing Apache Spark and Trino Locally for Testing

#### Apache Spark

If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. As in the [prerequisites]({{% ref "../install-dependencies#git" %}}), make sure [git](https://git-scm.com/) is installed first.

Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html).

```shell
git clone --branch branch-3.5 https://github.com/apache/spark.git ~/spark
```

#### Trino
If you want to connect to Polaris with [Trino](https://trino.io/), it is recommended to set up a test instance of Trino using Docker. As in the [prerequisites]({{% ref "../install-dependencies#docker" %}}), make sure [Docker](https://www.docker.com/) is installed first.

```shell
docker run --name trino -d -p 8080:8080 trinodb/trino
```

## Next Steps
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris,
check out the [Creating a Catalog]({{% ref "../creating-a-catalog" %}}) and
[Using Polaris]({{% ref "../using-polaris" %}}) pages.

diff --git a/1.2.0/getting-started/install-dependencies.md b/1.2.0/getting-started/install-dependencies.md
deleted file mode 100644
index 66640104d4..0000000000
--- a/1.2.0/getting-started/install-dependencies.md
+++ /dev/null
@@ -1,120 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Installing Dependencies
type: docs
weight: 100
---

This guide serves as an introduction to several key entities that can be managed with Apache Polaris (Incubating), describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark™.

# Prerequisites

This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here.

## Git

To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/) on MacOS:

```shell
brew install git
```

Please refer to the [Git Documentation](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) for instructions on installing Git on other platforms.

Then, use git to clone the Polaris repo:

```shell
git clone https://github.com/apache/polaris.git ~/polaris
```

## Docker

It is recommended to deploy Polaris inside [Docker](https://www.docker.com/) for the Quickstart workflow.
Instructions for deploying the Quickstart workflow on the supported cloud providers (AWS, Azure, GCP) are provided only for Docker. However, the non-Docker local deployment instructions can also be followed on a cloud provider.

Instructions to install Docker can be found on the [Docker website](https://docs.docker.com/engine/install/). Ensure that Docker and the Docker Compose plugin are both installed.

### Docker on MacOS
Docker can be installed using [homebrew](https://brew.sh/):

```shell
brew install --cask docker
```

There can be [Docker permission issues](https://github.com/apache/polaris/pull/971) related to the seccomp configuration. To resolve these issues, set the `seccomp` profile to "unconfined" when running a container. For example:

```shell
docker run --security-opt seccomp=unconfined apache/polaris:latest
```

Note: Setting the seccomp profile to "unconfined" disables the default system call filtering, which may pose security risks. Use this configuration with caution, especially in production environments.

### Docker on Amazon Linux
Docker can be installed using a modification of the CentOS instructions. For example:

```shell
sudo dnf update -y
# Remove old version
sudo dnf remove -y docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-engine
# Install dnf plugin
sudo dnf -y install dnf-plugins-core
# Add CentOS repository
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Adjust the release server version in the path, as it will not match Amazon Linux 2023
sudo sed -i 's/$releasever/9/g' /etc/yum.repos.d/docker-ce.repo
# Install as usual
sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

### Confirm Docker Installation

Once installation completes, verify that both Docker and the Docker Compose plugin are available:

```shell
docker version
docker compose version
```

Also make sure Docker is running and is able to run a sample Docker container:

```shell
docker run hello-world
```

## Java

If you plan to build Polaris from source yourself, or to follow this tutorial's instructions on a cloud provider, you will need to satisfy a few prerequisites first.

Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv:

```shell
cd ~/polaris
brew install openjdk@21 jenv
jenv add $(brew --prefix openjdk@21)
jenv local 21
```

Ensure that both `java --version` and `javac -version` run successfully.

## jq

Most Polaris Quickstart scripts require [jq](https://jqlang.org/download/). You can install jq using [homebrew](https://brew.sh/):
```shell
brew install jq
```

diff --git a/1.2.0/getting-started/using-polaris/_index.md b/1.2.0/getting-started/using-polaris/_index.md
deleted file mode 100644
index 752a846f5c..0000000000
--- a/1.2.0/getting-started/using-polaris/_index.md
+++ /dev/null
@@ -1,348 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.
The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Using Polaris
type: docs
weight: 401
---

## Setup

Ensure your `CLIENT_ID` and `CLIENT_SECRET` variables are already defined, as they were required for starting the Polaris server earlier.

```shell
export CLIENT_ID=YOUR_CLIENT_ID
export CLIENT_SECRET=YOUR_CLIENT_SECRET
```

Refer to the [Creating a Catalog]({{% ref "../creating-a-catalog" %}}) page for instructions on defining a
catalog for your specific storage type. The following examples assume the catalog's name is `quickstart_catalog`.

In Polaris, the [catalog]({{% relref "../../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../../entities#table" %}}) and [views]({{% relref "../../entities#view" %}}) are organized under.

The `DEFAULT_BASE_LOCATION` value you provided at catalog creation time will be the default location where objects in
this catalog are stored.

Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing the `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% relref "../../command-line-interface" %}}).


### Creating a Principal and Assigning it Privileges

With a catalog created, we can create a [principal]({{% relref "../../entities#principal" %}}) that has access to manage that catalog. For details on how to configure the Polaris CLI, see [the section above](#defining-a-catalog) or refer to the [docs]({{% relref "../../command-line-interface" %}}).

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principals \
  create \
  quickstart_user

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principal-roles \
  create \
  quickstart_user_role

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  catalog-roles \
  create \
  --catalog quickstart_catalog \
  quickstart_catalog_role
```

Be sure to provide the necessary credentials, hostname, and port as before.

When the `principals create` command completes successfully, it will return the credentials for this new principal. Export them for future use. For example:

```shell
./polaris ... principals create example
{"clientId": "XXXX", "clientSecret": "YYYY"}
export USER_CLIENT_ID=XXXX
export USER_CLIENT_SECRET=YYYY
```

Now, we grant the [principal role]({{% relref "../../entities#principal-role" %}}) we created to the principal, and grant the [catalog role]({{% relref "../../entities#catalog-role" %}}) we created to that principal role. For more information on these entities, please refer to the linked documentation.
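
Before wiring up the grants below, you can optionally verify that the new entities exist. This is a sketch of a quick check using the CLI's `list` subcommands, assuming the CLI is configured with the root credentials as above:

```shell
# List all principals; quickstart_user should appear in the output.
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principals \
  list

# List all principal roles; quickstart_user_role should appear in the output.
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principal-roles \
  list
```

With that confirmed, proceed with the grants: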

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principal-roles \
  grant \
  --principal quickstart_user \
  quickstart_user_role

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  catalog-roles \
  grant \
  --catalog quickstart_catalog \
  --principal-role quickstart_user_role \
  quickstart_catalog_role
```

Now, we’ve linked our principal to the catalog via roles like so:

![Principal to Catalog](/img/quickstart/privilege-illustration-1.png "Principal to Catalog")

In order to give this principal the ability to interact with the catalog, we must assign some [privileges]({{% relref "../../entities#privilege" %}}). For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so:

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  privileges \
  catalog \
  grant \
  --catalog quickstart_catalog \
  --catalog-role quickstart_catalog_role \
  CATALOG_MANAGE_CONTENT
```

This grants the [catalog privilege]({{% relref "../../entities#privilege" %}}) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so:

![Principal to Catalog with Catalog Role](/img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role")

`CATALOG_MANAGE_CONTENT` confers create/list/read/write privileges on all entities within the catalog. The same privilege could be granted on a namespace, in which case the principal could create/list/read/write any entity under that namespace.

## Using Iceberg & Polaris

At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to authenticate as that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the respective examples below.

### Connecting with Spark

#### Using a Local Build of Spark

To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API.

This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark).
From a local Spark clone on the `branch-3.5` branch, we can run the following:

_Note: the credentials provided here are those for our principal, not the root credentials._

```shell
bin/spark-sql \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
--conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
--conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.quickstart_catalog.credential=${USER_CLIENT_ID}:${USER_CLIENT_SECRET} \
--conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true \
--conf spark.sql.catalog.quickstart_catalog.client.region=us-west-2
```

Similar to the CLI commands above, this configures Spark to use the Polaris server running at `localhost:8181`. If your Polaris server is running elsewhere, be sure to update the configuration appropriately.

Finally, note that we include the `iceberg-aws-bundle` package here. If your table is using a different storage system, be sure to include the appropriate dependency.

#### Using Spark SQL from a Docker container

Refresh the Docker container with the user's credentials:
```shell
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop spark-sql
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f spark-sql
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps spark-sql
```

Attach to the running spark-sql container:

```shell
docker attach $(docker ps -q --filter name=spark-sql)
```

#### Sample Commands

Once the Spark session starts, we can create a namespace and table within the catalog:

```sql
USE quickstart_catalog;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema;
USE NAMESPACE quickstart_namespace.schema;
CREATE TABLE IF NOT EXISTS quickstart_table (id BIGINT, data STRING) USING ICEBERG;
```

We can now use this table like any other:

```
INSERT INTO quickstart_table VALUES (1, 'some data');
SELECT * FROM quickstart_table;
. . .
+---+---------+
|id |data     |
+---+---------+
|1  |some data|
+---+---------+
```

If at any time access is revoked...

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  privileges \
  catalog \
  revoke \
  --catalog quickstart_catalog \
  --catalog-role quickstart_catalog_role \
  CATALOG_MANAGE_CONTENT
```

Spark will lose access to the table:

```
INSERT INTO quickstart_table VALUES (1, 'some data');

org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
```

### Connecting with Trino

Refresh the Docker container with the user's credentials:

```shell
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop trino
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f trino
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps trino
```

Attach to the running Trino container:

```shell
docker exec -it $(docker ps -q --filter name=trino) trino
```

You may not see Trino's prompt immediately; press ENTER to see it. A few commands that you can try:

```sql
SHOW CATALOGS;
SHOW SCHEMAS FROM iceberg;
CREATE SCHEMA iceberg.quickstart_schema;
CREATE TABLE iceberg.quickstart_schema.quickstart_table AS SELECT 1 x;
SELECT * FROM iceberg.quickstart_schema.quickstart_table;
```

If at any time access is revoked...

```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  privileges \
  catalog \
  revoke \
  --catalog quickstart_catalog \
  --catalog-role quickstart_catalog_role \
  CATALOG_MANAGE_CONTENT
```

Trino will lose access to the table:

```sql
SELECT * FROM iceberg.quickstart_schema.quickstart_table;

org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
```

### Connecting with PyIceberg

#### Using Credentials

```python
from pyiceberg.catalog import load_catalog

# CLIENT_ID and CLIENT_SECRET must be defined in your Python session beforehand.
catalog = load_catalog(
    type='rest',
    uri='http://localhost:8181/api/catalog',
    warehouse='quickstart_catalog',
    scope="PRINCIPAL_ROLE:ALL",
    credential=f"{CLIENT_ID}:{CLIENT_SECRET}",
)
```

If the `load_catalog` function is used with credentials, then PyIceberg will automatically request an authorization token from the `v1/oauth/tokens` endpoint, and will later use this token to prove its identity to the Polaris catalog.

#### Using a Token

```python
from pyiceberg.catalog import load_catalog
import requests

# Step 1: Get an OAuth token
response = requests.post(
    "http://localhost:8181/api/catalog/v1/oauth/tokens",
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={
        "grant_type": "client_credentials",
        "scope": "PRINCIPAL_ROLE:ALL"
    })
token = response.json()["access_token"]

# Step 2: Load the catalog using the token
catalog = load_catalog(
    type='rest',
    uri='http://localhost:8181/api/catalog',
    warehouse='quickstart_catalog',
    token=token,
)
```

It is possible to use the `load_catalog` function by providing an authorization token directly. This method is useful when using an external identity provider (e.g. Google Identity).
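
The snippets above assume PyIceberg is already installed. If it is not, it can be installed from PyPI; the `pyarrow` extra shown here is an optional assumption on our part, convenient for reading table data but not strictly required:

```shell
# Install PyIceberg (with the optional pyarrow extra) into the current environment.
pip install "pyiceberg[pyarrow]"
```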

### Connecting Using REST APIs

To access Polaris from the host machine, first request an access token:

```shell
export POLARIS_TOKEN=$(curl -s http://polaris:8181/api/catalog/v1/oauth/tokens \
   --resolve polaris:8181:127.0.0.1 \
   --user ${CLIENT_ID}:${CLIENT_SECRET} \
   -d 'grant_type=client_credentials' \
   -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
```

Then, use the access token in the Authorization header when accessing Polaris:

```shell
curl -v http://127.0.0.1:8181/api/management/v1/principal-roles -H "Authorization: Bearer $POLARIS_TOKEN"
curl -v http://127.0.0.1:8181/api/management/v1/catalogs/quickstart_catalog -H "Authorization: Bearer $POLARIS_TOKEN"
```

## Next Steps
* Visit [Using Keycloak as the external identity provider]({{% relref "keycloak-idp" %}}).
* Visit [Using Polaris with telemetry tools]({{% relref "telemetry-tools" %}}).
* Visit [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}).
* A Getting Started experience for using Spark with Jupyter Notebooks is documented [here](https://github.com/apache/polaris/blob/main/getting-started/spark/README.md).
* To shut down a locally-deployed Polaris server and clean up all related Docker containers, run the command listed below. Cloud Deployments have their respective termination commands on their Deployment page, while Polaris running on Gradle will terminate when the Gradle process terminates.
```shell
docker compose -p polaris \
  -f getting-started/assets/postgres/docker-compose-postgres.yml \
  -f getting-started/jdbc/docker-compose-bootstrap-db.yml \
  -f getting-started/jdbc/docker-compose.yml \
  down
```

diff --git a/1.2.0/getting-started/using-polaris/keycloak-idp.md b/1.2.0/getting-started/using-polaris/keycloak-idp.md
deleted file mode 100644
index a0d27b7386..0000000000
--- a/1.2.0/getting-started/using-polaris/keycloak-idp.md
+++ /dev/null
@@ -1,212 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Getting Started with Apache Polaris, External Authentication and Keycloak
linkTitle: Using Keycloak IDP
type: docs
weight: 400
---

## Overview

This example uses Keycloak as an **external** identity provider for Polaris. The "iceberg" realm is automatically
created and configured from the `iceberg-realm.json` file.

This Keycloak realm contains 1 client definition: `client1:s3cr3t`. It is configured to return tokens with the following
fixed claims:

- `principal_id`: the principal ID of the user. It is always set to zero (0) in this example.
- `principal_name`: the principal name of the user. It is always set to "root" in this example.
- `principal_roles`: the principal roles of the user.
  It is always set to `["server_admin", "catalog_admin"]` in this example.

This is obviously not a realistic configuration. In a real-world scenario, you would configure Keycloak to return the
actual principal ID, name and roles of the user. Note that principals and principal roles must have been created in
Polaris beforehand, and the principal ID, name and roles must match the ones returned by Keycloak.

Polaris is configured with 3 realms:

- `realm-internal`: This is the default realm, and is configured to use internal authentication only. It accepts
  tokens issued by Polaris itself only.
- `realm-external`: This realm is configured to use an external identity provider (IDP) for authentication only. It
  accepts tokens issued by Keycloak only.
- `realm-mixed`: This realm is configured to use both internal and external authentication. It accepts tokens
  issued by both Polaris and Keycloak.

For more information about how to configure Polaris with external authentication, see the
[IDP integration documentation]({{% relref "../../managing-security/external-idp" %}}).

## Starting the Example

1. Build the Polaris server image if it's not already present locally:

    ```shell
    ./gradlew \
      :polaris-server:assemble \
      :polaris-server:quarkusAppPartsBuild --rerun \
      -Dquarkus.container-image.build=true
    ```

2. Start the docker compose group by running the following command from the root of the repository:

    ```shell
    docker compose -f getting-started/keycloak/docker-compose.yml up
    ```

## Requesting a Token

Note: the commands below require `jq` to be installed on your machine.

### From Polaris

You can request a token from Polaris for the realms `realm-internal` and `realm-mixed`:

1. Open a terminal and run the following command to request an access token for the `realm-internal` realm:

    ```shell
    polaris_token_realm_internal=$(curl -s http://localhost:8181/api/catalog/v1/oauth/tokens \
      --user root:s3cr3t \
      -H 'Polaris-Realm: realm-internal' \
      -d 'grant_type=client_credentials' \
      -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
    ```

    This token is valid only for the `realm-internal` realm.

2. Open a terminal and run the following command to request an access token for the `realm-mixed` realm:

    ```shell
    polaris_token_realm_mixed=$(curl -s http://localhost:8181/api/catalog/v1/oauth/tokens \
      --user root:s3cr3t \
      -H 'Polaris-Realm: realm-mixed' \
      -d 'grant_type=client_credentials' \
      -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
    ```

    This token is valid only for the `realm-mixed` realm.

Polaris tokens are valid for 1 hour.

Note: if you request a Polaris token for the `realm-external` realm, it will not work, because Polaris won't issue tokens
for this realm:

```shell
curl -v http://localhost:8181/api/catalog/v1/oauth/tokens \
  --user root:s3cr3t \
  -H 'Polaris-Realm: realm-external' \
  -d 'grant_type=client_credentials' \
  -d 'scope=PRINCIPAL_ROLE:ALL'
```

This will return a `501 Not Implemented` error, because the internal token endpoint has been deactivated for this realm.

### From Keycloak

You can request a token from Keycloak for the `realm-external` and `realm-mixed` realms:
1. Open a terminal and run the following command to request an access token from Keycloak:

    ```shell
    keycloak_token=$(curl -s http://keycloak:8080/realms/iceberg/protocol/openid-connect/token \
      --resolve keycloak:8080:127.0.0.1 \
      --user client1:s3cr3t \
      -d 'grant_type=client_credentials' | jq -r .access_token)
    ```

Note the `--resolve` option: it is used to send the request with the `Host` header set to `keycloak`. This is necessary
because Keycloak issues tokens with the `iss` claim matching the request's `Host` header; without this, the token would
not be valid when used against Polaris, because the `iss` claim would be `127.0.0.1`, whereas Polaris expects it to be
`keycloak`, since that's Keycloak's hostname within the Docker network.

Tokens issued by Keycloak can be used to access Polaris with the `realm-external` or `realm-mixed` realms. Access tokens
are valid for 1 hour.

You can also access the Keycloak admin console. Open a browser and go to [http://localhost:8080](http://localhost:8080),
then log in with the username `admin` and password `admin` (you can change this in the docker-compose file).

## Accessing Polaris with the Tokens

You can access Polaris using the tokens you obtained above. The following examples show how to use the tokens with
`curl`:

### Using the Polaris Token

1. Open a terminal and run the following command to list the catalogs in the `realm-internal` realm:

    ```shell
    curl -v http://localhost:8181/api/management/v1/catalogs \
      -H "Authorization: Bearer $polaris_token_realm_internal" \
      -H 'Polaris-Realm: realm-internal' \
      -H 'Accept: application/json'
    ```

2. Open a terminal and run the following command to list the catalogs in the `realm-mixed` realm:

    ```shell
    curl -v http://localhost:8181/api/management/v1/catalogs \
      -H "Authorization: Bearer $polaris_token_realm_mixed" \
      -H 'Polaris-Realm: realm-mixed' \
      -H 'Accept: application/json'
    ```

Note: you cannot mix tokens from different realms. For example, you cannot use a token from the `realm-internal` realm to access
the `realm-mixed` realm:

```shell
curl -v http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer $polaris_token_realm_internal" \
  -H 'Polaris-Realm: realm-mixed' \
  -H 'Accept: application/json'
```

This will return a `401 Unauthorized` error, because the token is not valid for the `realm-mixed` realm.

### Using the Keycloak Token

The same Keycloak token can be used to access both the `realm-external` and `realm-mixed` realms, as it is valid for
both (both realms share the same OIDC tenant configuration).

1. Open a terminal and run the following command to list the catalogs in the `realm-external` realm:

    ```shell
    curl -v http://localhost:8181/api/management/v1/catalogs \
      -H "Authorization: Bearer $keycloak_token" \
      -H 'Polaris-Realm: realm-external' \
      -H 'Accept: application/json'
    ```
2. Open a terminal and run the following command to list the catalogs in the `realm-mixed` realm:

    ```shell
    curl -v http://localhost:8181/api/management/v1/catalogs \
      -H "Authorization: Bearer $keycloak_token" \
      -H 'Polaris-Realm: realm-mixed' \
      -H 'Accept: application/json'
    ```

Note: you cannot use a Keycloak token to access the `realm-internal` realm:

```shell
curl -v http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer $keycloak_token" \
  -H 'Polaris-Realm: realm-internal' \
  -H 'Accept: application/json'
```

This will return a `401 Unauthorized` error, because the token is not valid for the `realm-internal` realm.
\ No newline at end of file

diff --git a/1.2.0/getting-started/using-polaris/telemetry-tools.md b/1.2.0/getting-started/using-polaris/telemetry-tools.md
deleted file mode 100644
index b6a9e8f8eb..0000000000
--- a/1.2.0/getting-started/using-polaris/telemetry-tools.md
+++ /dev/null
@@ -1,70 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Getting Started with Apache Polaris, Prometheus and Jaeger
linkTitle: Using Polaris with telemetry tools
type: docs
weight: 401
---

This example requires `jq` to be installed on your machine.

1. Build the Polaris image if it's not already present locally:

    ```shell
    ./gradlew \
      :polaris-server:assemble \
      :polaris-server:quarkusAppPartsBuild --rerun \
      -Dquarkus.container-image.build=true
    ```

2. Start the docker compose group by running the following command from the root of the repository:

    ```shell
    export ASSETS_PATH=$(pwd)/getting-started/assets/
    export CLIENT_ID=root
    export CLIENT_SECRET=s3cr3t
    docker compose -f getting-started/telemetry/docker-compose.yml up
    ```

3. To access Polaris from the host machine, first request an access token:

    ```shell
    export POLARIS_TOKEN=$(curl -s http://localhost:8181/api/catalog/v1/oauth/tokens \
      --user root:s3cr3t \
      -d 'grant_type=client_credentials' \
      -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
    ```

4. Then, use the access token in the Authorization header when accessing Polaris; you can also test
   the `Polaris-Request-Id` header; you should see it in all logs and traces:

    ```shell
    curl -v 'http://localhost:8181/api/management/v1/principal-roles' \
      -H "Authorization: Bearer $POLARIS_TOKEN" \
      -H "Polaris-Request-Id: 1234"
    curl -v 'http://localhost:8181/api/catalog/v1/config?warehouse=quickstart_catalog' \
      -H "Authorization: Bearer $POLARIS_TOKEN" \
      -H "Polaris-Request-Id: 5678"
    ```

5. Access the following services:

    - Prometheus UI: browse to http://localhost:9093 to view metrics.
    - Jaeger UI: browse to http://localhost:16686 to view traces.
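
Besides browsing the Prometheus UI, you can also query it programmatically. The sketch below assumes the bundled Prometheus instance exposes the standard Prometheus HTTP API (`/api/v1/query`) on the port listed above:

```shell
# The `up` metric reports 1 for every target Prometheus is successfully scraping;
# the Polaris management endpoint should appear among the results.
curl -s 'http://localhost:9093/api/v1/query?query=up' \
  | jq '.data.result[] | {job: .metric.job, up: .value[1]}'
```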
diff --git a/1.2.0/helm.md b/1.2.0/helm.md
deleted file mode 100644
index ef82e8e675..0000000000
--- a/1.2.0/helm.md
+++ /dev/null
@@ -1,371 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Polaris Helm Chart
type: docs
weight: 675
---

![Version: 1.2.0-incubating-SNAPSHOT](https://img.shields.io/badge/Version-1.2.0--incubating--SNAPSHOT-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.2.0-incubating-SNAPSHOT](https://img.shields.io/badge/AppVersion-1.2.0--incubating--SNAPSHOT-informational?style=flat-square)

A Helm chart for Apache Polaris (incubating).

**Homepage:** 

## Source Code

* 

## Installation

### Running locally with a Minikube cluster

The below instructions assume Minikube and Helm are installed.

Start the Minikube cluster, then build and load the image into it:

```bash
minikube start
eval $(minikube docker-env)

./gradlew \
  :polaris-server:assemble \
  :polaris-server:quarkusAppPartsBuild --rerun \
  :polaris-admin:assemble \
  :polaris-admin:quarkusAppPartsBuild --rerun \
  -Dquarkus.container-image.build=true
```

### Installing the chart locally

The below instructions assume a local Kubernetes cluster is running and Helm is installed.

#### Common setup

Create the target namespace:
```bash
kubectl create namespace polaris
```

Create all the required resources in the `polaris` namespace. This usually includes a Postgres
database, Kubernetes secrets, and service accounts. The Polaris chart does not create
these resources automatically, as they are not required for all Polaris deployments. The chart will
fail if these resources are not created beforehand. You can find some examples in the
`helm/polaris/ci/fixtures` directory, but beware that these are primarily intended for tests.

Below are two sample deployment models for installing the chart: one with a non-persistent backend and another with a persistent backend.

{{< alert warning >}}
The examples below use values files located in the `helm/polaris/ci` directory.
**These files are intended primarily for testing purposes, and may not be suitable for production use**.
For production deployments, create your own values files based on the provided examples.
{{< /alert >}}

#### Non-persistent backend

Install the chart with a non-persistent backend. From the Polaris repo root:
```bash
helm upgrade --install --namespace polaris \
  polaris helm/polaris
```

#### Persistent backend

{{< alert warning >}}
The Postgres deployment set up in the fixtures directory is intended for testing purposes only and is not suitable for production use.
For production deployments, use a managed Postgres service or a properly configured and secured Postgres instance.
{{< /alert >}}

Install the chart with a persistent backend. From the Polaris repo root:
```bash
helm upgrade --install --namespace polaris \
  --values helm/polaris/ci/persistence-values.yaml \
  polaris helm/polaris
kubectl wait --namespace polaris --for=condition=ready pod --selector=app.kubernetes.io/name=polaris --timeout=120s
```

To access Polaris and Postgres locally, set up port forwarding for both services (this is needed for the bootstrap process):
```bash
kubectl port-forward -n polaris $(kubectl get pod -n polaris -l app.kubernetes.io/name=polaris -o jsonpath='{.items[0].metadata.name}') 8181:8181

kubectl port-forward -n polaris $(kubectl get pod -n polaris -l app.kubernetes.io/name=postgres -o jsonpath='{.items[0].metadata.name}') 5432:5432
```

Run the catalog bootstrap using the Polaris admin tool. This step initializes the catalog with the required configuration:
```bash
container_envs=$(kubectl exec -it -n polaris $(kubectl get pod -n polaris -l app.kubernetes.io/name=polaris -o jsonpath='{.items[0].metadata.name}') -- env)
export QUARKUS_DATASOURCE_USERNAME=$(echo "$container_envs" | grep quarkus.datasource.username | awk -F '=' '{print $2}' | tr -d '\n\r')
export QUARKUS_DATASOURCE_PASSWORD=$(echo "$container_envs" | grep quarkus.datasource.password | awk -F '=' '{print $2}' | tr -d '\n\r')
export QUARKUS_DATASOURCE_JDBC_URL=$(echo "$container_envs" | grep quarkus.datasource.jdbc.url | sed 's/postgres/localhost/2' | awk -F '=' '{print $2}' | tr -d '\n\r')

java -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap -c POLARIS,root,pass -r POLARIS
```

### Uninstalling

```bash
helm uninstall --namespace polaris polaris

kubectl delete --namespace polaris -f helm/polaris/ci/fixtures/

kubectl delete namespace polaris
```

## Development & Testing

This section is intended for developers who want to run the Polaris Helm chart tests.

### Prerequisites

The following tools are required to run the tests:

* [Helm Unit Test](https://github.com/helm-unittest/helm-unittest)
* [Chart Testing](https://github.com/helm/chart-testing)

Quick installation instructions for these tools:
```bash
helm plugin install https://github.com/helm-unittest/helm-unittest.git
brew install chart-testing
```

The integration tests also require some fixtures to be deployed. The `ci/fixtures` directory
contains the required resources. To deploy them, run the following command:
```bash
kubectl apply --namespace polaris -f helm/polaris/ci/fixtures/
kubectl wait --namespace polaris --for=condition=ready pod --selector=app.kubernetes.io/name=postgres --timeout=120s
```

The `helm/polaris/ci` directory contains a number of values files that will be used to install the chart with
different configurations.

### Running the unit tests

Helm unit tests do not require a Kubernetes cluster. To run the unit tests, run Helm Unit Test from
the Polaris repo root:
```bash
helm unittest helm/polaris
```

You can also lint the chart using the Chart Testing tool, with the following command:

```bash
ct lint --charts helm/polaris
```

### Running the integration tests

Integration tests require a Kubernetes cluster. See the installation instructions above for setting up
a local cluster.
- -Integration tests are run with the Chart Testing tool: -```bash -ct install --namespace polaris --charts ./helm/polaris -``` - -## Values - -| Key | Type | Default | Description | -|-----|------|---------|-------------| -| advancedConfig | object | `{}` | Advanced configuration. You can pass here any valid Polaris or Quarkus configuration property. Any property that is defined here takes precedence over all the other configuration values generated by this chart. Properties can be passed "flattened" or as nested YAML objects (see examples below). Note: values should be strings; avoid using numbers, booleans, or other types. | -| affinity | object | `{}` | Affinity and anti-affinity for polaris pods. See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity. | -| authentication | object | `{"authenticator":{"type":"default"},"realmOverrides":{},"tokenBroker":{"maxTokenGeneration":"PT1H","secret":{"name":null,"privateKey":"private.pem","publicKey":"public.pem","rsaKeyPair":{"privateKey":"private.pem","publicKey":"public.pem"},"secretKey":"symmetric.pem","symmetricKey":{"secretKey":"symmetric.key"}},"type":"rsa-key-pair"},"tokenService":{"type":"default"},"type":"internal"}` | Polaris authentication configuration. | -| authentication.authenticator | object | `{"type":"default"}` | The `Authenticator` implementation to use. Only one built-in type is supported: default. | -| authentication.realmOverrides | object | `{}` | Authentication configuration overrides per realm. | -| authentication.tokenBroker | object | `{"maxTokenGeneration":"PT1H","secret":{"name":null,"privateKey":"private.pem","publicKey":"public.pem","rsaKeyPair":{"privateKey":"private.pem","publicKey":"public.pem"},"secretKey":"symmetric.pem","symmetricKey":{"secretKey":"symmetric.key"}},"type":"rsa-key-pair"}` | The `TokenBroker` implementation to use. Two built-in types are supported: rsa-key-pair and symmetric-key. Only relevant when using internal (or mixed) authentication. When using external authentication, the token broker is not used. | -| authentication.tokenBroker.maxTokenGeneration | string | `"PT1H"` | Maximum token generation duration (e.g., PT1H for 1 hour). | -| authentication.tokenBroker.secret | object | `{"name":null,"privateKey":"private.pem","publicKey":"public.pem","rsaKeyPair":{"privateKey":"private.pem","publicKey":"public.pem"},"secretKey":"symmetric.pem","symmetricKey":{"secretKey":"symmetric.key"}}` | The secret name to pull the public and private keys, or the symmetric key secret from. | -| authentication.tokenBroker.secret.name | string | `nil` | The name of the secret to pull the keys from. If not provided, a key pair will be generated. This is not recommended for production. | -| authentication.tokenBroker.secret.privateKey | string | `"private.pem"` | DEPRECATED: Use `authentication.tokenBroker.secret.rsaKeyPair.privateKey` instead. Key name inside the secret for the private key | -| authentication.tokenBroker.secret.publicKey | string | `"public.pem"` | DEPRECATED: Use `authentication.tokenBroker.secret.rsaKeyPair.publicKey` instead. Key name inside the secret for the public key | -| authentication.tokenBroker.secret.rsaKeyPair | object | `{"privateKey":"private.pem","publicKey":"public.pem"}` | Optional: configuration specific to RSA key pair secret. 
| -| authentication.tokenBroker.secret.rsaKeyPair.privateKey | string | `"private.pem"` | Key name inside the secret for the private key | -| authentication.tokenBroker.secret.rsaKeyPair.publicKey | string | `"public.pem"` | Key name inside the secret for the public key | -| authentication.tokenBroker.secret.secretKey | string | `"symmetric.pem"` | DEPRECATED: Use `authentication.tokenBroker.secret.symmetricKey.secretKey` instead. Key name inside the secret for the symmetric key | -| authentication.tokenBroker.secret.symmetricKey | object | `{"secretKey":"symmetric.key"}` | Optional: configuration specific to symmetric key secret. | -| authentication.tokenBroker.secret.symmetricKey.secretKey | string | `"symmetric.key"` | Key name inside the secret for the symmetric key | -| authentication.tokenService | object | `{"type":"default"}` | The token service (`IcebergRestOAuth2ApiService`) implementation to use. Two built-in types are supported: default and disabled. Only relevant when using internal (or mixed) authentication. When using external authentication, the token service is always disabled. | -| authentication.type | string | `"internal"` | The type of authentication to use. Three built-in types are supported: internal, external, and mixed. | -| autoscaling.enabled | bool | `false` | Specifies whether automatic horizontal scaling should be enabled. Do not enable this when using in-memory version store type. | -| autoscaling.maxReplicas | int | `3` | The maximum number of replicas to maintain. | -| autoscaling.minReplicas | int | `1` | The minimum number of replicas to maintain. | -| autoscaling.targetCPUUtilizationPercentage | int | `80` | Optional; set to zero or empty to disable. | -| autoscaling.targetMemoryUtilizationPercentage | string | `nil` | Optional; set to zero or empty to disable. | -| configMapLabels | object | `{}` | Additional Labels to apply to polaris configmap. | -| containerSecurityContext | object | `{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"runAsNonRoot":true,"runAsUser":10000,"seccompProfile":{"type":"RuntimeDefault"}}` | Security context for the polaris container. See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/. | -| containerSecurityContext.runAsUser | int | `10000` | UID 10000 is compatible with Polaris OSS default images; change this if you are using a different image. | -| cors | object | `{"accessControlAllowCredentials":null,"accessControlMaxAge":null,"allowedHeaders":[],"allowedMethods":[],"allowedOrigins":[],"exposedHeaders":[]}` | Polaris CORS configuration. | -| cors.accessControlAllowCredentials | string | `nil` | The `Access-Control-Allow-Credentials` response header. The value of this header will default to `true` if `allowedOrigins` property is set and there is a match with the precise `Origin` header. | -| cors.accessControlMaxAge | string | `nil` | The `Access-Control-Max-Age` response header value indicating how long the results of a pre-flight request can be cached. Must be a valid duration. | -| cors.allowedHeaders | list | `[]` | HTTP headers allowed for CORS, ex: X-Custom, Content-Disposition. If this is not set or empty, all requested headers are considered allowed. | -| cors.allowedMethods | list | `[]` | HTTP methods allowed for CORS, ex: GET, PUT, POST. If this is not set or empty, all requested methods are considered allowed. | -| cors.allowedOrigins | list | `[]` | Origins allowed for CORS, e.g. http://polaris.apache.org, http://localhost:8181. 
In case an entry of the list is surrounded by forward slashes, it is interpreted as a regular expression. | -| cors.exposedHeaders | list | `[]` | HTTP headers exposed to the client, ex: X-Custom, Content-Disposition. The default is an empty list. | -| extraEnv | list | `[]` | Advanced configuration via Environment Variables. Extra environment variables to add to the Polaris server container. You can pass here any valid EnvVar object: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#envvar-v1-core This can be useful to get configuration values from Kubernetes secrets or config maps. | -| extraInitContainers | list | `[]` | Add additional init containers to the polaris pod(s) See https://kubernetes.io/docs/concepts/workloads/pods/init-containers/. | -| extraServices | list | `[]` | Additional service definitions. All service definitions always select all Polaris pods. Use this if you need to expose specific ports with different configurations, e.g. expose polaris-http with an alternate LoadBalancer service instead of ClusterIP. | -| extraVolumeMounts | list | `[]` | Extra volume mounts to add to the polaris container. See https://kubernetes.io/docs/concepts/storage/volumes/. | -| extraVolumes | list | `[]` | Extra volumes to add to the polaris pod. See https://kubernetes.io/docs/concepts/storage/volumes/. | -| features | object | `{"realmOverrides":{}}` | Polaris features configuration. | -| features.realmOverrides | object | `{}` | Features to enable or disable per realm. This field is a map of maps. The realm name is the key, and the value is a map of feature names to values. If a feature is not present in the map, the default value from the 'defaults' field is used. | -| fileIo | object | `{"type":"default"}` | Polaris FileIO configuration. | -| fileIo.type | string | `"default"` | The type of file IO to use. Two built-in types are supported: default and wasb. The wasb one translates WASB paths to ABFS ones. | -| image.configDir | string | `"/deployments/config"` | The path to the directory where the application.properties file, and other configuration files, if any, should be mounted. Note: if you are using EclipseLink, then this value must be at least two folders down to the root folder, e.g. `/deployments/config` is OK, whereas `/deployments` is not. | -| image.pullPolicy | string | `"IfNotPresent"` | The image pull policy. | -| image.repository | string | `"apache/polaris"` | The image repository to pull from. | -| image.tag | string | `"latest"` | The image tag. | -| imagePullSecrets | list | `[]` | References to secrets in the same namespace to use for pulling any of the images used by this chart. Each entry is a LocalObjectReference to an existing secret in the namespace. The secret must contain a .dockerconfigjson key with a base64-encoded Docker configuration file. See https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ for more information. | -| ingress.annotations | object | `{}` | Annotations to add to the ingress. | -| ingress.className | string | `""` | Specifies the ingressClassName; leave empty if you don't want to customize it | -| ingress.enabled | bool | `false` | Specifies whether an ingress should be created. | -| ingress.hosts | list | `[{"host":"chart-example.local","paths":[]}]` | A list of host paths used to configure the ingress. | -| ingress.tls | list | `[]` | A list of TLS certificates; each entry has a list of hosts in the certificate, along with the secret name used to terminate TLS traffic on port 443. 
| -| livenessProbe | object | `{"failureThreshold":3,"initialDelaySeconds":5,"periodSeconds":10,"successThreshold":1,"terminationGracePeriodSeconds":30,"timeoutSeconds":10}` | Configures the liveness probe for polaris pods. | -| livenessProbe.failureThreshold | int | `3` | Minimum consecutive failures for the probe to be considered failed after having succeeded. Minimum value is 1. | -| livenessProbe.initialDelaySeconds | int | `5` | Number of seconds after the container has started before liveness probes are initiated. Minimum value is 0. | -| livenessProbe.periodSeconds | int | `10` | How often (in seconds) to perform the probe. Minimum value is 1. | -| livenessProbe.successThreshold | int | `1` | Minimum consecutive successes for the probe to be considered successful after having failed. Minimum value is 1. | -| livenessProbe.terminationGracePeriodSeconds | int | `30` | Optional duration in seconds the pod needs to terminate gracefully upon probe failure. Minimum value is 1. | -| livenessProbe.timeoutSeconds | int | `10` | Number of seconds after which the probe times out. Minimum value is 1. | -| logging | object | `{"categories":{"org.apache.iceberg.rest":"INFO","org.apache.polaris":"INFO"},"console":{"enabled":true,"format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"threshold":"ALL"},"file":{"enabled":false,"fileName":"polaris.log","format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"logsDir":"/deployments/logs","rotation":{"fileSuffix":null,"maxBackupIndex":5,"maxFileSize":"100Mi"},"storage":{"className":"standard","selectorLabels":{},"size":"512Gi"},"threshold":"ALL"},"level":"INFO","mdc":{},"requestIdHeaderName":"Polaris-Request-Id"}` | Logging configuration. | -| logging.categories | object | `{"org.apache.iceberg.rest":"INFO","org.apache.polaris":"INFO"}` | Configuration for specific log categories. | -| logging.console | object | `{"enabled":true,"format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"threshold":"ALL"}` | Configuration for the console appender. | -| logging.console.enabled | bool | `true` | Whether to enable the console appender. | -| logging.console.format | string | `"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n"` | The log format to use. Ignored if JSON format is enabled. See https://quarkus.io/guides/logging#logging-format for details. | -| logging.console.json | bool | `false` | Whether to log in JSON format. | -| logging.console.threshold | string | `"ALL"` | The log level of the console appender. | -| logging.file | object | `{"enabled":false,"fileName":"polaris.log","format":"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n","json":false,"logsDir":"/deployments/logs","rotation":{"fileSuffix":null,"maxBackupIndex":5,"maxFileSize":"100Mi"},"storage":{"className":"standard","selectorLabels":{},"size":"512Gi"},"threshold":"ALL"}` | Configuration for the file appender. | -| logging.file.enabled | bool | `false` | Whether to enable the file appender. | -| logging.file.fileName | string | `"polaris.log"` | The log file name. 
| -| logging.file.format | string | `"%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n"` | The log format to use. Ignored if JSON format is enabled. See https://quarkus.io/guides/logging#logging-format for details. | -| logging.file.json | bool | `false` | Whether to log in JSON format. | -| logging.file.logsDir | string | `"/deployments/logs"` | The local directory where log files are stored. The persistent volume claim will be mounted here. | -| logging.file.rotation | object | `{"fileSuffix":null,"maxBackupIndex":5,"maxFileSize":"100Mi"}` | Log rotation configuration. | -| logging.file.rotation.fileSuffix | string | `nil` | An optional suffix to append to the rotated log files. If present, the rotated log files will be grouped in time buckets, and each bucket will contain at most maxBackupIndex files. The suffix must be in a date-time format that is understood by DateTimeFormatter. If the suffix ends with .gz or .zip, the rotated files will also be compressed using the corresponding algorithm. | -| logging.file.rotation.maxBackupIndex | int | `5` | The maximum number of backup files to keep. | -| logging.file.rotation.maxFileSize | string | `"100Mi"` | The maximum size of the log file before it is rotated. Should be expressed as a Kubernetes quantity. | -| logging.file.storage | object | `{"className":"standard","selectorLabels":{},"size":"512Gi"}` | The log storage configuration. A persistent volume claim will be created using these settings. | -| logging.file.storage.className | string | `"standard"` | The storage class name of the persistent volume claim to create. | -| logging.file.storage.selectorLabels | object | `{}` | Labels to add to the persistent volume claim spec selector; a persistent volume with matching labels must exist. Leave empty if using dynamic provisioning. | -| logging.file.storage.size | string | `"512Gi"` | The size of the persistent volume claim to create. | -| logging.file.threshold | string | `"ALL"` | The log level of the file appender. | -| logging.level | string | `"INFO"` | The log level of the root category, which is used as the default log level for all categories. | -| logging.mdc | object | `{}` | Configuration for MDC (Mapped Diagnostic Context). Values specified here will be added to the log context of all incoming requests and can be used in log patterns. | -| logging.requestIdHeaderName | string | `"Polaris-Request-Id"` | The header name to use for the request ID. | -| managementService | object | `{"annotations":{},"clusterIP":"None","externalTrafficPolicy":null,"internalTrafficPolicy":null,"ports":[{"name":"polaris-mgmt","nodePort":null,"port":8182,"protocol":null,"targetPort":null}],"sessionAffinity":null,"trafficDistribution":null,"type":"ClusterIP"}` | Management service settings. These settings are used to configure liveness and readiness probes, and to configure the dedicated headless service that will expose health checks and metrics, e.g. for metrics scraping and service monitoring. | -| managementService.annotations | object | `{}` | Annotations to add to the service. | -| managementService.clusterIP | string | `"None"` | By default, the management service is headless, i.e. it does not have a cluster IP. This is generally the right option for exposing health checks and metrics, e.g. for metrics scraping and service monitoring. 
| -| managementService.ports | list | `[{"name":"polaris-mgmt","nodePort":null,"port":8182,"protocol":null,"targetPort":null}]` | The ports the management service will listen on. At least one port is required; the first port implicitly becomes the HTTP port that the application will use for serving management requests. By default, it's 8182. Note: port names must be unique and no more than 15 characters long. | -| managementService.ports[0] | object | `{"name":"polaris-mgmt","nodePort":null,"port":8182,"protocol":null,"targetPort":null}` | The name of the management port. Required. | -| managementService.ports[0].nodePort | string | `nil` | The port on each node on which this service is exposed when type is NodePort or LoadBalancer. Usually assigned by the system. If not specified, a port will be allocated if this Service requires one. If this field is specified when creating a Service which does not need it, creation will fail. | -| managementService.ports[0].port | int | `8182` | The port the management service listens on. By default, the management interface is exposed on HTTP port 8182. | -| managementService.ports[0].protocol | string | `nil` | The IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default is TCP. | -| managementService.ports[0].targetPort | string | `nil` | Number or name of the port to access on the pods targeted by the service. If this is a string, it will be looked up as a named port in the target Pod's container ports. If this is not specified, the value of the 'port' field is used. | -| managementService.type | string | `"ClusterIP"` | The type of service to create. Valid values are: ExternalName, ClusterIP, NodePort, and LoadBalancer. The default value is ClusterIP. | -| metrics.enabled | bool | `true` | Specifies whether metrics for the polaris server should be enabled. | -| metrics.tags | object | `{}` | Additional tags (dimensional labels) to add to the metrics. | -| nodeSelector | object | `{}` | Node labels which must match for the polaris pod to be scheduled on that node. See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector. | -| oidc | object | `{"authServeUrl":null,"client":{"id":"polaris","secret":{"key":"clientSecret","name":null}},"principalMapper":{"idClaimPath":null,"nameClaimPath":null,"type":"default"},"principalRolesMapper":{"filter":null,"mappings":[],"rolesClaimPath":null,"type":"default"}}` | Polaris OIDC configuration. Only relevant when at least one realm is configured for external (or mixed) authentication. The currently supported configuration is for a single, default OIDC tenant. For more complex scenarios, including OIDC multi-tenancy, you will need to provide the relevant configuration using the `advancedConfig` section. | -| oidc.authServeUrl | string | `nil` | The authentication server URL. Must be provided if at least one realm is configured for external authentication. | -| oidc.client | object | `{"id":"polaris","secret":{"key":"clientSecret","name":null}}` | The client to use when authenticating with the authentication server. | -| oidc.client.id | string | `"polaris"` | The client ID to use when contacting the authentication server's introspection endpoint in order to validate tokens. | -| oidc.client.secret | object | `{"key":"clientSecret","name":null}` | The secret to pull the client secret from. If no client secret is required, leave the secret name unset. | -| oidc.client.secret.key | string | `"clientSecret"` | The key name inside the secret to pull the client secret from. 
|
-| oidc.client.secret.name | string | `nil` | The name of the secret to pull the client secret from. If not provided, the client is assumed to not require a client secret when contacting the introspection endpoint. |
-| oidc.principalMapper | object | `{"idClaimPath":null,"nameClaimPath":null,"type":"default"}` | Principal mapping configuration. |
-| oidc.principalMapper.idClaimPath | string | `nil` | The path to the claim that contains the principal ID. Nested paths can be expressed using "/" as a separator, e.g. "polaris/principal_id" would look for the "principal_id" field inside the "polaris" object in the token claims. Optional. Either this option or `nameClaimPath` (or both) must be provided. |
-| oidc.principalMapper.nameClaimPath | string | `nil` | The claim that contains the principal name. Nested paths can be expressed using "/" as a separator, e.g. "polaris/principal_name" would look for the "principal_name" field inside the "polaris" object in the token claims. Optional. Either this option or `idClaimPath` (or both) must be provided. |
-| oidc.principalMapper.type | string | `"default"` | The `PrincipalMapper` implementation to use. Only one built-in type is supported: default. |
-| oidc.principalRolesMapper | object | `{"filter":null,"mappings":[],"rolesClaimPath":null,"type":"default"}` | Principal roles mapping configuration. |
-| oidc.principalRolesMapper.filter | string | `nil` | A regular expression that matches the role names in the identity. Only roles that match this regex will be included in the Polaris-specific roles. |
-| oidc.principalRolesMapper.mappings | list | `[]` | A list of regex mappings that will be applied to each role name in the identity. This can be used to transform the role names in the identity into role names as expected by Polaris. The default Authenticator expects the security identity to expose role names in the format `PRINCIPAL_ROLE:<role name>`. |
-| oidc.principalRolesMapper.rolesClaimPath | string | `nil` | The path to the claim that contains the principal roles. Nested paths can be expressed using "/" as a separator, e.g. "polaris/principal_roles" would look for the "principal_roles" field inside the "polaris" object in the token claims. If not set, Quarkus looks for roles in standard locations. See https://quarkus.io/guides/security-oidc-bearer-token-authentication#token-claims-and-security-identity-roles. |
-| oidc.principalRolesMapper.type | string | `"default"` | The `PrincipalRolesMapper` implementation to use. Only one built-in type is supported: default. |
-| persistence | object | `{"relationalJdbc":{"secret":{"jdbcUrl":"jdbcUrl","name":null,"password":"password","username":"username"}},"type":"in-memory"}` | Polaris persistence configuration. |
-| persistence.relationalJdbc | object | `{"secret":{"jdbcUrl":"jdbcUrl","name":null,"password":"password","username":"username"}}` | The configuration for the relational-jdbc persistence manager. |
-| persistence.relationalJdbc.secret | object | `{"jdbcUrl":"jdbcUrl","name":null,"password":"password","username":"username"}` | The secret name to pull the database connection properties from.
| -| persistence.relationalJdbc.secret.jdbcUrl | string | `"jdbcUrl"` | The secret key holding the database JDBC connection URL | -| persistence.relationalJdbc.secret.name | string | `nil` | The secret name to pull database connection properties from | -| persistence.relationalJdbc.secret.password | string | `"password"` | The secret key holding the database password for authentication | -| persistence.relationalJdbc.secret.username | string | `"username"` | The secret key holding the database username for authentication | -| persistence.type | string | `"in-memory"` | The type of persistence to use. Two built-in types are supported: in-memory and relational-jdbc. The eclipse-link type is also supported but is deprecated. | -| podAnnotations | object | `{}` | Annotations to apply to polaris pods. | -| podLabels | object | `{}` | Additional Labels to apply to polaris pods. | -| podSecurityContext | object | `{"fsGroup":10001,"seccompProfile":{"type":"RuntimeDefault"}}` | Security context for the polaris pod. See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/. | -| podSecurityContext.fsGroup | int | `10001` | GID 10001 is compatible with Polaris OSS default images; change this if you are using a different image. | -| rateLimiter | object | `{"tokenBucket":{"requestsPerSecond":9999,"type":"default","window":"PT10S"},"type":"no-op"}` | Polaris rate limiter configuration. | -| rateLimiter.tokenBucket | object | `{"requestsPerSecond":9999,"type":"default","window":"PT10S"}` | The configuration for the default rate limiter, which uses the token bucket algorithm with one bucket per realm. | -| rateLimiter.tokenBucket.requestsPerSecond | int | `9999` | The maximum number of requests per second allowed for each realm. | -| rateLimiter.tokenBucket.type | string | `"default"` | The type of the token bucket rate limiter. Only the default type is supported out of the box. | -| rateLimiter.tokenBucket.window | string | `"PT10S"` | The time window. | -| rateLimiter.type | string | `"no-op"` | The type of rate limiter filter to use. Two built-in types are supported: default and no-op. | -| readinessProbe | object | `{"failureThreshold":3,"initialDelaySeconds":5,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":10}` | Configures the readiness probe for polaris pods. | -| readinessProbe.failureThreshold | int | `3` | Minimum consecutive failures for the probe to be considered failed after having succeeded. Minimum value is 1. | -| readinessProbe.initialDelaySeconds | int | `5` | Number of seconds after the container has started before readiness probes are initiated. Minimum value is 0. | -| readinessProbe.periodSeconds | int | `10` | How often (in seconds) to perform the probe. Minimum value is 1. | -| readinessProbe.successThreshold | int | `1` | Minimum consecutive successes for the probe to be considered successful after having failed. Minimum value is 1. | -| readinessProbe.timeoutSeconds | int | `10` | Number of seconds after which the probe times out. Minimum value is 1. | -| realmContext | object | `{"realms":["POLARIS"],"type":"default"}` | Realm context resolver configuration. | -| realmContext.realms | list | `["POLARIS"]` | List of valid realms, for use with the default realm context resolver. The first realm in the list is the default realm. Realms not in this list will be rejected. | -| realmContext.type | string | `"default"` | The type of realm context resolver to use. 
Two built-in types are supported: default and test; test is not recommended for production as it does not perform any realm validation. | -| replicaCount | int | `1` | The number of replicas to deploy (horizontal scaling). Beware that replicas are stateless; don't set this number > 1 when using in-memory meta store manager. | -| resources | object | `{}` | Configures the resources requests and limits for polaris pods. We usually recommend not to specify default resources and to leave this as a conscious choice for the user. This also increases chances charts run on environments with little resources, such as Minikube. If you do want to specify resources, uncomment the following lines, adjust them as necessary, and remove the curly braces after 'resources:'. | -| revisionHistoryLimit | string | `nil` | The number of old ReplicaSets to retain to allow rollback (if not set, the default Kubernetes value is set to 10). | -| service | object | `{"annotations":{},"clusterIP":null,"externalTrafficPolicy":null,"internalTrafficPolicy":null,"ports":[{"name":"polaris-http","nodePort":null,"port":8181,"protocol":null,"targetPort":null}],"sessionAffinity":null,"trafficDistribution":null,"type":"ClusterIP"}` | Polaris main service settings. | -| service.annotations | object | `{}` | Annotations to add to the service. | -| service.clusterIP | string | `nil` | You can specify your own cluster IP address If you define a Service that has the .spec.clusterIP set to "None" then Kubernetes does not assign an IP address. Instead, DNS records for the service will return the IP addresses of each pod targeted by the server. This is called a headless service. See https://kubernetes.io/docs/concepts/services-networking/service/#headless-services | -| service.externalTrafficPolicy | string | `nil` | Controls how traffic from external sources is routed. Valid values are Cluster and Local. The default value is Cluster. Set the field to Cluster to route traffic to all ready endpoints. Set the field to Local to only route to ready node-local endpoints. If the traffic policy is Local and there are no node-local endpoints, traffic is dropped by kube-proxy. | -| service.internalTrafficPolicy | string | `nil` | Controls how traffic from internal sources is routed. Valid values are Cluster and Local. The default value is Cluster. Set the field to Cluster to route traffic to all ready endpoints. Set the field to Local to only route to ready node-local endpoints. If the traffic policy is Local and there are no node-local endpoints, traffic is dropped by kube-proxy. | -| service.ports | list | `[{"name":"polaris-http","nodePort":null,"port":8181,"protocol":null,"targetPort":null}]` | The ports the service will listen on. At least one port is required; the first port implicitly becomes the HTTP port that the application will use for serving API requests. By default, it's 8181. Note: port names must be unique and no more than 15 characters long. | -| service.ports[0] | object | `{"name":"polaris-http","nodePort":null,"port":8181,"protocol":null,"targetPort":null}` | The name of the port. Required. | -| service.ports[0].nodePort | string | `nil` | The port on each node on which this service is exposed when type is NodePort or LoadBalancer. Usually assigned by the system. If not specified, a port will be allocated if this Service requires one. If this field is specified when creating a Service which does not need it, creation will fail. | -| service.ports[0].port | int | `8181` | The port the service listens on. 
By default, the HTTP port is 8181. | -| service.ports[0].protocol | string | `nil` | The IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default is TCP. | -| service.ports[0].targetPort | string | `nil` | Number or name of the port to access on the pods targeted by the service. If this is a string, it will be looked up as a named port in the target Pod's container ports. If this is not specified, the value of the 'port' field is used. | -| service.sessionAffinity | string | `nil` | The session affinity for the service. Valid values are: None, ClientIP. The default value is None. ClientIP enables sticky sessions based on the client's IP address. This is generally beneficial to Polaris deployments, but some testing may be required in order to make sure that the load is distributed evenly among the pods. Also, this setting affects only internal clients, not external ones. If Ingress is enabled, it is recommended to set sessionAffinity to None. | -| service.trafficDistribution | string | `nil` | The traffic distribution field provides another way to influence traffic routing within a Kubernetes Service. While traffic policies focus on strict semantic guarantees, traffic distribution allows you to express preferences such as routing to topologically closer endpoints. The only valid value is: PreferClose. The default value is implementation-specific. | -| service.type | string | `"ClusterIP"` | The type of service to create. Valid values are: ExternalName, ClusterIP, NodePort, and LoadBalancer. The default value is ClusterIP. | -| serviceAccount.annotations | object | `{}` | Annotations to add to the service account. | -| serviceAccount.create | bool | `true` | Specifies whether a service account should be created. | -| serviceAccount.name | string | `""` | The name of the service account to use. If not set and create is true, a name is generated using the fullname template. | -| serviceMonitor.enabled | bool | `true` | Specifies whether a ServiceMonitor for Prometheus operator should be created. | -| serviceMonitor.interval | string | `""` | The scrape interval; leave empty to let Prometheus decide. Must be a valid duration, e.g. 1d, 1h30m, 5m, 10s. | -| serviceMonitor.labels | object | `{}` | Labels for the created ServiceMonitor so that Prometheus operator can properly pick it up. | -| serviceMonitor.metricRelabelings | list | `[]` | Relabeling rules to apply to metrics. Ref https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config. | -| storage | object | `{"secret":{"awsAccessKeyId":null,"awsSecretAccessKey":null,"gcpToken":null,"gcpTokenLifespan":null,"name":null}}` | Storage credentials for the server. If the following properties are unset, default credentials will be used, in which case the pod must have the necessary permissions to access the storage. | -| storage.secret | object | `{"awsAccessKeyId":null,"awsSecretAccessKey":null,"gcpToken":null,"gcpTokenLifespan":null,"name":null}` | The secret to pull storage credentials from. | -| storage.secret.awsAccessKeyId | string | `nil` | The key in the secret to pull the AWS access key ID from. Only required when using AWS. | -| storage.secret.awsSecretAccessKey | string | `nil` | The key in the secret to pull the AWS secret access key from. Only required when using AWS. | -| storage.secret.gcpToken | string | `nil` | The key in the secret to pull the GCP token from. Only required when using GCP. 
| -| storage.secret.gcpTokenLifespan | string | `nil` | The key in the secret to pull the GCP token expiration time from. Only required when using GCP. Must be a valid ISO 8601 duration. The default is PT1H (1 hour). | -| storage.secret.name | string | `nil` | The name of the secret to pull storage credentials from. | -| tasks | object | `{"maxConcurrentTasks":null,"maxQueuedTasks":null}` | Polaris asynchronous task executor configuration. | -| tasks.maxConcurrentTasks | string | `nil` | The maximum number of concurrent tasks that can be executed at the same time. The default is the number of available cores. | -| tasks.maxQueuedTasks | string | `nil` | The maximum number of tasks that can be queued up for execution. The default is Integer.MAX_VALUE. | -| tolerations | list | `[]` | A list of tolerations to apply to polaris pods. See https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/. | -| tracing.attributes | object | `{}` | Resource attributes to identify the polaris service among other tracing sources. See https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/#service. If left empty, traces will be attached to a service named "Apache Polaris"; to change this, provide a service.name attribute here. | -| tracing.enabled | bool | `false` | Specifies whether tracing for the polaris server should be enabled. | -| tracing.endpoint | string | `"http://otlp-collector:4317"` | The collector endpoint URL to connect to (required). The endpoint URL must have either the http:// or the https:// scheme. The collector must talk the OpenTelemetry protocol (OTLP) and the port must be its gRPC port (by default 4317). See https://quarkus.io/guides/opentelemetry for more information. | -| tracing.sample | string | `"1.0d"` | Which requests should be sampled. Valid values are: "all", "none", or a ratio between 0.0 and "1.0d" (inclusive). E.g. "0.5d" means that 50% of the requests will be sampled. Note: avoid entering numbers here, always prefer a string representation of the ratio. | diff --git a/1.2.0/managing-security/_index.md b/1.2.0/managing-security/_index.md deleted file mode 100644 index 3a10c8900b..0000000000 --- a/1.2.0/managing-security/_index.md +++ /dev/null @@ -1,28 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. 
-#
-title: Managing Security
-linkTitle: Managing Security
-type: docs
-weight: 550
----
-
-## [Access Control]({{< relref "access-control" >}})
-
-## [Authentication and Identity Providers]({{< relref "external-idp" >}})
\ No newline at end of file
diff --git a/1.2.0/managing-security/access-control.md b/1.2.0/managing-security/access-control.md
deleted file mode 100644
index b8a1b697ca..0000000000
--- a/1.2.0/managing-security/access-control.md
+++ /dev/null
@@ -1,201 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Role-Based Access Control
-linkTitle: Access Control
-type: docs
-weight: 200
----
-
-This section provides information about how access control works for Apache Polaris (Incubating).
-
-Polaris uses a role-based access control (RBAC) model in which the Polaris administrator assigns access privileges to catalog roles
-and then grants access to resources to principals by assigning catalog roles to principal roles.
-
-These are the key concepts for understanding access control in Polaris:
-
-- **Securable object**
-- **Principal role**
-- **Catalog role**
-- **Privilege**
-
-## Securable object
-
-A securable object is an object to which access can be granted. Polaris
-has the following securable objects:
-
-- Catalog
-- Namespace
-- Iceberg table
-- View
-- Policy
-
-## Principal role
-
-A principal role is a resource in Polaris that you can use to logically group Polaris principals together and grant privileges on
-securable objects.
-
-Polaris supports a many-to-many relationship between principals and principal roles. For example, to grant the same privileges to
-multiple principals, you can assign a single principal role to those principals. Likewise, a principal can be granted
-multiple principal roles.
-
-You don't grant privileges directly to a principal role. Instead, you configure object permissions at the catalog role level, and then grant
-catalog roles to a principal role.
-
-The following table shows examples of principal roles that you might configure in Polaris:
-
-| Principal role name | Description |
-| -----------------------| ----------- |
-| Data_engineer | A role that is granted to multiple principals for running data engineering jobs. |
-| Data_scientist | A role that is granted to multiple principals for running data science or AI jobs. |
-
-## Catalog role
-
-A catalog role belongs to a particular catalog resource in Polaris and specifies a set of permissions for actions on the catalog or objects
-in the catalog, such as catalog namespaces or tables. You can create one or more catalog roles for a catalog.
-
-You grant privileges to a catalog role and then grant the catalog role to a principal role to bestow the privileges to one or more principals.
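-
-This chain can also be scripted against the Polaris Management API. The following is a minimal sketch, assuming a local deployment on `localhost:8181`, an admin bearer token in `$TOKEN`, and hypothetical names (`my_catalog`, `my_catalog_role`, `my_principal_role`):
-
-```shell
-# Create a catalog role inside the catalog "my_catalog".
-curl -X POST http://localhost:8181/api/management/v1/catalogs/my_catalog/catalog-roles \
-  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-  -d '{"catalogRole": {"name": "my_catalog_role"}}'
-
-# Grant a privilege on the catalog to the new catalog role.
-curl -X PUT http://localhost:8181/api/management/v1/catalogs/my_catalog/catalog-roles/my_catalog_role/grants \
-  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-  -d '{"grant": {"type": "catalog", "privilege": "CATALOG_MANAGE_CONTENT"}}'
-
-# Assign the catalog role to a principal role, bestowing its privileges
-# on every principal that holds "my_principal_role".
-curl -X PUT http://localhost:8181/api/management/v1/principal-roles/my_principal_role/catalog-roles/my_catalog \
-  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-  -d '{"catalogRole": {"name": "my_catalog_role"}}'
-```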
- -Polaris also supports a many-to-many relationship between catalog roles and principal roles. You can grant the same catalog role to one or more -principal roles. Likewise, a principal role can be granted to one or more catalog roles. - -The following table displays examples of catalog roles that you might -configure in Polaris: - -| Example Catalog role | Description| -| -----------------------|-----------| -| Catalog administrators | A role that has been granted multiple privileges to emulate full access to the catalog.
Principal roles that have been granted this role are permitted to create, alter, read, write, and drop tables in the catalog. | -| Catalog readers | A role that has been granted read-only privileges to tables in the catalog.
Principal roles that have been granted this role are allowed to read from tables in the catalog. | -| Catalog contributor | A role that has been granted read and write access privileges to all tables that belong to the catalog.
Principal roles that have been granted this role are allowed to perform read and write operations on tables in the catalog. | - -## RBAC model - -The following diagram illustrates the RBAC model used by Polaris. For each catalog, the Polaris administrator assigns access -privileges to catalog roles and then grants principals access to resources by assigning catalog roles to principal roles. Polaris -supports a many-to-many relationship between principals and principal roles. - -![Diagram that shows the RBAC model for Apache Polaris.](/img/rbac-model.svg "Apache Polaris RBAC model") - -## Access control privileges - -This section describes the privileges that are available in the Polaris access control model. Privileges are granted to catalog roles, catalog -roles are granted to principal roles, and principal roles are granted to principals to specify the operations that principals can -perform on objects in Polaris. - -To grant the full set of privileges (drop, list, read, write, etc.) on an object, you can use the *full privilege* option. - -### Table privileges - -| Privilege | Description | -| --------- | ----------- | -| TABLE_CREATE | Enables registering a table with the catalog. | -| TABLE_DROP | Enables dropping a table from the catalog. | -| TABLE_LIST | Enables listing any table in the catalog. | -| TABLE_READ_PROPERTIES | Enables reading properties of the table. | -| TABLE_WRITE_PROPERTIES | Enables configuring properties for the table. | -| TABLE_READ_DATA | Enables reading data from the table by receiving short-lived read-only storage credentials from the catalog. | -| TABLE_WRITE_DATA | Enables writing data to the table by receiving short-lived read+write storage credentials from the catalog. | -| TABLE_FULL_METADATA | Grants all table privileges, except TABLE_READ_DATA and TABLE_WRITE_DATA, which need to be granted individually. | -| TABLE_ATTACH_POLICY | Enables attaching policy to a table. | -| TABLE_DETACH_POLICY | Enables detaching policy from a table. | - -### View privileges - -| Privilege | Description | -| --------- | ----------- | -| VIEW_CREATE | Enables registering a view with the catalog. | -| VIEW_DROP | Enables dropping a view from the catalog. | -| VIEW_LIST | Enables listing any views in the catalog. | -| VIEW_READ_PROPERTIES | Enables reading all the view properties. | -| VIEW_WRITE_PROPERTIES | Enables configuring view properties. | -| VIEW_FULL_METADATA | Grants all view privileges. | - -### Namespace privileges - -| Privilege | Description | -| --------- | ----------- | -| NAMESPACE_CREATE | Enables creating a namespace in a catalog. | -| NAMESPACE_DROP | Enables dropping the namespace from the catalog. | -| NAMESPACE_LIST | Enables listing any object in the namespace, including nested namespaces and tables. | -| NAMESPACE_READ_PROPERTIES | Enables reading all the namespace properties. | -| NAMESPACE_WRITE_PROPERTIES | Enables configuring namespace properties. | -| NAMESPACE_FULL_METADATA | Grants all namespace privileges. | -| NAMESPACE_ATTACH_POLICY | Enables attaching policy to a namespace. | -| NAMESPACE_DETACH_POLICY | Enables detaching policy from a namespace. | - -### Catalog privileges - -| Privilege | Description | -| -----------------------| ----------- | -| CATALOG_MANAGE_ACCESS | Includes the ability to grant or revoke privileges on objects in a catalog to catalog roles, and the ability to grant or revoke catalog roles to or from principal roles. | -| CATALOG_MANAGE_CONTENT | Enables full management of content for the catalog. 
This privilege encompasses the following privileges:
  • CATALOG_MANAGE_METADATA
  • TABLE_FULL_METADATA
  • NAMESPACE_FULL_METADATA
  • VIEW_FULL_METADATA
  • TABLE_WRITE_DATA
  • TABLE_READ_DATA
  • CATALOG_READ_PROPERTIES
  • CATALOG_WRITE_PROPERTIES
|
-| CATALOG_MANAGE_METADATA | Enables full management of the catalog, catalog roles, namespaces, and tables. |
-| CATALOG_READ_PROPERTIES | Enables listing catalogs and reading properties of the catalog. |
-| CATALOG_WRITE_PROPERTIES | Enables configuring catalog properties. |
-| CATALOG_ATTACH_POLICY | Enables attaching policy to a catalog. |
-| CATALOG_DETACH_POLICY | Enables detaching policy from a catalog. |
-
-### Policy privileges
-
-| Privilege | Description |
-| -----------------------| ----------- |
-| POLICY_CREATE | Enables creating a policy under a specified namespace. |
-| POLICY_READ | Enables reading policy content and metadata. |
-| POLICY_WRITE | Enables updating the policy details such as its content or description. |
-| POLICY_LIST | Enables listing any policy from the catalog. |
-| POLICY_DROP | Enables dropping a policy if it is not attached to any resource entity. |
-| POLICY_FULL_METADATA | Grants all policy privileges. |
-| POLICY_ATTACH | Enables policy to be attached to entities. |
-| POLICY_DETACH | Enables policy to be detached from entities. |
-
-## RBAC example
-
-The following diagram illustrates how RBAC works in Polaris and
-includes the following users:
-
-- **Alice:** A service admin who signs up for Polaris. Alice can
-  create principals. She can also create catalogs and
-  namespaces and configure access control for Polaris resources.
-
-- **Bob:** A data engineer who uses Apache Spark™ to
-  interact with Polaris.
-
-  - Alice has created a principal for Bob. It has been
-    granted the Data_engineer principal role, which in turn has been
-    granted the following catalog roles: Catalog contributor and
-    Data administrator (for both the Silver and Gold zone catalogs
-    in the following diagram).
-
-  - The Catalog contributor role grants permission to create
-    namespaces and tables in the Bronze zone catalog.
-
-  - The Data administrator roles grant full administrative rights to
-    the Silver zone catalog and Gold zone catalog.
-
-- **Mark:** A data scientist who trains models with data managed
-  by Polaris.
-
-  - Alice has created a principal for Mark. It has been
-    granted the Data_scientist principal role, which in turn has
-    been granted the catalog role named Catalog reader.
-
-  - The Catalog reader role grants read-only access for a catalog
-    named Gold zone catalog.
-
-![Diagram that shows an example of how RBAC works in Apache Polaris.](/img/rbac-example.svg "Apache Polaris RBAC example")
diff --git a/1.2.0/managing-security/external-idp/_index.md b/1.2.0/managing-security/external-idp/_index.md
deleted file mode 100644
index 4ccecbadfd..0000000000
--- a/1.2.0/managing-security/external-idp/_index.md
+++ /dev/null
@@ -1,255 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Identity Providers
-linkTitle: Identity Providers
-type: docs
-weight: 300
----
-
-Apache Polaris supports authentication via external identity providers (IdPs) using OpenID Connect (OIDC) in addition to the internal authentication system. This feature enables flexible identity federation with enterprise IdPs and allows gradual migration or hybrid authentication strategies across realms in Polaris.
-
-## Authentication Types
-
-Polaris supports three authentication modes:
-
-1. `internal` (Default)
-   - Only Polaris internal authentication is used.
-2. `external`
-   - Authenticates using external OIDC providers (via Quarkus OIDC).
-   - Disables the internal token endpoint (returns HTTP 501).
-3. `mixed`
-   - Tries internal authentication first; if this fails, it falls back to OIDC.
-
-Authentication can be configured globally or per realm by setting the following properties:
-
-```properties
-# Global default
-polaris.authentication.type=internal
-# Per-realm override
-polaris.authentication.realm1.type=external
-polaris.authentication.realm2.type=mixed
-```
-
-## Key Components
-
-### Authenticator
-
-The `Authenticator` is a component responsible for resolving the principal and the principal roles, and for creating a `PolarisPrincipal` from the credentials provided by the authentication process. It is a central component and is invoked for all types of authentication.
-
-The `type` property is used to define the `Authenticator` implementation. It is overridable per realm:
-
-```properties
-polaris.authentication.authenticator.type=default
-polaris.authentication.realm1.authenticator.type=custom
-```
-
-## Internal Authentication Configuration
-
-### Token Broker
-
-The `TokenBroker` signs and verifies tokens to ensure that they can be validated and remain unaltered.
-
-```properties
-polaris.authentication.token-broker.type=rsa-key-pair
-polaris.authentication.token-broker.max-token-generation=PT1H
-```
-
-Two types are available:
-
-- `rsa-key-pair` (recommended for production): Uses an RSA key pair for token signing and validation.
-- `symmetric-key`: Uses a shared secret for both operations; suitable for single-node deployments or testing.
-
-The property `polaris.authentication.token-broker.max-token-generation` specifies the maximum validity duration of tokens issued by the internal `TokenBroker`.
-
-- Format: ISO-8601 duration (e.g., `PT1H` for 1 hour, `PT30M` for 30 minutes).
-- Default: `PT1H`.
-
-### Token Service
-
-The Token Service, together with `TokenServiceConfiguration` (Quarkus), is responsible for issuing and validating tokens (e.g., bearer tokens) for authenticated principals when internal authentication is used. It works in coordination with the `Authenticator` and `TokenBroker`. The default implementation is `default`, and this must be configured when using internal authentication.
-
-```properties
-polaris.authentication.token-service.type=default
-```
-
-### Role Mapping
-
-When using internal authentication, token requests should include a `scope` parameter that specifies the roles to be activated for the principal. The `scope` parameter is a space-separated list of role names.
-
-The default `ActiveRolesProvider` expects role names to be in the following format: `PRINCIPAL_ROLE:<role name>`.
- -For example, if the principal has the roles `service_admin` and `catalog_admin` and wants both activated, the `scope` parameter should look like this: - -```properties -scope=PRINCIPAL_ROLE:service_admin PRINCIPAL_ROLE:catalog_admin -``` - -Here is an example of a full request to the Polaris token endpoint using internal authentication: - -```http request -POST /api/catalog/v1/oauth/tokens HTTP/1.1 -Host: polaris.example.com:8181 -Content-Type: application/x-www-form-urlencoded - -grant_type=client_credentials&client_id=root&client_secret=s3cr3t&scope=PRINCIPAL_ROLE%3Aservice_admin%20PRINCIPAL_ROLE%3Acatalog_admin -``` - -## External Authentication Configuration - -External authentication is configured via Quarkus OIDC and Polaris-specific OIDC extensions. The following settings are used to integrate with an identity provider and extract identity and role information from tokens. - -### OIDC Tenant Configuration - -At least one OIDC tenant must be explicitly enabled. In Polaris, realms and OIDC tenants are distinct concepts. An OIDC tenant represents a specific identity provider configuration (e.g., `quarkus.oidc.idp1`). A [realm]({{% ref "../../realm" %}}) is a logical partition within Polaris. - -- Multiple realms can share a single OIDC tenant. -- Each realm can be associated with only one OIDC tenant. - -Therefore, multi-realm deployments can share a common identity provider while still enforcing realm-level scoping. To configure the default tenant: - -```properties -quarkus.oidc.tenant-enabled=true -quarkus.oidc.auth-server-url=https://auth.example.com/realms/polaris -quarkus.oidc.client-id=polaris -``` - -Alternatively, it is possible to use multiple named tenants. Each OIDC-named tenant is then configured with standard Quarkus settings: - -```properties -quarkus.oidc.oidc-tenant1.auth-server-url=http://localhost:8080/realms/polaris -quarkus.oidc.oidc-tenant1.client-id=client1 -quarkus.oidc.oidc-tenant1.application-type=service -``` - -When using multiple OIDC tenants, it's your responsibility to configure tenant resolution appropriately. See the [Quarkus OpenID Connect Multitenancy Guide](https://quarkus.io/guides/security-openid-connect-multitenancy#tenant-resolution). - -### Principal Mapping - -While OIDC tenant resolution is entirely delegated to Quarkus, Polaris requires additional configuration to extract the Polaris principal and its roles from the credentials generated and validated by Quarkus. This part of the authentication process is configured with Polaris-specific properties that map JWT claims to Polaris principal fields: - -```properties -polaris.oidc.principal-mapper.type=default -polaris.oidc.principal-mapper.id-claim-path=polaris/principal_id -polaris.oidc.principal-mapper.name-claim-path=polaris/principal_name -``` - -These properties are overridable per OIDC tenant: - -```properties -polaris.oidc.oidc-tenant1.principal-mapper.id-claim-path=polaris/principal_id -polaris.oidc.oidc-tenant1.principal-mapper.name-claim-path=polaris/principal_name -``` - -{{< alert important >}} -The default implementation of PrincipalMapper can only work with JWT tokens. If your IDP issues opaque tokens instead, you will need to provide a custom implementation. -{{< /alert >}} - -### Role Mapping - -Similarly, Polaris requires additional configuration to map roles provided by Quarkus to roles defined in Polaris. 
The process happens in two phases: first, Quarkus maps the JWT claims to security roles, using the `quarkus.oidc.roles.*` properties; then, Polaris-specific properties are used to map the Quarkus-provided security roles to Polaris roles:
-
-```properties
-quarkus.oidc.roles.role-claim-path=polaris/roles
-polaris.oidc.principal-roles-mapper.type=default
-polaris.oidc.principal-roles-mapper.filter=^(?!profile$|email$).*
-polaris.oidc.principal-roles-mapper.mappings[0].regex=^.*$
-polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$0
-```
-
-These mappings can be overridden per OIDC tenant and used across different realms that rely on external identity providers. For example:
-
-```properties
-polaris.oidc.oidc-tenant1.principal-roles-mapper.type=custom
-polaris.oidc.oidc-tenant1.principal-roles-mapper.filter=PRINCIPAL_ROLE:.*
-polaris.oidc.oidc-tenant1.principal-roles-mapper.mappings[0].regex=PRINCIPAL_ROLE:(.*)
-polaris.oidc.oidc-tenant1.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$1
-```
-
-The default `Authenticator` expects the security identity to expose role names in the following format: `PRINCIPAL_ROLE:<role name>`. You can use the `filter` and `mappings` properties to adjust the role names as they appear in the JWT claims.
-
-For example, assume that the security identity produced by Quarkus exposes the following roles: `role_service_admin` and `role_catalog_admin`. Polaris expects `PRINCIPAL_ROLE:service_admin` and `PRINCIPAL_ROLE:catalog_admin` respectively. The following configuration can be used to achieve the desired mapping:
-
-```properties
-# Exclude role names that don't start with "role_"
-polaris.oidc.principal-roles-mapper.filter=role_.*
-# Extract the text after "role_"
-polaris.oidc.principal-roles-mapper.mappings[0].regex=role_(.*)
-# Prefix the extracted text with "PRINCIPAL_ROLE:"
-polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$1
-```
-
-See more examples below.
-
-### Example JWT Mappings
-
-#### Example 1: Custom Claim Paths
-
-- JWT
-
-  ```json
-  {
-    "polaris":
-    {
-      "roles": ["PRINCIPAL_ROLE:ALL"],
-      "principal_name": "root",
-      "principal_id": 1
-    }
-  }
-  ```
-
-- Configuration
-
-  ```properties
-  quarkus.oidc.roles.role-claim-path=polaris/roles
-  polaris.oidc.principal-mapper.id-claim-path=polaris/principal_id
-  polaris.oidc.principal-mapper.name-claim-path=polaris/principal_name
-  ```
-
-#### Example 2: Generic OIDC Claims
-
-- JWT
-
-  ```json
-  {
-    "sub": "1",
-    "scope": "service_admin catalog_admin profile email",
-    "preferred_username": "root"
-  }
-  ```
-
-- Configuration
-
-  ```properties
-  quarkus.oidc.roles.role-claim-path=scope
-  polaris.oidc.principal-mapper.id-claim-path=sub
-  polaris.oidc.principal-mapper.name-claim-path=preferred_username
-  polaris.oidc.principal-roles-mapper.filter=^(?!profile$|email$).*
-  polaris.oidc.principal-roles-mapper.mappings[0].regex=^.*$
-  polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$0
-  ```
-
-- Result
-
-  Polaris roles: `PRINCIPAL_ROLE:service_admin` and `PRINCIPAL_ROLE:catalog_admin`
-
-### Additional Links
-
-* For a complete Keycloak integration example, see: [Keycloak External IDP Configuration Guide]({{< relref "../../getting-started/using-polaris/keycloak-idp.md" >}})
-* See [Developer Notes]({{< relref "idp-dev-notes.md" >}}) with internal implementation details for developers who want to understand or extend Polaris authentication.
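-
-As a quick end-to-end check of an external-IdP setup, you can request a token directly from the IdP and present it to Polaris. The sketch below is illustrative only: it assumes a Keycloak-style token endpoint, placeholder client credentials, and a local Polaris deployment, and it uses `jq` to extract the token; adjust the URLs and values for your environment:
-
-```shell
-# Obtain a token from the IdP (Keycloak-style endpoint shown; adjust for your provider).
-TOKEN=$(curl -s -X POST "https://auth.example.com/realms/polaris/protocol/openid-connect/token" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=polaris" \
-  -d "client_secret=s3cr3t" | jq -r .access_token)
-
-# If tenant resolution, principal mapping, and role mapping are all configured
-# correctly, this call succeeds instead of returning 401/403.
-curl -H "Authorization: Bearer $TOKEN" \
-  "http://localhost:8181/api/catalog/v1/config?warehouse=my_catalog"
-```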
\ No newline at end of file
diff --git a/1.2.0/managing-security/external-idp/idp-dev-notes.md b/1.2.0/managing-security/external-idp/idp-dev-notes.md
deleted file mode 100644
index 16bc759b8d..0000000000
--- a/1.2.0/managing-security/external-idp/idp-dev-notes.md
+++ /dev/null
@@ -1,122 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Authentication Development Details
-linkTitle: Development Details
-type: docs
-weight: 301
----
-
-## Developer Architecture Notes
-
-### Authentication Architecture
-
-Polaris separates authentication into two logical phases using [Quarkus Security](https://quarkus.io/guides/security-overview):
-
-1. Credential extraction – parsing headers and tokens
-2. Credential authentication – validating identity and assigning roles
-
-### Key Interfaces
-
-- [`Authenticator`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/Authenticator.java): A core interface used to authenticate credentials and resolve principal and principal roles. Roles may be derived from OIDC claims or internal mappings.
-- [`InternalPolarisToken`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/InternalPolarisToken.java): Used in internal auth and inherits from `PrincipalCredential`.
-
-- The [`DefaultAuthenticator`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/DefaultAuthenticator.java) is used to implement realm-specific logic based on these abstractions.
-
-### Token Broker Configuration
-
-When internal authentication is enabled, Polaris uses [`TokenBroker`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/TokenBroker.java) to handle the decoding and validation of authentication tokens. These brokers are request-scoped and can be configured per realm. Each realm may use its own strategy, such as RSA key pairs or shared secrets, depending on security requirements.
-See [Token Broker description]({{< relref "../external-idp#token-broker" >}}) for configuration details.
-
-## Developer Authentication Workflows
-
-### Internal Authentication
-
-1. [`InternalAuthenticationMechanism`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/internal/InternalAuthenticationMechanism.java) parses the auth header.
-2. Uses [`TokenBroker`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/TokenBroker.java) to decode the token.
-3. 
Builds [`InternalAuthenticationRequest`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/internal/InternalAuthenticationRequest.java) and generates `SecurityIdentity` (Quarkus). -4. `Authenticator.authenticate()` validates the credential, resolves the principal and principal roles, then creates the `PolarisPrincipal`. - -### External Authentication - -1. `OidcAuthenticationMechanism` (Quarkus) processes the auth header. -2. [`OidcTenantResolvingAugmentor`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/external/tenant/OidcTenantResolvingAugmentor.java) selects the OIDC tenant. -3. [`OidcPolarisCredentialAugmentor`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/external/OidcPolarisCredentialAugmentor.java) extracts JWT claims. -4. `Authenticator.authenticate()` validates the claims, resolves the principal and principal roles, then creates the `PolarisPrincipal`. - -### Mixed Authentication - -1. [`InternalAuthenticationMechanism`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/internal/InternalAuthenticationMechanism.java) tries decoding. -2. If successful, proceed with internal authentication. -3. Otherwise, fall back to external (OIDC) authentication. - -## OIDC Configuration Reference - -### Principal Mapping - -- Interface: [`PrincipalMapper`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/external/mapping/PrincipalMapper.java) - - The `PrincipalMapper` is responsible for extracting the Polaris principal ID and display name from OIDC tokens. - -- Implementation selector: - - This property selects the implementation of the `PrincipalMapper` interface. The default implementation extracts fields from specific claim paths. - - ```properties - polaris.oidc.principal-mapper.type=default - ``` - -- Configuration properties for the default implementation: - - ```properties - polaris.oidc.principal-mapper.id-claim-path=polaris/principal_id - polaris.oidc.principal-mapper.name-claim-path=polaris/principal_name - ``` - -- It can be overridden per OIDC tenant. - -### Roles Mapping - -- Interface: [`PrincipalRolesMapper`](https://github.com/apache/polaris/blob/main/runtime/service/src/main/java/org/apache/polaris/service/auth/external/mapping/PrincipalRolesMapper.java) - - Polaris uses this component to transform role claims from OIDC tokens into Polaris roles. - -- Quarkus OIDC configuration: - - This setting instructs Quarkus on where to locate roles within the OIDC token. - - ```properties - quarkus.oidc.roles.role-claim-path=polaris/roles - ``` - -- Implementation selector: - - This property selects the implementation of `PrincipalRolesMapper`. The `default` implementation applies regular expression (regex) transformations to OIDC roles. 
-
-  ```properties
-  polaris.oidc.principal-roles-mapper.type=default
-  ```
-
-- Configuration properties for the default implementation:
-
-  ```properties
-  polaris.oidc.principal-roles-mapper.filter=^(?!profile$|email$).*
-  polaris.oidc.principal-roles-mapper.mappings[0].regex=^.*$
-  polaris.oidc.principal-roles-mapper.mappings[0].replacement=PRINCIPAL_ROLE:$0
-  ```
diff --git a/1.2.0/metastores.md b/1.2.0/metastores.md
deleted file mode 100644
index c22bbdd907..0000000000
--- a/1.2.0/metastores.md
+++ /dev/null
@@ -1,188 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Metastores
-type: docs
-weight: 700
----
-
-This page explains how to configure and use Polaris metastores with either the recommended Relational JDBC or the
-deprecated EclipseLink persistence backends.
-
-## Relational JDBC
-This implementation leverages Quarkus for datasource management and supports configuration through
-environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file).
-
-There are two options for configuring the persistence backend:
-
-### 1. Relational JDBC metastore with username and password
-
-Using environment variables:
-```
-POLARIS_PERSISTENCE_TYPE=relational-jdbc
-
-QUARKUS_DATASOURCE_USERNAME=<username>
-QUARKUS_DATASOURCE_PASSWORD=<password>
-QUARKUS_DATASOURCE_JDBC_URL=<jdbc-url>
-```
-Using a properties file:
-
-```
-polaris.persistence.type=relational-jdbc
-quarkus.datasource.username=<username>
-quarkus.datasource.password=<password>
-quarkus.datasource.jdbc.url=<jdbc-url>
-```
-
-### 2. AWS Aurora PostgreSQL metastore using AWS IAM authentication
-
-```
-polaris.persistence.type=relational-jdbc
-quarkus.datasource.jdbc.url=jdbc:postgresql://polaris-cluster.cluster-xyz.us-east-1.rds.amazonaws.com:6160/polaris
-quarkus.datasource.jdbc.additional-jdbc-properties.wrapperPlugins=iam
-quarkus.datasource.username=dbusername
-quarkus.datasource.db-kind=postgresql
-quarkus.datasource.jdbc.additional-jdbc-properties.ssl=true
-quarkus.datasource.jdbc.additional-jdbc-properties.sslmode=require
-quarkus.datasource.credentials-provider=aws
-
-quarkus.rds.credentials-provider.aws.use-quarkus-client=true
-quarkus.rds.credentials-provider.aws.username=dbusername
-quarkus.rds.credentials-provider.aws.hostname=polaris-cluster.cluster-xyz.us-east-1.rds.amazonaws.com
-quarkus.rds.credentials-provider.aws.port=6160
-```
-This is the basic configuration. For more details, please refer to the [Quarkus plugin documentation](https://docs.quarkiverse.io/quarkus-amazon-services/dev/amazon-rds.html#_configuration_reference)
-
-The Relational JDBC metastore currently relies on a Quarkus-managed datasource and supports only PostgreSQL and H2 databases.
This limitation is similar to that of EclipseLink, primarily due to underlying schema differences. At this time, official documentation is provided exclusively for usage with PostgreSQL.
-Please refer to the documentation here:
-[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)
-
-Additionally, retries can be configured via the `polaris.persistence.relational.jdbc.*` properties; please refer to the [configuration]({{% ref "configuration" %}}) page.
-
-## EclipseLink (Deprecated)
-{{< alert important >}}
-EclipseLink persistence will be completely removed from Polaris in 1.3.0 or in 2.0.0 (whichever happens earlier).
-{{< /alert >}}
-
-Polaris includes the EclipseLink plugin by default, with the PostgreSQL driver.
-
-Configure the `polaris.persistence` section in your Polaris configuration file
-(`application.properties`) as follows:
-
-```
-polaris.persistence.type=eclipse-link
-polaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml
-polaris.persistence.eclipselink.persistence-unit=polaris
-```
-
-Alternatively, configuration can also be done with environment variables or system properties. Refer
-to the [Quarkus Configuration Reference] for more information.
-
-The `configuration-file` option must point to an [EclipseLink configuration file]. This file, named
-`persistence.xml`, is used to set up the database connection properties, which can differ depending
-on the type of database and its configuration.
-
-{{< alert note >}}
-You have to locate the `persistence.xml` at least two directories below the root folder, e.g. `/deployments/config/persistence.xml` is OK, whereas `/deployments/persistence.xml` will cause an infinite loop.
-{{< /alert >}}
-[Quarkus Configuration Reference]: https://quarkus.io/guides/config-reference
-[EclipseLink configuration file]: https://eclipse.dev/eclipselink/documentation/4.0/solutions/solutions.html#TESTINGJPA002
-
-Polaris creates and connects to a separate database for each realm. Specifically, the `{realm}` placeholder in `jakarta.persistence.jdbc.url` is substituted with the actual realm name, allowing the Polaris server to connect to different databases based on the realm.
-
-{{< alert note >}}
-Some database systems such as Postgres don't create databases automatically. Database admins need to create them manually before running the Polaris server.
-{{< /alert >}}
-
-A single `persistence.xml` can describe multiple [persistence units](https://eclipse.dev/eclipselink/documentation/4.0/concepts/concepts.html#APPDEV001). For example, with both a `polaris-dev` and `polaris` persistence unit defined, you could use a single `persistence.xml` to easily switch between development and production databases. Use the `persistence-unit` option in the Polaris server configuration to easily switch between persistence units.
-
-### Using H2
-
-{{< alert important >}}
-H2 is an in-memory database and is not suitable for production!
-{{< /alert >}}
-
-The default [persistence.xml] in Polaris is already configured for H2, but you can easily customize
-your H2 configuration using the persistence unit template below:
-
-[persistence.xml]: https://github.com/apache/polaris/blob/main/persistence/eclipselink/src/main/resources/META-INF/persistence.xml
-
-```xml
-<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
-  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
-  <shared-cache-mode>NONE</shared-cache-mode>
-  <properties>
-    <!-- H2 JDBC connection properties (jakarta.persistence.jdbc.url, user, password, ...) go here -->
-  </properties>
-</persistence-unit>
-```
-
-To build Polaris with the necessary H2 dependency and start the Polaris service, run the following:
-
-```shell
-./gradlew \
-  :polaris-server:assemble \
-  :polaris-server:quarkusAppPartsBuild --rerun \
-  -PeclipseLinkDeps=com.h2database:h2:2.3.232
-java -Dpolaris.persistence.type=eclipse-link \
-  -Dpolaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml \
-  -Dpolaris.persistence.eclipselink.persistence-unit=polaris \
-  -jar runtime/server/build/quarkus-app/quarkus-run.jar
-```
-
-### Using Postgres
-
-PostgreSQL is included by default in the Polaris server distribution.
-
-The following shows a sample configuration for integrating Polaris with Postgres.
-
-```xml
-<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
-  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
-  <shared-cache-mode>NONE</shared-cache-mode>
-  <properties>
-    <!-- PostgreSQL connection properties; jakarta.persistence.jdbc.url should contain the {realm} placeholder, plus user and password -->
-  </properties>
-</persistence-unit>
-```
diff --git a/1.2.0/polaris-api-specs/_index.md b/1.2.0/polaris-api-specs/_index.md
deleted file mode 100644
index 3f4a98498d..0000000000
--- a/1.2.0/polaris-api-specs/_index.md
+++ /dev/null
@@ -1,27 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-# -title: 'Polaris API Reference' -type: docs -weight: 1100 ---- - -The Apache Polaris API offers a comprehensive set of endpoints that enable you to manage principals, principal-roles, catalogs, and catalog-roles programmatically. - -It follows REST standards, using clear, resource-based URLs, standard HTTP methods, response codes, and secure authentication. With the Polaris API, you can create, manage, and query Iceberg catalogs efficiently. \ No newline at end of file diff --git a/1.2.0/polaris-api-specs/polaris-catalog-api.md b/1.2.0/polaris-api-specs/polaris-catalog-api.md deleted file mode 100644 index 4774c16cae..0000000000 --- a/1.2.0/polaris-api-specs/polaris-catalog-api.md +++ /dev/null @@ -1,27 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: 'Apache Polaris Catalog Service OpenAPI Specification' -linkTitle: 'Catalog API ↗' -weight: 200 -params: - show_page_toc: false ---- - -{{< redoc-polaris "generated/bundled-polaris-catalog-service.yaml" >}} diff --git a/1.2.0/polaris-api-specs/polaris-management-api.md b/1.2.0/polaris-api-specs/polaris-management-api.md deleted file mode 100644 index eea43448be..0000000000 --- a/1.2.0/polaris-api-specs/polaris-management-api.md +++ /dev/null @@ -1,27 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: 'Apache Polaris Management Service OpenAPI Specification' -linkTitle: 'Management API ↗' -weight: 100 -params: - show_page_toc: false ---- - -{{< redoc-polaris "polaris-management-service.yml" >}} diff --git a/1.2.0/polaris-spark-client.md b/1.2.0/polaris-spark-client.md deleted file mode 100644 index c990e565a5..0000000000 --- a/1.2.0/polaris-spark-client.md +++ /dev/null @@ -1,129 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. 
The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Polaris Spark Client
type: docs
weight: 650
---

Apache Polaris now provides catalog support for Generic Tables (non-Iceberg tables). Please check out
the [Polaris Catalog OpenAPI Spec]({{% ref "polaris-api-specs/polaris-catalog-api.md" %}}) for the Generic Table API specs.

Along with the Generic Table catalog support, Polaris is also releasing a Spark client, which provides
an end-to-end solution for Apache Spark to manage Delta tables using Polaris.

Note that the Polaris Spark client is able to handle both Iceberg and Delta tables, not just Delta.

This page documents how to connect Spark to a Polaris service using the Polaris Spark client.

## Quick Start with Local Polaris service
If you want to quickly try out the functionality with a local Polaris service, simply check out the Polaris repo
and follow the instructions in the Spark plugin getting-started
[README](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/getting-started/README.md).

Check out the Polaris repo:
```shell
git clone https://github.com/apache/polaris.git ~/polaris
```

## Start Spark against a deployed Polaris service
Before starting, ensure that the deployed Polaris service supports Generic Tables, and that Spark 3.5
(version 3.5.3 or later) is installed. Spark 3.5.6 is recommended, and you can follow the instructions
below to get a Spark 3.5.6 distribution.
```shell
cd ~
wget -O spark-3.5.6-bin-hadoop3.tgz "https://www.apache.org/dyn/closer.lua/spark/spark-3.5.6/spark-3.5.6-bin-hadoop3.tgz?action=download"
mkdir spark-3.5
tar xzvf spark-3.5.6-bin-hadoop3.tgz -C spark-3.5 --strip-components=1
cd spark-3.5
```

### Connecting with Spark using the Polaris Spark client
The following CLI command starts Spark connected to the deployed Polaris service using
a released Polaris Spark client.

```shell
bin/spark-shell \
--packages <polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.10.0,io.delta:delta-spark_2.12:3.3.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.<spark-catalog-name>.warehouse=<polaris-catalog-name> \
--conf spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.<spark-catalog-name>=org.apache.polaris.spark.SparkCatalog \
--conf spark.sql.catalog.<spark-catalog-name>.uri=<polaris-service-uri> \
--conf spark.sql.catalog.<spark-catalog-name>.credential='<client-id>:<client-secret>' \
--conf spark.sql.catalog.<spark-catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled=true
```
Assuming the released Polaris Spark client you want to use is `org.apache.polaris:polaris-spark-3.5_2.12:1.0.0`,
replace the `<polaris-spark-client-package>` placeholder with that coordinate.

The `<spark-catalog-name>` is the catalog name you will use with Spark, and `<polaris-catalog-name>` is the catalog name used
by the Polaris service; for simplicity, you can use the same name.
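
For concreteness, here is a hypothetical, fully substituted version of the command above, assuming the
`org.apache.polaris:polaris-spark-3.5_2.12:1.0.0` client release, a locally deployed Polaris service, and
`polaris` used as both the Spark and the Polaris catalog name; the remaining placeholders
(`<client-id>`, `<client-secret>`) are explained below.

```shell
bin/spark-shell \
--packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0,io.delta:delta-spark_2.12:3.3.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.warehouse=polaris \
--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.polaris.credential='<client-id>:<client-secret>' \
--conf spark.sql.catalog.polaris.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.polaris.token-refresh-enabled=true
```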
Replace the `<polaris-service-uri>` with the URI of the deployed Polaris service. For example, with a locally deployed
Polaris service, the URI would be `http://localhost:8181/api/catalog`.

For the `<client-id>` and `<client-secret>` values, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}})
for more details.

You can also establish the connection by programmatically initializing a SparkSession; the following is an example with PySpark:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.jars.packages", "<polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.10.0,io.delta:delta-spark_2.12:3.3.1") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.<spark-catalog-name>", "org.apache.polaris.spark.SparkCatalog") \
    .config("spark.sql.catalog.<spark-catalog-name>.uri", "<polaris-service-uri>") \
    .config("spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled", "true") \
    .config("spark.sql.catalog.<spark-catalog-name>.credential", "<client-id>:<client-secret>") \
    .config("spark.sql.catalog.<spark-catalog-name>.warehouse", "<polaris-catalog-name>") \
    .config("spark.sql.catalog.<spark-catalog-name>.scope", "PRINCIPAL_ROLE:ALL") \
    .config("spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation", "vended-credentials") \
    .getOrCreate()
```
As with the CLI command, make sure the corresponding fields are replaced correctly.

### Create tables with Spark
After Spark is started, you can use it to create and access Iceberg and Delta tables. For example:
```python
spark.sql("USE polaris")
spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS")
spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC")
spark.sql("USE NAMESPACE DELTA_NS.PUBLIC")
spark.sql("""CREATE TABLE IF NOT EXISTS PEOPLE (
    id int, name string)
USING delta LOCATION 'file:///tmp/var/delta_tables/people'
""")
```

## Connecting with Spark using a local Polaris Spark client jar
If you would like to use a version of the Spark client that is not yet released, you can
build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin
[README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions.

## Limitations
The Polaris Spark client has the following functional limitations:
1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `DataFrame`
   is also not supported, since it relies on CTAS support.
2) Creating a Delta table without an explicit location is not supported.
3) Renaming a Delta table is not supported.
4) `ALTER TABLE ... SET LOCATION` is not supported for Delta tables.
5) Other non-Iceberg table formats, such as CSV, are not supported.
diff --git a/1.2.0/policy.md b/1.2.0/policy.md
deleted file mode 100644
index 5ad26edd4c..0000000000
--- a/1.2.0/policy.md
+++ /dev/null
@@ -1,199 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Policy -type: docs -weight: 425 ---- - -The Polaris Policy framework empowers organizations to centrally define, manage, and enforce fine-grained governance, lifecycle, and operational rules across all data resources in the catalog. - -With the policy API, you can: -- Create and manage policies -- Attach policies to specific resources (catalogs, namespaces, tables, or views) -- Check applicable policies for any given resource - -## What is a Policy? - -A policy in Apache Polaris is a structured entity that defines rules governing actions on specified resources under -predefined conditions. Each policy contains: - -- **Name**: A unique identifier within a namespace -- **Type**: Determines the semantics and expected format of the policy content -- **Description**: Explains the purpose of the policy -- **Content**: Contains the actual rules defining the policy behavior -- **Version**: An automatically tracked revision number -- **Inheritable**: Whether the policy can be inherited by child resources, decided by its type - -### Policy Types - -Polaris supports several predefined system policy types (prefixed with `system.`): - -| Policy Type | Purpose | JSON-Schema | Applies To | -|-------------|-------------------------------------------------------|-------------|------------| -| **`system.data-compaction`** | Defines rules for data file compaction operations | [`data-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/data-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.metadata-compaction`** | Defines rules for metadata file compaction operations | [`metadata-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/metadata-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.orphan-file-removal`** | Defines rules for removing orphaned files | [`orphan-file-removal/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/orphan-file-removal/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.snapshot-expiry`** | Defines rules for snapshot expiration | [`snapshot-expiry/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/snapshot-expiry/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | - -Support for additional predefined system policy types and custom policy type definitions is in progress. -For more details, please refer to the [roadmap](https://github.com/apache/polaris/discussions/1028). - -### Policy Inheritance - -The entity hierarchy in Polaris is structured as follows: - -``` - Catalog - | - Namespace - | - +-----------+----------+ - | | | -Iceberg Iceberg Generic - Table View Table -``` - -Policies can be attached at any level, and inheritance flows from catalog down to namespace, then to tables and views. 
- -Policies can be inheritable or non-inheritable: - -- **Inheritable policies**: Apply to the target resource and all its applicable child resources -- **Non-inheritable policies**: Apply only to the specific target resource - -The inheritance follows an override mechanism: -1. Table-level policies override namespace and catalog policies -2. Namespace-level policies override parent namespace and catalog policies - -{{< alert important >}} -Because an override completely replaces the same policy type at higher levels, -**only one instance of a given policy type can be attached to (and therefore affect) a resource**. -{{< /alert >}} - -## Working with Policies - -### Creating a Policy - -To create a policy, you need to provide a name, type, and optionally a description and content: - -```json -POST /polaris/v1/{prefix}/namespaces/{namespace}/policies -{ - "name": "compaction-policy", - "type": "system.data-compaction", - "description": "Policy for optimizing table storage", - "content": "{\"version\": \"2025-02-03\", \"enable\": true, \"config\": {\"target_file_size_bytes\": 134217728}}" -} -``` - -The policy content is validated against a schema specific to its type. Here are a few policy content examples: -- Data Compaction Policy -```json -{ - "version": "2025-02-03", - "enable": true, - "config": { - "target_file_size_bytes": 134217728, - "compaction_strategy": "bin-pack", - "max-concurrent-file-group-rewrites": 5 - } -} -``` -- Orphan File Removal Policy -```json -{ - "version": "2025-02-03", - "enable": true, - "max_orphan_file_age_in_days": 30, - "locations": ["s3://my-bucket/my-table-location"], - "config": { - "prefix_mismatch_mode": "ignore" - } -} -``` - -### Attaching Policies to Resources - -Policies can be attached to different resource levels: - -1. **Catalog level**: Applies to the entire catalog -2. **Namespace level**: Applies to a specific namespace -3. **Table-like level**: Applies to individual tables or views - -Example of attaching a policy to a table: - -```json -PUT /polaris/v1/{prefix}/namespaces/{namespace}/policies/{policy-name}/mappings -{ - "target": { - "type": "table-like", - "path": ["NS1", "NS2", "test_table_1"] - } -} -``` - -For inheritable policies, only one policy of a given type can be attached to a resource. For non-inheritable policies, -multiple policies of the same type can be attached. - -### Retrieving Applicable Policies -A user can view applicable policies on a resource (e.g., table, namespace, or catalog) as long as they have -read permission on that resource. - -Here is an example to find all policies that apply to a specific resource (including inherited policies): -``` -GET /polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions -``` - -**Sample response:** -```json -{ - "policies": [ - { - "name": "snapshot-expiry-policy", - "type": "system.snapshot-expiry", - "appliedAt": "namespace", - "content": { - "version": "2025-02-03", - "enable": true, - "config": { - "min_snapshot_to_keep": 1, - "max_snapshot_age_days": 2, - "max_ref_age_days": 3 - } - } - }, - { - "name": "compaction-policy", - "type": "system.data-compaction", - "appliedAt": "catalog", - "content": { - "version": "2025-02-03", - "enable": true, - "config": { - "target_file_size_bytes": 134217728 - } - } - } - ] -} -``` - -### API Reference - -For the complete and up-to-date API specification, see the [policy-api.yaml](https://github.com/apache/polaris/blob/main/spec/polaris-catalog-apis/policy-apis.yaml). 
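
As a quick sanity check outside any query engine, the applicable-policies endpoint can also be exercised
directly over HTTP. Below is a minimal `curl` sketch using the path from the example above; the base URL
variable and bearer token are hypothetical and depend on your deployment:

```shell
# POLARIS_BASE is the root under which the policy APIs are mounted; TOKEN is a valid bearer token.
# %1F is the unit separator used by the Iceberg REST spec to join multi-level namespaces.
curl -s "$POLARIS_BASE/polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions" \
  -H "Authorization: Bearer $TOKEN"
```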
diff --git a/1.2.0/realm.md b/1.2.0/realm.md
deleted file mode 100644
index 4e0cc1ce25..0000000000
--- a/1.2.0/realm.md
+++ /dev/null
@@ -1,53 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Realm
type: docs
weight: 350
---

This page explains what a realm is and what it is used for in Polaris.

### What is it?

A realm in Polaris serves as a logical partitioning mechanism within the catalog system. This isolation allows for multitenancy, enabling different teams, environments, or organizations to operate independently within the same Polaris deployment.

### Key Characteristics

**Isolation:** Each realm encapsulates its own set of resources, ensuring that operations and policies in one realm do not affect others.

**Authentication Context:** When configuring Polaris, principal credentials are associated with a specific realm. This allows for the separation of security concerns across different realms.

**Configuration Scope:** Realm identifiers are used in various configurations, such as connection strings and feature configurations.

An example of this is:

`jdbc:postgresql://localhost:5432/{realm}`

This ensures that each realm's data is stored separately.

### How is it used in the system?

**RealmContext:** This is a key concept used to identify and resolve the context in which operations are performed. For example, in `DefaultRealmContextResolver`, a realm is resolved from request headers, and operations are performed based on the resolved realm identifier.

**Authentication and Authorization:** For example, in `DefaultAuthenticator`, `RealmContext` is used to provide context about the current security domain, which is used to retrieve the correct `PolarisMetastoreManager` that manages all Polaris entities and associated grant records metadata for
authorization.

**Isolation:** In methods like `createEntityManagerFactory(@Nonnull RealmContext realmContext)` from the `PolarisEclipseLinkPersistenceUnit` interface, the realm context influences how resources are created or managed based on the security policies of that realm.
An example of this is the way a realm name can be used to build a database connection URL so that you have one database instance per realm, when applicable. Alternatively, isolation can be more granular and applied at the primary-key level, within the same database instance.
diff --git a/1.2.0/telemetry.md b/1.2.0/telemetry.md
deleted file mode 100644
index 8bf8df03c3..0000000000
--- a/1.2.0/telemetry.md
+++ /dev/null
@@ -1,196 +0,0 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.
The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Telemetry -type: docs -weight: 450 ---- - -## Metrics - -Metrics are published using [Micrometer]; they are available from Polaris's management interface -(port 8282 by default) under the path `/q/metrics`. For example, if the server is running on -localhost, the metrics can be accessed via http://localhost:8282/q/metrics. - -[Micrometer]: https://quarkus.io/guides/telemetry-micrometer - -Metrics can be scraped by Prometheus or any compatible metrics scraping server. See: -[Prometheus](https://prometheus.io) for more information. - -Additional tags can be added to the metrics by setting the `polaris.metrics.tags.*` property. Each -tag is a key-value pair, where the key is the tag name and the value is the tag value. For example, -to add a tag `environment=prod` to all metrics, set `polaris.metrics.tags.environment=prod`. Many -tags can be added, such as below: - -```properties -polaris.metrics.tags.service=polaris -polaris.metrics.tags.environment=prod -polaris.metrics.tags.region=us-west-2 -``` - -Note that by default Polaris adds one tag: `application=Polaris`. You can override this tag by -setting the `polaris.metrics.tags.application=` property. - -### Realm ID Tag - -Polaris can add the realm ID as a tag to all API and HTTP request metrics. This is disabled by -default to prevent high cardinality issues, but can be enabled by setting the following properties: - -```properties -polaris.metrics.realm-id-tag.enable-in-api-metrics=true -polaris.metrics.realm-id-tag.enable-in-http-metrics=true -``` - -You should be particularly careful when enabling the realm ID tag in HTTP request metrics, as these -metrics typically have a much higher cardinality than API request metrics. - -In order to prevent the number of tags from growing indefinitely and causing performance issues or -crashing the server, the number of unique realm IDs in HTTP request metrics is limited to 100 by -default. If the number of unique realm IDs exceeds this value, a warning will be logged and no more -HTTP request metrics will be recorded. This threshold can be changed by setting the -`polaris.metrics.realm-id-tag.http-metrics-max-cardinality` property. - -## Traces - -Traces are published using [OpenTelemetry]. - -[OpenTelemetry]: https://quarkus.io/guides/opentelemetry-tracing - -By default OpenTelemetry is disabled in Polaris, because there is no reasonable default -for the collector endpoint for all cases. - -To enable OpenTelemetry and publish traces for Polaris set `quarkus.otel.sdk.disabled=false` -and configure a valid collector endpoint URL with `http://` or `https://` as the server property -`quarkus.otel.exporter.otlp.traces.endpoint`. - -_If these properties are not set, the server will not publish traces._ - -The collector must talk the OpenTelemetry protocol (OTLP) and the port must be its gRPC port -(by default 4317), e.g. "http://otlp-collector:4317". 
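
Putting these settings together, a minimal configuration that enables tracing against the example
collector address above would be:

```properties
quarkus.otel.sdk.disabled=false
quarkus.otel.exporter.otlp.traces.endpoint=http://otlp-collector:4317
```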
- -By default, Polaris adds a few attributes to the [OpenTelemetry Resource] to identify the server, -and notably: - -- `service.name`: set to `Apache Polaris Server (incubating)`; -- `service.version`: set to the Polaris version. - -[OpenTelemetry Resource]: https://opentelemetry.io/docs/languages/js/resources/ - -You can override the default resource attributes or add additional ones by setting the -`quarkus.otel.resource.attributes` property. - -This property expects a comma-separated list of key-value pairs, where the key is the attribute name -and the value is the attribute value. For example, to change the service name to `Polaris` and add -an attribute `deployment.environment=dev`, set the following property: - -```properties -quarkus.otel.resource.attributes=service.name=Polaris,deployment.environment=dev -``` - -The alternative syntax below can also be used: - -```properties -quarkus.otel.resource.attributes[0]=service.name=Polaris -quarkus.otel.resource.attributes[1]=deployment.environment=dev -``` - -Finally, two additional span attributes are added to all request parent spans: - -- `polaris.request.id`: The unique identifier of the request, if set by the caller through the - `Polaris-Request-Id` header. -- `polaris.realm`: The unique identifier of the realm. Always set (unless the request failed because - of a realm resolution error). - -### Troubleshooting Traces - -If the server is unable to publish traces, check first for a log warning message like the following: - -``` -SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans. -The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317 -``` - -This means that the server is unable to connect to the collector. Check that the collector is -running and that the URL is correct. - -## Logging - -Polaris relies on [Quarkus](https://quarkus.io/guides/logging) for logging. - -By default, logs are written to the console and to a file located in the `./logs` directory. The log -file is rotated daily and compressed. The maximum size of the log file is 10MB, and the maximum -number of backup files is 14. - -JSON logging can be enabled by setting the `quarkus.log.console.json.enabled` and `quarkus.log.file.json.enabled` -properties to `true`. By default, JSON logging is disabled. - -The log level can be set for the entire application or for specific packages. The default log level -is `INFO`. To set the log level for the entire application, use the `quarkus.log.level` property. - -To set the log level for a specific package, use the `quarkus.log.category."package-name".level`, -where `package-name` is the name of the package. For example, the package `io.smallrye.config` has a -useful logger to help debugging configuration issues; but it needs to be set to the `DEBUG` level. -This can be done by setting the following property: - -```properties -quarkus.log.category."io.smallrye.config".level=DEBUG -``` - -The log message format for both console and file output is highly configurable. The default format -is: - -``` -%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n -``` - -Refer to the [Logging format](https://quarkus.io/guides/logging#logging-format) guide for more -information on placeholders and how to customize the log message format. - -### MDC Logging - -Polaris uses Mapped Diagnostic Context (MDC) to enrich log messages with additional context. 
The following MDC keys are available:

- `requestId`: The unique identifier of the request, if set by the caller through the
  `Polaris-Request-Id` header.
- `realmId`: The unique identifier of the realm. Always set.
- `traceId`: The unique identifier of the trace. Present if tracing is enabled and the message is
  originating from a traced context.
- `parentId`: The unique identifier of the parent span. Present if tracing is enabled and the
  message is originating from a traced context.
- `spanId`: The unique identifier of the span. Present if tracing is enabled and the message is
  originating from a traced context.
- `sampled`: Whether the trace has been sampled. Present if tracing is enabled and the message is
  originating from a traced context.

Other MDC keys can be added by setting the `polaris.log.mdc.*` property. Each property is a
key-value pair, where the key is the MDC key name and the value is the MDC key value. For example,
to add the MDC keys `environment=prod` and `region=us-west-2` to all log messages, set the following
properties:

```properties
polaris.log.mdc.environment=prod
polaris.log.mdc.region=us-west-2
```

MDC context is propagated across threads, including in `TaskExecutor` threads.

## Links

Visit [Using Polaris with telemetry tools]({{% relref "getting-started/using-polaris/telemetry-tools" %}}) to see sample Polaris config with Prometheus and Jaeger.
diff --git a/1.2.0/getting-started/_index.md b/site/content/blog/2025/08/20/apache-polaris-1.0.1-incubating.md
similarity index 55%
rename from 1.2.0/getting-started/_index.md
rename to site/content/blog/2025/08/20/apache-polaris-1.0.1-incubating.md
index 1707ceacd2..16b0d32e55 100644
--- a/1.2.0/getting-started/_index.md
+++ b/site/content/blog/2025/08/20/apache-polaris-1.0.1-incubating.md
@@ -17,23 +17,25 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-title: Getting Started with Apache Polaris
-linkTitle: Getting Started
-type: docs
-weight: 101
+title: Apache Polaris 1.0.1-incubating has been released!
+date: 2025-08-20
 ---
 
-The fastest way to get started is with our Docker Compose examples. Each example provides a complete working environment with detailed instructions.
+The Apache Polaris team is pleased to announce Apache Polaris 1.0.1-incubating.
 
-## Next Steps
+This release is a maintenance release on the 1.0.0-incubating one, fixing a couple of issues on the Helm Chart:
+* remove db-kind in Helm Chart
+* add relational-jdbc to Helm Chart
 
-1. Check/Install dependencies
-2. Choose the way you want to deploy Polaris
-3. Create a catalog
-4. Check Using polaris page
+This release can be downloaded:
+* https://polaris.apache.org/downloads/
 
-## Getting Help
+The artifacts are available on Maven Central.
 
-- Documentation: https://polaris.apache.org
-- GitHub Issues: https://github.com/apache/polaris/issues
-- Slack: [Join Apache Polaris Community](https://join.slack.com/t/apache-polaris/shared_invite/zt-2y3l3r0fr-VtoW42ltir~nSzCYOrQgfw)
+The Docker images are available on Docker Hub:
+* https://hub.docker.com/r/apache/polaris/tags
+* https://hub.docker.com/r/apache/polaris-admin-tool/tags
+
+Enjoy!
+
+The Apache Polaris team.
diff --git a/site/content/blog/2025/09/15/doris-polaris-integration.md b/site/content/blog/2025/09/15/doris-polaris-integration.md new file mode 100644 index 0000000000..d798337401 --- /dev/null +++ b/site/content/blog/2025/09/15/doris-polaris-integration.md @@ -0,0 +1,427 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: "Doris X Polaris: Building Unified Data Lakehouse with Iceberg REST Catalog - A Practical Guide" +date: 2025-09-15 +author: zy-kkk +--- + +With the continuous evolution of data lake technologies, efficiently and securely managing massive datasets stored on object storage (such as AWS S3) while providing unified access endpoints for upstream analytics engines (like [Apache Doris](https://doris.apache.org)) has become a core challenge in modern data architectures. [Apache Polaris](https://polaris.apache.org/), as an open and standardized REST Catalog service for Iceberg, provides an ideal solution to this challenge. It not only handles centralized metadata management but also significantly enhances data lake security and manageability through fine-grained access control and flexible credential management mechanisms. + +This document will provide a detailed guide on integrating Apache Doris with Polaris to achieve efficient querying and management of Iceberg data on S3. We'll guide you through the complete process from environment preparation to final data querying step by step + +**Through this documentation, you will quickly learn:** + +* **AWS Environment Setup**: How to create and configure S3 buckets in AWS, and prepare the necessary IAM roles and policies for both Polaris and Doris, enabling Polaris to access S3 and vend temporary credentials for Doris. + +* **Polaris Deployment and Configuration**: How to download and start the Polaris service, and create Iceberg Catalog, Namespace, and corresponding Principal/Role/permissions in Polaris to provide secure metadata access endpoints for Doris. + +* **Doris-Polaris Integration**: Explains how Doris obtains metadata access tokens from Polaris via OAuth2, and demonstrates two core underlying storage access methods: + + 1. Temporary AK/SK distribution by Polaris (Credential Vending mechanism) + + 2. Doris directly using static AK/SK to access S3 + +## About Apache Doris + +[Apache Doris](https://doris.apache.org) is the fastest analytical and search database for the AI era. + +It provides high-performance hybrid search capabilities across structured data, semi-structured data (such as JSON), and vector data. It excels at delivering high-concurrency, low-latency queries, while also offering advanced optimization for complex join operations. 
In addition, Doris can serve as a unified query engine, delivering high-performance analytical services not only on its self-managed internal table format but also on open lakehouse formats such as Iceberg. + +With Doris, users can easily build a real-time lakehouse data platform. + +## About Apache Polaris + +Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol. + +With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines. + +## Hands-on Guide + +### 1. AWS Environment Setup + +Before we begin, we need to prepare S3 buckets and corresponding IAM roles on AWS, which form the foundation for Polaris to manage data and Doris to access data. + +#### 1.1 Create S3 Bucket + +First, we create an S3 bucket named `polaris-doris-test` to store the Iceberg table data that will be created later. + +```bash +# Create an S3 bucket +aws s3 mb s3://polaris-doris-test --region us-west-2 +# Verify that the bucket was created successfully +aws s3 ls | grep polaris-doris-test +``` + +#### 1.2 Create IAM Role for Object Storage Access + +To implement secure credential management, we need to create an IAM role for Polaris to use through the STS AssumeRole mechanism. This design follows the security best practices of the least privileged principle and separation of duties. + +1. Create a trust policy file + + Create the `polaris-trust-policy.json` file: + + > Note: Replace YOUR\_ACCOUNT\_ID with your actual AWS account ID, which can be obtained using `aws sts get-caller-identity --query Account --output text`. + + ```bash + cat > polaris-trust-policy.json < If you do not perform this step, you need to export `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` before starting polaris + +If your Polaris service will run on an EC2 instance, it is best to bind an IAM role to the EC2 instance instead of using access keys. This avoids hard-coding credentials in the code and improves security. + +1. Create a trust policy for the EC2 instance role + + First, create the trust policy file that allows the EC2 service to assume this role: + + ```json + cat > ec2-trust-policy.json < This document uses the source code quick start method. For more deployment methods, please refer to: https://polaris.apache.org/releases/1.0.1/getting-started/deploying-polaris/ + +#### 2.1 Clone Source Code and Start Polaris + +1. Configure AWS Credentials(Optional) + + If you're not running Polaris on EC2, or if the EC2 instance doesn't have the appropriate IAM Role attached, you need to provide Polaris with AK/SK that has permission to assume the `polaris-doris-demo` role through environment variables. + + ```bash + export AWS_ACCESS_KEY_ID=YOUR_AWS_ACCESS_KEY_ID + export AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY + ``` + +2. Clone Polaris Repository and Switch to Specific Version + + ```bash + git clone https://github.com/apache/polaris.git + cd polaris + # Recommend using a released stable version + git checkout apache-polaris-1.0.1-incubating + ``` + +3. Run Polaris + + Ensure you have Java 21+ and Docker 27+ installed. 
```bash
./gradlew run -Dpolaris.bootstrap.credentials=POLARIS,root,secret
```

* `POLARIS` is the realm
* `root` is the `CLIENT_ID`
* `secret` is the `CLIENT_SECRET`
* If credentials are not set, the preset credentials `POLARIS,root,s3cr3t` are used

This command will compile and start the Polaris service, which listens on port 8181 by default.

> You can also use the binary distribution, see: https://github.com/apache/polaris/tree/main/runtime/distribution

#### 2.2 Create Catalog and Namespace in Polaris

1. Export ROOT Credentials

   > The `CLIENT_ID` and `CLIENT_SECRET` here are the same as those we set when we started Polaris

   ```bash
   export CLIENT_ID=root
   export CLIENT_SECRET=secret
   ```

2. Create Catalog (Pointing to S3 Storage)

   ```bash
   ./polaris catalogs create \
     --storage-type s3 \
     --default-base-location s3://polaris-doris-test/polaris1 \
     --role-arn arn:aws:iam::<YOUR_ACCOUNT_ID>:role/polaris-doris-demo \
     --external-id polaris-doris-demo \
     doris_catalog
   ```

   * `--storage-type`: Specifies the underlying storage as S3.
   * `--default-base-location`: Default root path for Iceberg table data.
   * `--role-arn`: IAM Role that the Polaris service assumes for S3 access. Replace `<YOUR_ACCOUNT_ID>` with your AWS account ID.
   * `--external-id`: External ID used when assuming the role; it must match the configuration in the IAM Role trust policy.

3. Create Namespace

   ```bash
   ./polaris namespaces create --catalog doris_catalog doris_demo
   ```

   This creates a namespace (database) named `doris_demo` under `doris_catalog`.

#### 2.3 Polaris Security Roles and Permission Configuration

To allow Doris to access the catalog as a `non-root` user, we need to create a new user and role with appropriate permissions.

1. Create Principal Role and Catalog Role

   ```bash
   # Create a Principal Role for aggregating permissions
   ./polaris principal-roles create doris_pr_role

   # Create a Catalog Role under doris_catalog
   ./polaris catalog-roles create --catalog doris_catalog doris_catalog_role
   ```

2. Grant Permissions to Catalog Role

   ```bash
   # Grant doris_catalog_role permission to manage content within the Catalog
   ./polaris privileges catalog grant \
     --catalog doris_catalog \
     --catalog-role doris_catalog_role \
     CATALOG_MANAGE_CONTENT
   ```

3. Associate Principal Role and Catalog Role

   ```bash
   # Assign doris_catalog_role to doris_pr_role
   ./polaris catalog-roles grant \
     --catalog doris_catalog \
     --principal-role doris_pr_role \
     doris_catalog_role
   ```

4. Create New Principal (User) and Bind Role

   ```bash
   # Create a new user (Principal) named doris_user
   ./polaris principals create doris_user
   # Example output: {"clientId": "6e155b128dc06c13", "clientSecret": "ce9fbb4cc91c43ff2955f2c6545239d7"}
   # Please note down this new client_id and client_secret pair, as Doris will use them for connection.

   # Bind doris_user to doris_pr_role
   ./polaris principal-roles grant \
     doris_pr_role \
     --principal doris_user
   ```

   With this, all Polaris-side configuration is complete. We've created a user named `doris_user` that obtains permission to manage `doris_catalog` through `doris_pr_role`.

### 3. Doris-Polaris Integration

Now, we'll create an Iceberg Catalog in Doris that connects to the newly configured Polaris service. Doris supports multiple flexible authentication combinations.

> Note: In this example, we use an OAuth2 authentication credential to connect to the Polaris REST service.
> In addition, Doris also supports using `iceberg.rest.oauth2.token` to directly provide a pre-obtained Bearer Token.

#### Method 1: OAuth2 + Temporary Storage Credentials (Credential Vending)

This is the **most recommended** approach. Doris uses OAuth2 credentials to authenticate with Polaris and obtain metadata. When it needs to read or write data files on S3, Doris requests a temporary S3 access credential with minimal privileges from Polaris.

**Doris Catalog Creation Statement:**

Use the `clientId` and `clientSecret` generated for `doris_user`.

```sql
CREATE CATALOG polaris_vended PROPERTIES (
    'type' = 'iceberg',
    -- Catalog name in Polaris
    'warehouse' = 'doris_catalog',
    'iceberg.catalog.type' = 'rest',
    -- Polaris service address
    'iceberg.rest.uri' = 'http://YOUR_POLARIS_HOST:8181/api/catalog',
    -- Metadata authentication method
    'iceberg.rest.security.type' = 'oauth2',
    -- Replace with doris_user's client_id:client_secret
    'iceberg.rest.oauth2.credential' = 'client_id:client_secret',
    'iceberg.rest.oauth2.server-uri' = 'http://YOUR_POLARIS_HOST:8181/api/catalog/v1/oauth/tokens',
    'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:doris_pr_role',
    -- Enable credential vending
    'iceberg.rest.vended-credentials-enabled' = 'true',
    -- S3 basic configuration (no keys required)
    's3.endpoint' = 'https://s3.us-west-2.amazonaws.com',
    's3.region' = 'us-west-2'
);
```

#### Method 2: OAuth2 + Static Storage Credentials (AK/SK)

In this approach, Doris still uses OAuth2 to access Polaris metadata, but when accessing S3 data, it uses a static AK/SK hardcoded in the Doris Catalog configuration. This method is simple to configure and suitable for quick testing, but it is less secure.

**Doris Catalog Creation Statement:**

```sql
CREATE CATALOG polaris_aksk PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'doris_catalog',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'http://YOUR_POLARIS_HOST:8181/api/catalog',
    'iceberg.rest.security.type' = 'oauth2',
    'iceberg.rest.oauth2.credential' = 'client_id:client_secret',
    'iceberg.rest.oauth2.server-uri' = 'http://YOUR_POLARIS_HOST:8181/api/catalog/v1/oauth/tokens',
    'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:doris_pr_role',
    -- Directly provide S3 access keys
    's3.access_key' = 'YOUR_S3_ACCESS_KEY',
    's3.secret_key' = 'YOUR_S3_SECRET_KEY',
    's3.endpoint' = 'https://s3.us-west-2.amazonaws.com',
    's3.region' = 'us-west-2'
);
```

### 4. Managing Iceberg Tables in Doris with Polaris

Regardless of which method you used to create the Catalog, you can manage Iceberg tables with the following SQL statements.

```sql
-- Switch to the Catalog you created and the Namespace configured in Polaris
USE polaris_vended.doris_demo;

-- Create an Iceberg table
CREATE TABLE my_iceberg_table (
  id INT,
  name STRING
)
PROPERTIES (
  'write-format'='parquet'
);

-- Insert data
INSERT INTO my_iceberg_table VALUES (1, 'Doris'), (2, 'Polaris');

-- Query data
SELECT * FROM my_iceberg_table;
-- Expected result:
-- +------+---------+
-- | id   | name    |
-- +------+---------+
-- |    1 | Doris   |
-- |    2 | Polaris |
-- +------+---------+
```

If all the above operations succeed, congratulations! You have successfully established the complete data lake pipeline from Doris -> Polaris -> Iceberg on S3.
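
If catalog creation fails, it can also help to verify the OAuth2 exchange outside Doris. The sketch
below is a hypothetical `curl` against the token endpoint configured above, using the `doris_user`
credentials from section 2.3:

```shell
# Request a bearer token from Polaris with the client_credentials grant.
# client_id/client_secret are the values printed when doris_user was created.
curl -s -X POST "http://YOUR_POLARIS_HOST:8181/api/catalog/v1/oauth/tokens" \
  -d "grant_type=client_credentials" \
  -d "client_id=<client_id>" \
  -d "client_secret=<client_secret>" \
  -d "scope=PRINCIPAL_ROLE:doris_pr_role"
```

A JSON response containing an `access_token` confirms that the credentials and scope are valid.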
For more information about managing Iceberg tables with Doris, please visit:

https://doris.apache.org/docs/lakehouse/catalogs/iceberg-catalog

diff --git a/site/content/blog/2025/09/19/apache-polaris-1.1.0-incubating.md b/site/content/blog/2025/09/19/apache-polaris-1.1.0-incubating.md
new file mode 100644
index 0000000000..1e6e3027dd
--- /dev/null
+++ b/site/content/blog/2025/09/19/apache-polaris-1.1.0-incubating.md
@@ -0,0 +1,53 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Apache Polaris 1.1.0-incubating has been released!
date: 2025-09-19
---

The Apache Polaris team is pleased to announce Apache Polaris 1.1.0-incubating.

This release includes:
* **New features & enhancements**
  * HMS support
  * IMPLICIT authentication type
  * Support for non-AWS S3 compatible storage with STS: MinIO, s3a scheme support
  * Use of Realm instead of RealmId
  * Modularized Federation Architecture
  * Federated Catalog Support in Polaris CLI
  * Expanded External Identity Provider support
  * Python package (official)
  * Documentation improvements (release process, multi-realms configuration)
* **Bug fixes**
  * Fix drop view with default server configuration
  * Fix MinIO support
  * Remove ThreadLocal

This release can be downloaded:
* https://polaris.apache.org/downloads/

The artifacts are available on Maven Central.

The Docker images are available on Docker Hub:
* https://hub.docker.com/r/apache/polaris/tags
* https://hub.docker.com/r/apache/polaris-admin-tool/tags

Enjoy!

The Apache Polaris team.
diff --git a/site/content/blog/2025/10/02/puppygraph-polaris-integration.md b/site/content/blog/2025/10/02/puppygraph-polaris-integration.md
new file mode 100644
index 0000000000..8665cbc5c4
--- /dev/null
+++ b/site/content/blog/2025/10/02/puppygraph-polaris-integration.md
@@ -0,0 +1,384 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: "Integrating Apache Polaris with PuppyGraph for Real-time Graph Analysis"
date: 2025-10-02
author: Danfeng Xu
---

Unified data governance has become a hot topic over the last few years. As AI and other data-hungry use cases infiltrate the market, the need for a comprehensive data catalog solution with governance in mind has become critical. [Apache Polaris](https://github.com/apache/polaris), an open-source solution specifically built to handle data governed by [Apache Iceberg](https://iceberg.apache.org/), is changing the way we manage and access data across various clouds, formats, and platforms. With a foundation rooted in Apache Iceberg, Apache Polaris ensures compatibility with various compute engines and data formats, making it an ideal choice for organizations focused on scalable, open data architectures.

The beauty of such catalog technologies is their interoperability with other technologies that can leverage their data. [**PuppyGraph**](https://www.puppygraph.com/), the first graph compute engine to integrate with Apache Polaris natively, is part of this revolution of making data (and graph analytics) more accessible - all without a separate specialized graph database. Built in collaboration with the Apache Polaris team, PuppyGraph’s integration with Apache Polaris is a significant leap forward in graph compute technology, offering a unique and powerful approach to exploring and analyzing data within Apache Polaris.

As the first graph query engine to natively integrate with Apache Polaris, PuppyGraph offers a unique approach to querying the data within an Apache Polaris instance: **through graph**. Although SQL querying will remain a staple for many developers, graph queries offer organizations a way to explore their interconnected data in unique and new ways that SQL-based querying cannot handle efficiently. This blog will explore the power of pairing Apache Polaris with graph analytics capabilities using PuppyGraph’s zero-ETL graph query engine. Let’s start by looking a bit closer at the inner workings of Apache Polaris.

## What is Apache Polaris?

Apache Polaris is an open-source, interoperable catalog for Apache Iceberg. It offers a centralized governance solution for data across various cloud platforms, formats, and compute engines. For users, it provides fine-grained access controls to secure data handling, simplifies data discovery, and fosters collaboration by managing structured and unstructured data, machine learning models, and files.

![](/img/blog/2025/10/02/fig1-what-is-apache-polaris.png)

A significant component of Apache Polaris is its commitment to open accessibility and regulatory compliance. By supporting major data protection and privacy frameworks like GDPR, CCPA, and HIPAA, Apache Polaris helps organizations meet critical regulatory standards. This focus on compliance and secure data governance reduces risk while fostering greater confidence in how data is stored, accessed, and analyzed.

### **Key Features & Benefits**

Apache Polaris offers several key features and benefits that users should know. Diving a bit deeper, based on the image above, here are some noteworthy benefits:

#### Cross-Engine Read and Write

Apache Polaris leverages Apache Iceberg's open-source REST protocol, enabling multiple engines to read and write data seamlessly.
This interoperability extends to popular engines like PuppyGraph, [Apache Flink](https://flink.apache.org/), [Apache Spark](https://spark.apache.org/), [Trino](https://trino.io/), and many others, ensuring flexibility and choice for users.

![](/img/blog/2025/10/02/fig2-cross-engine-rw.png)

#### Centralized Security and Access

With Apache Polaris, you can define principals/users and roles, and manage RBAC (Role-Based Access Controls) on Iceberg tables for these users or roles. This centralized security management approach streamlines access control and simplifies data governance.

#### Run Anywhere, No Lock-In

Apache Polaris offers deployment flexibility, allowing you to run it in your own infrastructure within a container (e.g., Docker, Kubernetes) or as a managed service on Snowflake. This adaptability ensures you can retain RBAC, namespaces, and table definitions even if you switch infrastructure, providing long-term flexibility and cost optimization.

Apache Polaris offers various ways to query, analyze, and integrate data, making it one of the most flexible and scalable options for organizations to store and govern data effectively.

## Why Add Graph Capabilities to Apache Polaris?

While SQL querying is a mainstay for most developers dealing with data, and traditional SQL queries are highly effective for many data operations, they can fall short when working with highly interconnected data. Specific use cases lend themselves to graph querying, such as:

* **Social Network Analysis:** Understanding relationships between people, groups, and organizations.
* **Fraud Detection:** Identifying patterns and anomalies in financial transactions or online activities.
* **Knowledge Graphs:** Representing and querying complex networks of interconnected concepts and entities.
* **Recommendation Engines:** Suggesting products, services, or content based on user preferences and relationships.
* **Network and IT Infrastructure Analysis:** Modeling and analyzing network topologies, dependencies, and performance.

Enhancing Apache Polaris with a graph query engine introduces advanced graph analytics, making it easier and more intuitive to handle complex, relationship-based queries like the ones mentioned above. Here's why integrating graph capabilities benefits querying in Apache Polaris (a small traversal example follows this list):

* **Enhanced Data Relationships**: Graph queries are designed to uncover complex patterns within data, making them particularly useful for exploring multi-level relationships or hierarchies that can be cumbersome to analyze with SQL.
* **Performance**: When traversing extensive relationships, graph queries are often faster than SQL, especially for deep link analysis, as graph databases are optimized for this type of network traversal.
* **Flexibility**: Graph databases allow for a more intuitive approach to modeling interconnected data, avoiding the need for complex `JOIN` operations common in SQL queries. Nodes and edges in graph models naturally represent connections, simplifying queries for relationship-based data.
* **Advanced Analytics**: Graph platforms support advanced analytics, such as community detection, shortest path calculations, and centrality measures. Many of these algorithms are built into graph platforms, making them more accessible and efficient than implementing such analytics manually in SQL.
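
To make the contrast concrete, here is a small, hypothetical Gremlin traversal over the sample
`modern` graph (people connected by `knows` edges, software connected by `created` edges) that is
built later in this guide; expressing the same two-hop question in SQL would take a self-join per hop:

```groovy
// Which software was created by the people that marko knows?
// person --knows--> person --created--> software
g.V().has('person', 'name', 'marko').
  out('knows').
  out('created').
  values('name')
```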
+ +Users gain deeper insights, faster query performance, and simpler ways to handle complex data structures by adding the capability to perform graph-based querying and analytics within Apache Polaris. When adding these capabilities, PuppyGraph’s zero-ETL graph query engine integrates seamlessly with Apache Polaris, making it easy and fast to unlock these advantages. Let’s look at how seamlessly the two platforms fit together architecturally. + +## Apache Polaris \+ PuppyGraph Architecture + +Traditionally, enabling graph querying and analytics on organizational data required replicating data into a separate graph database before running queries. This complex process involved multiple technologies, teams, and a significant timeline. Generally, the most cumbersome part of the equation was the struggling of ETL to get the data transformed into a graph-compatible format and actually loaded into the database. Because of this, implementing graph analytics on data stored in SQL-based systems has historically been challenging, so graph analysis was often viewed as a niche technology—valuable but costly to implement. + +PuppyGraph overcomes these limitations by offering a novel approach: adding graph capabilities without needing a dedicated graph database. Removing the graph database from the equation shortens the implementation timeline, reducing both time-to-market and overall costs. With PuppyGraph’s Zero-ETL graph query engine, users can connect directly to the data source, enabling graph queries directly on an Apache Polaris instance while maintaining fine-grained governance and lineage. + +![](/img/blog/2025/10/02/fig3-apache-polaris-puppygraph-architecture.png) + +This approach allows for performant graph querying, such as supporting 10-hop neighbor queries across half a billion edges in 2.26 seconds through scalable and performant zero-ETL. PuppyGraph achieves this by leveraging the column-based data file format coupled with massively parallel processing and vectorized evaluation technology built into the PuppyGraph engine. This distributed compute engine design ensures fast query execution even without efficient indexing and caching, delivering a performant and efficient graph querying and analytics experience without the hassles of the traditional graph infrastructure. + +To prove just how easy it is, let's look at how you can connect PuppyGraph to the data you have stored in Apache Polaris. + +## Connecting PuppyGraph to Apache Polaris + +Enabling graph capabilities on your underlying data is extremely simple with PuppyGraph. We like to summarize it into three steps: deploy, connect, and query. Many users can be up and running in a matter of minutes. We’ll walk through the steps below to show how easy it is. + +### Deploy Apache Polaris + +Check out the code from the Apache Polaris repository. +```shell +git clone https://github.com/apache/polaris.git +``` + +Build and run an Apache Polaris server. Note that JDK 21 is required to build and run the Apache Polaris. +```shell +cd polaris +./gradlew run +``` + +Then use the provided spark shell to create a data catalog and prepare data. Start a different shell and run the following command in the polaris directory: +```shell +./regtests/run_spark_sql.sh +``` + +The command will download Spark and start a Spark SQL shell. Run the following command to generate a new database and several tables in this newly created catalog. 
```sql
CREATE DATABASE IF NOT EXISTS modern;

CREATE TABLE modern.person (id string, name string, age int) USING iceberg;
INSERT INTO modern.person VALUES
  ('v1', 'marko', 29),
  ('v2', 'vadas', 27),
  ('v4', 'josh', 32),
  ('v6', 'peter', 35);

CREATE TABLE modern.software (id string, name string, lang string) USING iceberg;
INSERT INTO modern.software VALUES
  ('v3', 'lop', 'java'),
  ('v5', 'ripple', 'java');

CREATE TABLE modern.created (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.created VALUES
  ('e9', 'v1', 'v3', 0.4),
  ('e10', 'v4', 'v5', 1.0),
  ('e11', 'v4', 'v3', 0.4),
  ('e12', 'v6', 'v3', 0.2);

CREATE TABLE modern.knows (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.knows VALUES
  ('e7', 'v1', 'v2', 0.5),
  ('e8', 'v1', 'v4', 1.0);
```
### Deploy PuppyGraph

Then you’ll need to deploy PuppyGraph. Luckily, this is easy and can currently be done through Docker (see [Docs](https://docs.puppygraph.com/getting-started)) or an [AWS AMI](https://aws.amazon.com/marketplace/pp/prodview-dgmn5jnwnfacu) through AWS Marketplace. The AMI approach requires a few clicks and will deploy your instance on the infrastructure of your choice. Below, we will focus on what it takes to launch a PuppyGraph instance on Docker.

With Docker installed, you can run the following command in your terminal:
```shell
docker run -p 8081:8081 -p 8183:8182 -p 7687:7687 -v /tmp/polaris:/tmp/polaris --name puppy --rm -itd puppygraph/puppygraph:stable
```

This will spin up a PuppyGraph instance on your local machine (or on a cloud or bare metal server if that's where you want to deploy it). Next, you can go to **localhost:8081** or the URL on which you launched the instance. This will show you the PuppyGraph login screen:

![](/img/blog/2025/10/02/fig4-puppygraph-login-page.png)

After logging in with the default credentials (username: `puppygraph` and default password: `puppygraph123`), you’ll enter the application itself. At this point, our instance is ready to go and we can proceed with connecting to the underlying data stored in Apache Polaris.

### Connect to Your Data Source and Define Your Schema

Next, we must connect to our data source to run graph queries against it. Users have a choice of how they would like to go about this. Firstly, you could use a JSON schema document to define your connectivity parameters and data mapping.
+
+### Connect to Your Data Source and Define Your Schema
+
+Next, we must connect PuppyGraph to our data source so we can run graph queries against it. There are two ways to go about this. First, you can use a JSON schema document to define your connectivity parameters and data mapping. As an example, here is what one of these schemas might look like:
+```json
+{
+  "catalogs": [
+    {
+      "name": "test",
+      "type": "iceberg",
+      "metastore": {
+        "type": "rest",
+        "uri": "http://172.17.0.1:8181/api/catalog",
+        "warehouse": "manual_spark",
+        "credential": "root:s3cr3t",
+        "scope": "PRINCIPAL_ROLE:ALL"
+      }
+    }
+  ],
+  "graph": {
+    "vertices": [
+      {
+        "label": "person",
+        "oneToOne": {
+          "tableSource": {
+            "catalog": "test",
+            "schema": "modern",
+            "table": "person"
+          },
+          "id": {
+            "fields": [
+              {
+                "type": "String",
+                "field": "id",
+                "alias": "id"
+              }
+            ]
+          },
+          "attributes": [
+            {
+              "type": "String",
+              "field": "name",
+              "alias": "name"
+            },
+            {
+              "type": "Int",
+              "field": "age",
+              "alias": "age"
+            }
+          ]
+        }
+      },
+      {
+        "label": "software",
+        "oneToOne": {
+          "tableSource": {
+            "catalog": "test",
+            "schema": "modern",
+            "table": "software"
+          },
+          "id": {
+            "fields": [
+              {
+                "type": "String",
+                "field": "id",
+                "alias": "id"
+              }
+            ]
+          },
+          "attributes": [
+            {
+              "type": "String",
+              "field": "name",
+              "alias": "name"
+            },
+            {
+              "type": "String",
+              "field": "lang",
+              "alias": "lang"
+            }
+          ]
+        }
+      }
+    ],
+    "edges": [
+      {
+        "label": "created",
+        "fromVertex": "person",
+        "toVertex": "software",
+        "tableSource": {
+          "catalog": "test",
+          "schema": "modern",
+          "table": "created"
+        },
+        "id": {
+          "fields": [
+            {
+              "type": "String",
+              "field": "id",
+              "alias": "id"
+            }
+          ]
+        },
+        "fromId": {
+          "fields": [
+            {
+              "type": "String",
+              "field": "from_id",
+              "alias": "from_id"
+            }
+          ]
+        },
+        "toId": {
+          "fields": [
+            {
+              "type": "String",
+              "field": "to_id",
+              "alias": "to_id"
+            }
+          ]
+        },
+        "attributes": [
+          {
+            "type": "Double",
+            "field": "weight",
+            "alias": "weight"
+          }
+        ]
+      },
+      {
+        "label": "knows",
+        "fromVertex": "person",
+        "toVertex": "person",
+        "tableSource": {
+          "catalog": "test",
+          "schema": "modern",
+          "table": "knows"
+        },
+        "id": {
+          "fields": [
+            {
+              "type": "String",
+              "field": "id",
+              "alias": "id"
+            }
+          ]
+        },
+        "fromId": {
+          "fields": [
+            {
+              "type": "String",
+              "field": "from_id",
+              "alias": "from_id"
+            }
+          ]
+        },
+        "toId": {
+          "fields": [
+            {
+              "type": "String",
+              "field": "to_id",
+              "alias": "to_id"
+            }
+          ]
+        },
+        "attributes": [
+          {
+            "type": "Double",
+            "field": "weight",
+            "alias": "weight"
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+In the example, you can see the data store details under the **catalogs** section. This is all that is needed to connect to the Apache Polaris instance. Beneath the **catalogs** section, you’ll notice that we have defined the nodes and edges and where their data comes from. This tells PuppyGraph how to map the SQL data into the graph hosted within PuppyGraph. The schema can then be uploaded to PuppyGraph, and you’ll be ready to query!
+
+![](/img/blog/2025/10/02/fig5-puppygraph-schema-upload-page.png)
+To create the schema this way, save it to a JSON file, change the `uri` and `credential` values to match your actual Polaris endpoint and credentials, and select “Upload Graph Schema JSON” to upload the file. The graph will then be created.
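+
+If you prefer to stay on the command line, the same JSON file can typically be submitted over HTTP with basic auth instead of through the upload page. The `/schema` endpoint below is an assumption based on PuppyGraph’s getting-started examples, so check the documentation for your version:
+```shell
+# Assumed endpoint: POST /schema on the PuppyGraph web port (8081),
+# authenticated with the default credentials shown earlier.
+curl -XPOST -H "content-type: application/json" \
+  --user "puppygraph:puppygraph123" \
+  --data-binary @./schema.json \
+  http://localhost:8081/schema
+```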
+
+Alternatively, for those who want a more UI-based approach, PuppyGraph offers a schema builder with a drag-and-drop editor. Using an example similar to the one above, here is what the flow looks like when the schema is built through the UI.
+
+Instead of uploading a JSON file, click “Create graph schema” in the Web UI.
+First, add the details about your Apache Polaris Catalog data source:
+
+![](/img/blog/2025/10/02/fig6-puppygraph-schema-builder-ui.png)
+Here are the detailed explanations of the fields:
+
+* Catalog type: `Apache Iceberg`
+* Catalog name: Any name you like for the catalog.
+* Metastore Type: `Iceberg-Rest`
+* RestUri: `http://host.docker.internal:8181/api/catalog`. On Linux, the host IP might be `172.17.0.1` if you did not add `--add-host=host.docker.internal:host-gateway` to the docker run command (see the note in the deployment step above).
+* Warehouse: `manual_spark`. This was created by the `run_spark_sql.sh` script.
+* Credential: Fill in the root principal credentials from the Apache Polaris Catalog server’s output, for example `f6973789e5270e5d:dce8e8e53d8f770eb9804f22de923645`.
+* Scope: `PRINCIPAL_ROLE:ALL`
+* Storage type: `Get from metastore`
+
+Then, with the data source added, you can begin to build out the nodes and edges for the schema using the UI. PuppyGraph makes these easy to create by prepopulating the dropdowns with the table metadata it retrieves from your Iceberg catalog.
+
+![](/img/blog/2025/10/02/fig7-puppygraph-add-vertex.png)
+
+After that, use Auto Suggestion to create the other nodes and edges: select person as the start vertex (node) and add the auto-suggested nodes and edges.
+
+![](/img/blog/2025/10/02/fig8-puppygraph-auto-suggestion.png)
+
+After clicking “Add neighbors”, you’ll see the graph rendered on screen as nodes and edges are added. It should look like this:
+![](/img/blog/2025/10/02/fig9-finished-schema.png)
+
+When the schema is complete, click the **Submit** button to submit it to the server.
+
+After this, your integration and schema creation are complete. That’s all there is to it. Next step: graph querying!
+
+### Query your Data as a Graph
+
+Now you can query your data as a graph without any data replication or ETL. The next step is to figure out **how** we want to query our data and **what** insights we want to gather from it.
+
+PuppyGraph lets users query with [Gremlin](https://docs.puppygraph.com/reference/gremlin-query-language) or Cypher, or work interactively from a Jupyter Notebook.
+Click on Query in the left sidebar to open the Interactive Query UI.
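+
+For example, based on the schema above, a traversal like the sketch below finds the software created by the people marko knows. It uses only the labels and properties defined in our schema; see PuppyGraph’s Gremlin reference for the exact dialect supported:
+```gremlin
+// Start at the person named "marko", follow "knows" edges to his
+// friends, then "created" edges to the software those friends built.
+g.V().has('person', 'name', 'marko').
+  out('knows').
+  out('created').
+  values('name')
+// On the sample data this returns "lop" and "ripple" (both via josh).
+```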
+
+Run in the Interactive Query UI, the results of a query like this are rendered in a visualized format that can be explored further:
+
+![](/img/blog/2025/10/02/fig10-puppygraph-sample-query.png)
+
+As you can see, graph capabilities can be achieved with PuppyGraph in minutes, without the heavy lift usually associated with graph databases. Whether you’re a seasoned graph professional looking to expand the data you can query as a graph or a budding graph enthusiast testing out a use case, PuppyGraph offers a performant and straightforward way to add graph querying and analytics to the data you already have in Apache Polaris.
+
+## Summary
+
+In this blog, we looked at how to enable real-time graph querying with PuppyGraph’s zero-ETL graph query engine and how to pair it with Apache Polaris. We explored how, in a matter of minutes, PuppyGraph can be deployed and connected to your Apache Polaris instance, enabling graph queries without the overhead of traditional graph technologies.
+
+Everything in this guide is free to try. You can download PuppyGraph’s [forever-free Developer Edition](https://www.puppygraph.com/dev-download) and start running graph queries on your Apache Polaris instance with fine-grained governance and lineage in 10 minutes. You can also check the [instructions for connecting to an Apache Polaris Catalog](https://docs.puppygraph.com/getting-started/querying-polaris-catalog-data-as-a-graph/) in the PuppyGraph documentation.
\ No newline at end of file
diff --git a/1.0.0/polaris-catalog-service.md b/site/content/blog/_index.adoc
similarity index 81%
rename from 1.0.0/polaris-catalog-service.md
rename to site/content/blog/_index.adoc
index 02fed63f46..ec20c78133 100644
--- a/1.0.0/polaris-catalog-service.md
+++ b/site/content/blog/_index.adoc
@@ -17,10 +17,15 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-linkTitle: 'Catalog API Spec'
-weight: 900
-params:
-  show_page_toc: false
+linkTitle: "Blog"
+title: "Apache Polaris Blog"
+weight: 200
+cascade:
+  - type: "blog"
+# TODO remove when adding the first post
+top_hidden: true
+toc_hide: true
+hide_summary: true
 ---
-{{< redoc-polaris "generated/bundled-polaris-catalog-service.yaml" >}}
+= Blog