From 08368940ad15df0792cd9bd36a3c2cd58cd1acaa Mon Sep 17 00:00:00 2001 From: Yufei Gu Date: Wed, 18 Jun 2025 15:43:06 -0700 Subject: [PATCH 1/3] Publish 1.0.0 documents --- site/content/in-dev/1.0.0/_index.md | 186 +++ site/content/in-dev/1.0.0/access-control.md | 212 +++ site/content/in-dev/1.0.0/admin-tool.md | 142 ++ .../in-dev/1.0.0/command-line-interface.md | 1224 +++++++++++++++++ site/content/in-dev/1.0.0/configuration.md | 187 +++ .../configuring-polaris-for-production.md | 222 +++ site/content/in-dev/1.0.0/entities.md | 95 ++ site/content/in-dev/1.0.0/evolution.md | 115 ++ site/content/in-dev/1.0.0/generic-table.md | 169 +++ .../in-dev/1.0.0/getting-started/_index.md | 25 + .../deploying-polaris/_index.md | 27 + .../quickstart-deploy-aws.md | 57 + .../quickstart-deploy-azure.md | 52 + .../quickstart-deploy-gcp.md | 52 + .../getting-started/install-dependencies.md | 118 ++ .../1.0.0/getting-started/quickstart.md | 116 ++ .../1.0.0/getting-started/using-polaris.md | 315 +++++ site/content/in-dev/1.0.0/metastores.md | 151 ++ .../in-dev/1.0.0/polaris-catalog-service.md | 26 + .../1.0.0/polaris-management-service.md | 27 + .../in-dev/1.0.0/polaris-spark-client.md | 141 ++ site/content/in-dev/1.0.0/policy.md | 197 +++ site/content/in-dev/1.0.0/realm.md | 53 + site/content/in-dev/1.0.0/telemetry.md | 192 +++ .../quickstart-deploy-aws.md | 2 +- .../quickstart-deploy-azure.md | 2 +- .../quickstart-deploy-gcp.md | 2 +- site/hugo.yaml | 3 + 28 files changed, 4107 insertions(+), 3 deletions(-) create mode 100644 site/content/in-dev/1.0.0/_index.md create mode 100644 site/content/in-dev/1.0.0/access-control.md create mode 100644 site/content/in-dev/1.0.0/admin-tool.md create mode 100644 site/content/in-dev/1.0.0/command-line-interface.md create mode 100644 site/content/in-dev/1.0.0/configuration.md create mode 100644 site/content/in-dev/1.0.0/configuring-polaris-for-production.md create mode 100644 site/content/in-dev/1.0.0/entities.md create mode 100644 
site/content/in-dev/1.0.0/evolution.md create mode 100644 site/content/in-dev/1.0.0/generic-table.md create mode 100644 site/content/in-dev/1.0.0/getting-started/_index.md create mode 100644 site/content/in-dev/1.0.0/getting-started/deploying-polaris/_index.md create mode 100644 site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md create mode 100644 site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md create mode 100644 site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md create mode 100644 site/content/in-dev/1.0.0/getting-started/install-dependencies.md create mode 100644 site/content/in-dev/1.0.0/getting-started/quickstart.md create mode 100644 site/content/in-dev/1.0.0/getting-started/using-polaris.md create mode 100644 site/content/in-dev/1.0.0/metastores.md create mode 100644 site/content/in-dev/1.0.0/polaris-catalog-service.md create mode 100644 site/content/in-dev/1.0.0/polaris-management-service.md create mode 100644 site/content/in-dev/1.0.0/polaris-spark-client.md create mode 100644 site/content/in-dev/1.0.0/policy.md create mode 100644 site/content/in-dev/1.0.0/realm.md create mode 100644 site/content/in-dev/1.0.0/telemetry.md diff --git a/site/content/in-dev/1.0.0/_index.md b/site/content/in-dev/1.0.0/_index.md new file mode 100644 index 0000000000..b82c9366c2 --- /dev/null +++ b/site/content/in-dev/1.0.0/_index.md @@ -0,0 +1,186 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +linkTitle: '1.0.0' +title: 'Overview' +type: docs +weight: 200 +params: + top_hidden: true + show_page_toc: false +cascade: + type: docs + params: + show_page_toc: true +# This file will NOT be copied into a new release's versioned docs folder. +--- + +{{< alert title="Warning" color="warning" >}} +These pages refer to the current state of the main branch, which is still under active development. + +Functionalities can be changed, removed or added without prior notice. +{{< /alert >}} + +Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol. + +With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines. + +![Conceptual diagram of Apache Polaris (Incubating).](/img/overview.svg "Apache Polaris (Incubating) overview") + +## Key concepts + +This section introduces key concepts associated with using Apache Polaris (Incubating). + +In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables +or namespaces have been created yet for Catalog2 or Catalog3. + +![Diagram that shows an example Apache Polaris (Incubating) structure.](/img/sample-catalog-structure.svg "Sample Apache Polaris (Incubating) structure") + +### Catalog + +In Polaris, you can create one or more catalog resources to organize Iceberg tables. 
+ +Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a +query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks: + +- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's + current metadata file. + +- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of + the table. + +To learn more about Iceberg catalogs, see the [Apache Iceberg™ documentation](https://iceberg.apache.org/concepts/catalog/). + +#### Catalog types + +A catalog can be one of the following two types: + +- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris. + +- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from + this catalog are synced to Polaris. These tables are read-only in Polaris. + +A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS. + +### Namespace + +You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create +nested namespaces. Iceberg tables belong to namespaces. + +> **Important** +> +> For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met: +> +> - The directory only contains the data files that belong to a single table. +> - The directory hierarchy matches the namespace hierarchy for the catalog. 
+> +> For example, if a catalog includes the following items: +> +> - Top-level namespace namespace1 +> - Nested namespace namespace1a +> - A customers table, which is grouped under nested namespace namespace1a +> - An orders table, which is grouped under nested namespace namespace1a +> +> The directory hierarchy for the catalog must follow this structure: +> +> - /namespace1/namespace1a/customers/ +> - /namespace1/namespace1a/orders/ + +### Storage configuration + +A storage configuration stores a generated identity and access management (IAM) entity for your cloud storage and is created +when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the +catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris +Catalog. + +When you create a catalog, you supply the following information about your cloud storage: + +| Cloud storage provider | Information | +| -----------------------| ----------- | +| Amazon S3 |
  • Default base location for your Amazon S3 bucket
  • Locations for your Amazon S3 bucket
  • S3 role ARN
  • External ID (optional)
| +| Google Cloud Storage (GCS) |
  • Default base location for your GCS bucket
  • Locations for your GCS bucket
| +| Azure |
  • Default base location for your Microsoft Azure container
  • Locations for your Microsoft Azure container
  • Azure tenant ID
| + +## Example workflow + +In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1. + +1. Bob uses Apache Spark™ to create the Table1 table under the + Namespace1 namespace in the Catalog1 catalog and insert values into + Table1. + + Bob can create Table1 and insert data into it because he is using a + service connection with a service principal that has + the privileges to perform these actions. + +2. Alice uses Snowflake to read data from Table1. + + Alice can read data from Table1 because she is using a service + connection with a service principal with a catalog integration that + has the privileges to perform this action. Alice + creates an unmanaged table in Snowflake to read data from Table1. + +![Diagram that shows an example workflow for Apache Polaris (Incubating)](/img/example-workflow.svg "Example workflow for Apache Polaris (Incubating)") + +## Security and access control + +### Credential vending + +To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query +execution. These credentials allow the query engine to run the query without requiring access to your cloud storage for +Iceberg tables. This process is called credential vending. + +As of now, the following limitation is known regarding Apache Iceberg support: + +- **remove_orphan_files:** Apache Spark can't use credential vending + for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details. + +### Identity and access management (IAM) + +Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg +metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your +storage location. 
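The credential vending flow described above can be sketched as follows. This is a hypothetical illustration of the concept only — the function name, credential fields, and 15-minute lifetime are assumptions for the sketch, not Polaris's actual implementation:

```python
import datetime
import secrets

def vend_credentials(table_location: str, privileges: set) -> dict:
    """Hypothetical sketch: mint short-lived storage credentials scoped to a
    single table location, as a credential-vending catalog might do."""
    read_only = "TABLE_WRITE_DATA" not in privileges
    return {
        # Fabricated keys for illustration; a real catalog would call the
        # cloud provider's STS-style API using its retained IAM entity.
        "access_key_id": "ASIA" + secrets.token_hex(8).upper(),
        "secret_access_key": secrets.token_hex(16),
        "session_token": secrets.token_hex(24),
        "scope": table_location,  # the engine may only touch this prefix
        "read_only": read_only,
        "expires_at": datetime.datetime.now(datetime.timezone.utc)
        + datetime.timedelta(minutes=15),
    }

# The engine receives the credentials alongside the table metadata and
# discards them once the query finishes or the credentials expire.
creds = vend_credentials("s3://bucket/namespace1/namespace1a/customers/",
                         {"TABLE_READ_DATA"})
```

Because the credentials are scoped to one table's location and expire quickly, the query engine never needs standing access to the underlying bucket or container.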
+ +### Access control + +Polaris enforces the access control that you configure across all tables registered with the service and governs security for all +queries from query engines in a consistent manner. + +Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs, +namespaces, and tables. + +Polaris RBAC uses two different role types to delegate privileges: + +- **Principal roles:** Granted to Polaris service principals and + analogous to roles in other access control systems that you grant to + service principals. + +- **Catalog roles:** Configured with certain privileges on Polaris + catalog resources and granted to principal roles. + +For more information, see [Access control]({{% ref "access-control" %}}). + +## Legal Notices + +Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. + + + diff --git a/site/content/in-dev/1.0.0/access-control.md b/site/content/in-dev/1.0.0/access-control.md new file mode 100644 index 0000000000..f8c21ab781 --- /dev/null +++ b/site/content/in-dev/1.0.0/access-control.md @@ -0,0 +1,212 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Access Control +type: docs +weight: 500 +--- + +This section provides information about how access control works for Apache Polaris (Incubating). + +Polaris uses a role-based access control (RBAC) model in which the Polaris administrator assigns access privileges to catalog roles +and then grants access to resources to service principals by assigning catalog roles to principal roles. + +These are the key concepts to understanding access control in Polaris: + +- **Securable object** +- **Principal role** +- **Catalog role** +- **Privilege** + +## Securable object + +A securable object is an object to which access can be granted. Polaris +has the following securable objects: + +- Catalog +- Namespace +- Iceberg table +- View + +## Principal role + +A principal role is a resource in Polaris that you can use to logically group Polaris service principals together and grant privileges on +securable objects. + +Polaris supports a many-to-one relationship between service principals and principal roles. For example, to grant the same privileges to +multiple service principals, you can grant a single principal role to those service principals. A service principal can be granted one +principal role. When registering a service connection, the Polaris administrator specifies the principal role that is granted to the +service principal. + +You don't grant privileges directly to a principal role. Instead, you configure object permissions at the catalog role level, and then grant +catalog roles to a principal role. + +The following table shows examples of principal roles that you might configure in Polaris: + +| Principal role name | Description | +| -----------------------| ----------- | +| Data_engineer | A role that is granted to multiple service principals for running data engineering jobs. 
| +| Data_scientist | A role that is granted to multiple service principals for running data science or AI jobs. | + +## Catalog role + +A catalog role belongs to a particular catalog resource in Polaris and specifies a set of permissions for actions on the catalog or objects +in the catalog, such as catalog namespaces or tables. You can create one or more catalog roles for a catalog. + +You grant privileges to a catalog role and then grant the catalog role to a principal role to bestow the privileges to one or more service +principals. + +> **Note** +> +> If you update the privileges bestowed to a service principal, the updates won't take effect for up to one hour. This means that if you +> revoke or grant some privileges for a catalog, the updated privileges won't take effect on any service principal with access to that catalog +> for up to one hour. + +Polaris also supports a many-to-many relationship between catalog roles and principal roles. You can grant the same catalog role to one or more +principal roles. Likewise, a principal role can be granted to one or more catalog roles. + +The following table displays examples of catalog roles that you might +configure in Polaris: + +| Example Catalog role | Description| +| -----------------------|-----------| +| Catalog administrators | A role that has been granted multiple privileges to emulate full access to the catalog.
Principal roles that have been granted this role are permitted to create, alter, read, write, and drop tables in the catalog. | +| Catalog readers | A role that has been granted read-only privileges to tables in the catalog.
Principal roles that have been granted this role are allowed to read from tables in the catalog. | +| Catalog contributor | A role that has been granted read and write access privileges to all tables that belong to the catalog.
Principal roles that have been granted this role are allowed to perform read and write operations on tables in the catalog. | + +## RBAC model + +The following diagram illustrates the RBAC model used by Polaris. For each catalog, the Polaris administrator assigns access +privileges to catalog roles and then grants service principals access to resources by assigning catalog roles to principal roles. Polaris +supports a many-to-one relationship between service principals and principal roles. + +![Diagram that shows the RBAC model for Apache Polaris.](/img/rbac-model.svg "Apache Polaris RBAC model") + +## Access control privileges + +This section describes the privileges that are available in the Polaris access control model. Privileges are granted to catalog roles, catalog +roles are granted to principal roles, and principal roles are granted to service principals to specify the operations that service principals can +perform on objects in Polaris. + +> **Important** +> +> You can only grant privileges at the catalog level. Fine-grained access controls are not available. For example, you can grant read +> privileges to all tables in a catalog but not to an individual table in the catalog. + +To grant the full set of privileges (drop, list, read, write, etc.) on an object, you can use the *full privilege* option. + +### Table privileges + +| Privilege | Description | +| --------- | ----------- | +| TABLE_CREATE | Enables registering a table with the catalog. | +| TABLE_DROP | Enables dropping a table from the catalog. | +| TABLE_LIST | Enables listing any table in the catalog. | +| TABLE_READ_PROPERTIES | Enables reading properties of the table. | +| TABLE_WRITE_PROPERTIES | Enables configuring properties for the table. | +| TABLE_READ_DATA | Enables reading data from the table by receiving short-lived read-only storage credentials from the catalog. 
| +| TABLE_WRITE_DATA | Enables writing data to the table by receiving short-lived read+write storage credentials from the catalog. | +| TABLE_FULL_METADATA | Grants all table privileges, except TABLE_READ_DATA and TABLE_WRITE_DATA, which need to be granted individually. | +| TABLE_ATTACH_POLICY | Enables attaching policy to a table. | +| TABLE_DETACH_POLICY | Enables detaching policy from a table. | + +### View privileges + +| Privilege | Description | +| --------- | ----------- | +| VIEW_CREATE | Enables registering a view with the catalog. | +| VIEW_DROP | Enables dropping a view from the catalog. | +| VIEW_LIST | Enables listing any views in the catalog. | +| VIEW_READ_PROPERTIES | Enables reading all the view properties. | +| VIEW_WRITE_PROPERTIES | Enables configuring view properties. | +| VIEW_FULL_METADATA | Grants all view privileges. | + +### Namespace privileges + +| Privilege | Description | +| --------- | ----------- | +| NAMESPACE_CREATE | Enables creating a namespace in a catalog. | +| NAMESPACE_DROP | Enables dropping the namespace from the catalog. | +| NAMESPACE_LIST | Enables listing any object in the namespace, including nested namespaces and tables. | +| NAMESPACE_READ_PROPERTIES | Enables reading all the namespace properties. | +| NAMESPACE_WRITE_PROPERTIES | Enables configuring namespace properties. | +| NAMESPACE_FULL_METADATA | Grants all namespace privileges. | +| NAMESPACE_ATTACH_POLICY | Enables attaching policy to a namespace. | +| NAMESPACE_DETACH_POLICY | Enables detaching policy from a namespace. | + +### Catalog privileges + +| Privilege | Description | +| -----------------------| ----------- | +| CATALOG_MANAGE_ACCESS | Includes the ability to grant or revoke privileges on objects in a catalog to catalog roles, and the ability to grant or revoke catalog roles to or from principal roles. | +| CATALOG_MANAGE_CONTENT | Enables full management of content for the catalog. This privilege encompasses the following privileges:
  • CATALOG_MANAGE_METADATA
  • TABLE_FULL_METADATA
  • NAMESPACE_FULL_METADATA
  • VIEW_FULL_METADATA
  • TABLE_WRITE_DATA
  • TABLE_READ_DATA
  • CATALOG_READ_PROPERTIES
  • CATALOG_WRITE_PROPERTIES
| +| CATALOG_MANAGE_METADATA | Enables full management of the catalog, catalog roles, namespaces, and tables. | +| CATALOG_READ_PROPERTIES | Enables listing catalogs and reading properties of the catalog. | +| CATALOG_WRITE_PROPERTIES | Enables configuring catalog properties. | +| CATALOG_ATTACH_POLICY | Enables attaching policy to a catalog. | +| CATALOG_DETACH_POLICY | Enables detaching policy from a catalog. | + +### Policy privileges + +| Privilege | Description | +| -----------------------| ----------- | +| POLICY_CREATE | Enables creating a policy under specified namespace. | +| POLICY_READ | Enables reading policy content and metadata. | +| POLICY_WRITE | Enables updating the policy details such as its content or description. | +| POLICY_LIST | Enables listing any policy from the catalog. | +| POLICY_DROP | Enables dropping a policy if it is not attached to any resource entity. | +| POLICY_FULL_METADATA | Grants all policy privileges. | +| POLICY_ATTACH | Enables policy to be attached to entities. | +| POLICY_DETACH | Enables policy to be detached from entities. | + +## RBAC example + +The following diagram illustrates how RBAC works in Polaris and +includes the following users: + +- **Alice:** A service admin who signs up for Polaris. Alice can + create service principals. She can also create catalogs and + namespaces and configure access control for Polaris resources. + +- **Bob:** A data engineer who uses Apache Spark™ to + interact with Polaris. + + - Alice has created a service principal for Bob. It has been + granted the Data_engineer principal role, which in turn has been + granted the following catalog roles: Catalog contributor and + Data administrator (for both the Silver and Gold zone catalogs + in the following diagram). + + - The Catalog contributor role grants permission to create + namespaces and tables in the Bronze zone catalog. 
+ + - The Data administrator roles grant full administrative rights to + the Silver zone catalog and Gold zone catalog. + +- **Mark:** A data scientist who trains models with data managed + by Polaris. + + - Alice has created a service principal for Mark. It has been + granted the Data_scientist principal role, which in turn has + been granted the catalog role named Catalog reader. + + - The Catalog reader role grants read-only access to a catalog + named Gold zone catalog. + +![Diagram that shows an example of how RBAC works in Apache Polaris.](/img/rbac-example.svg "Apache Polaris RBAC example") diff --git a/site/content/in-dev/1.0.0/admin-tool.md b/site/content/in-dev/1.0.0/admin-tool.md new file mode 100644 index 0000000000..14f37b6f0f --- /dev/null +++ b/site/content/in-dev/1.0.0/admin-tool.md @@ -0,0 +1,142 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Admin Tool +type: docs +weight: 300 +--- + +Polaris includes a tool for administrators to manage the metastore. + +The tool must be built with the necessary JDBC drivers to access the metastore database.
For +example, to build the tool with support for Postgres, run the following: + +```shell +./gradlew \ + :polaris-admin:assemble \ + :polaris-admin:quarkusAppPartsBuild --rerun \ + -Dquarkus.container-image.build=true +``` + +The above command will generate: + +- One standalone JAR in `runtime/admin/build/polaris-admin-*-runner.jar` +- Two distribution archives in `runtime/admin/build/distributions` +- Two Docker images named `apache/polaris-admin-tool:latest` and `apache/polaris-admin-tool:` + +## Usage + +Please make sure the admin tool and the Polaris server are the same version before using it. +To run the standalone JAR, use the following command: + +```shell +java -jar runtime/admin/build/polaris-admin-*-runner.jar --help +``` + +To run the Docker image, use the following command: + +```shell +docker run apache/polaris-admin-tool:latest --help +``` + +The basic usage of the Polaris Admin Tool is outlined below: + +``` +Usage: polaris-admin-runner.jar [-hV] [COMMAND] +Polaris Admin Tool + -h, --help Show this help message and exit. + -V, --version Print version information and exit. +Commands: + help Display help information about the specified command. + bootstrap Bootstraps realms and principal credentials. + purge Purge principal credentials. +``` + +## Configuration + +The Polaris Admin Tool must be executed with the same configuration as the Polaris server. The +configuration can be done via environment variables or system properties. + +At a minimum, it is necessary to configure the Polaris Admin Tool to connect to the same database +used by the Polaris server. + +See the [metastore documentation]({{% ref "metastores" %}}) for more information on configuring the +database connection. + +Note: during bootstrap, Polaris always creates the schema 'polaris_schema' in the configured database.
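As an illustration, if the Polaris server keeps its metastore in Postgres via JDBC, the admin tool could be pointed at the same database through Quarkus-style datasource environment variables. The property names and values below are assumptions to adapt to your own deployment, not a canonical configuration:

```shell
# Hypothetical values — reuse the exact same datasource settings
# as your running Polaris server
export QUARKUS_DATASOURCE_JDBC_URL="jdbc:postgresql://localhost:5432/polaris"
export QUARKUS_DATASOURCE_USERNAME="polaris"
export QUARKUS_DATASOURCE_PASSWORD="changeme"
```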
+ +## Bootstrapping Realms and Principal Credentials + +The `bootstrap` command is used to bootstrap realms and create the necessary principal credentials +for the Polaris server. This command is idempotent and can be run multiple times without causing any +issues. If a realm is already bootstrapped, running the `bootstrap` command again will not have any +effect on that realm. + +```shell +java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap --help +``` + +The basic usage of the `bootstrap` command is outlined below: + +``` +Usage: polaris-admin-runner.jar bootstrap [-hV] [-c=]... -r= [-r=]... +Bootstraps realms and root principal credentials. + -c, --credential= + Root principal credentials to bootstrap. Must be of the form + 'realm,clientId,clientSecret'. + -h, --help Show this help message and exit. + -r, --realm= The name of a realm to bootstrap. + -V, --version Print version information and exit. +``` + +For example, to bootstrap the `realm1` realm and create its root principal credential with the +client ID `admin` and client secret `admin`, you can run the following command: + +```shell +java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap -r realm1 -c realm1,admin,admin +``` + +## Purging Realms and Principal Credentials + +The `purge` command is used to remove realms and principal credentials from the Polaris server. + +> Warning: Running the `purge` command will remove all data associated with the specified realms! + This includes all entities (catalogs, namespaces, tables, views, roles), all principal + credentials, grants, and any other data associated with the realms. + +```shell +java -jar runtime/admin/build/polaris-admin-*-runner.jar purge --help +``` + +The basic usage of the `purge` command is outlined below: + +``` +Usage: polaris-admin-runner.jar purge [-hV] -r= [-r=]... +Purge realms and all associated entities. + -h, --help Show this help message and exit. + -r, --realm= The name of a realm to purge. 
+ -V, --version Print version information and exit. +``` + +For example, to purge the `realm1` realm, you can run the following command: + +```shell +java -jar runtime/admin/build/polaris-admin-*-runner.jar purge -r realm1 +``` \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/command-line-interface.md b/site/content/in-dev/1.0.0/command-line-interface.md new file mode 100644 index 0000000000..f20210e2c6 --- /dev/null +++ b/site/content/in-dev/1.0.0/command-line-interface.md @@ -0,0 +1,1224 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Command Line Interface +type: docs +weight: 300 +--- + +In order to help administrators quickly set up and manage their Polaris server, Polaris provides a simple command-line interface (CLI) for common tasks. + +The basic syntax of the Polaris CLI is outlined below: + +``` +polaris [options] COMMAND ... + +options: +--host +--port +--base-url +--client-id +--client-secret +--access-token +--profile +``` + +`COMMAND` must be one of the following: +1. catalogs +2. principals +3. principal-roles +4. catalog-roles +5. namespaces +6. privileges +7. 
profiles + +Each _command_ supports several _subcommands_, and some _subcommands_ have _actions_ that come after the subcommand in turn. Finally, _arguments_ follow to form a full invocation. Within a set of named arguments at the end of an invocation ordering is generally not important. Many invocations also have a required positional argument of the type that the _command_ refers to. Again, the ordering of this positional argument relative to named arguments is not important. + +Some example full invocations: + +``` +polaris principals list +polaris catalogs delete some_catalog_name +polaris catalogs update --property foo=bar some_other_catalog +polaris catalogs update another_catalog --property k=v +polaris privileges namespace grant --namespace some.schema --catalog fourth_catalog --catalog-role some_catalog_role TABLE_READ_DATA +polaris profiles list +``` + +### Authentication + +As outlined above, the Polaris CLI may take credentials using the `--client-id` and `--client-secret` options. For example: + +``` +polaris --client-id 4b5ed1ca908c3cc2 --client-secret 07ea8e4edefb9a9e57c247e8d1a4f51c principals ... +``` + +If `--client-id` and `--client-secret` are not provided, the Polaris CLI will try to read the client ID and client secret from environment variables called `CLIENT_ID` and `CLIENT_SECRET` respectively. If these flags are not provided and the environment variables are not set, the CLI will fail. + +Alternatively, the `--access-token` option can be used instead of `--client-id` and `--client-secret`, but both authentication methods cannot be used simultaneously. + +Additionally, the `--profile` option can be used to specify a saved profile instead of providing authentication details directly. If `--profile` is not provided, the CLI will check the `CLIENT_PROFILE` environment variable. Profiles store authentication details and connection settings, simplifying repeated CLI usage. 
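The credential resolution described above — explicit flags first, then `CLIENT_ID`/`CLIENT_SECRET` environment variables, then a profile — can be sketched as follows. This is a simplified illustration of one plausible resolution order, not the CLI's actual code; the function and parameter names are invented for the sketch:

```python
import os

def resolve_credentials(client_id=None, client_secret=None, access_token=None,
                        profile=None, profiles=None, env=os.environ):
    """Sketch: resolve CLI credentials from flags, env vars, or a profile."""
    # --access-token and --client-id/--client-secret are mutually exclusive.
    if access_token and (client_id or client_secret):
        raise ValueError("--access-token cannot be combined with "
                         "--client-id/--client-secret")
    if access_token:
        return {"access_token": access_token}
    # Fall back to environment variables when flags are absent.
    client_id = client_id or env.get("CLIENT_ID")
    client_secret = client_secret or env.get("CLIENT_SECRET")
    if client_id and client_secret:
        return {"client_id": client_id, "client_secret": client_secret}
    # Otherwise try a saved profile (or the CLIENT_PROFILE env var).
    profile = profile or env.get("CLIENT_PROFILE")
    if profile and profiles and profile in profiles:
        return profiles[profile]
    raise SystemExit("no credentials: pass flags, set env vars, or use a profile")
```

A profile therefore behaves like a named bundle of the same settings you would otherwise pass on every invocation.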
+
+If the `--host` and `--port` options are not provided, the CLI will default to communicating with `localhost:8181`.
+
+Alternatively, the `--base-url` option can be used instead of `--host` and `--port`, but both options cannot be used simultaneously. This allows specifying arbitrary Polaris URLs, including HTTPS ones, that have additional base prefixes before the `/api/*/v1` subpaths.
+
+### PATH
+
+These examples assume the Polaris CLI is on the PATH and so can be invoked just by the command `polaris`. You can add the CLI to your PATH environment variable with a command like the following:
+
+```
+export PATH="$HOME/polaris:$PATH"
+```
+
+Alternatively, you can run the CLI by providing a path to it, such as with the following invocation:
+
+```
+~/polaris principals list
+```
+
+## Commands
+
+Each of the commands `catalogs`, `principals`, `principal-roles`, `catalog-roles`, and `privileges` is used to manage a different type of entity within Polaris.
+
+In addition to these, the `profiles` command is available for managing stored authentication profiles, allowing login credentials to be configured for reuse. This provides an alternative to passing authentication details with every command.
+
+To find details on the options that can be provided to a particular command or subcommand ad hoc, you may wish to use the `--help` flag. For example:
+
+```
+polaris catalogs --help
+polaris principals create --help
+polaris profiles --help
+```
+
+### catalogs
+
+The `catalogs` command is used to create, discover, and otherwise manage catalogs within Polaris.
+
+`catalogs` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+
+#### create
+
+The `create` subcommand is used to create a catalog.
+
+```
+input: polaris catalogs create --help
+options:
+  create
+    Named arguments:
+      --type  The type of catalog to create in [INTERNAL, EXTERNAL]. INTERNAL by default.
+ --storage-type (Required) The type of storage to use for the catalog + --default-base-location (Required) Default base location of the catalog + --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once. + --role-arn (Required for S3) A role ARN to use when connecting to S3 + --external-id (Only for S3) The external ID to use when connecting to S3 + --tenant-id (Required for Azure) A tenant ID to use when connecting to Azure Storage + --multi-tenant-app-name (Only for Azure) The app name to use when connecting to Azure Storage + --consent-url (Only for Azure) A consent URL granting permissions for the Azure Storage location + --service-account (Only for GCS) The service account to use when connecting to GCS + --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once + Positional arguments: + catalog +``` + +##### Examples + +``` +polaris catalogs create \ + --storage-type s3 \ + --default-base-location s3://example-bucket/my_data \ + --role-arn ${ROLE_ARN} \ + my_catalog + +polaris catalogs create \ + --storage-type s3 \ + --default-base-location s3://example-bucket/my_other_data \ + --allowed-location s3://example-bucket/second_location \ + --allowed-location s3://other-bucket/third_location \ + --role-arn ${ROLE_ARN} \ + my_other_catalog + +polaris catalogs create \ + --storage-type file \ + --default-base-location file:///example/tmp \ + quickstart_catalog +``` + +#### delete + +The `delete` subcommand is used to delete a catalog. + +``` +input: polaris catalogs delete --help +options: + delete + Positional arguments: + catalog +``` + +##### Examples + +``` +polaris catalogs delete some_catalog +``` + +#### get + +The `get` subcommand is used to retrieve details about a catalog. 
+ +``` +input: polaris catalogs get --help +options: + get + Positional arguments: + catalog +``` + +##### Examples + +``` +polaris catalogs get some_catalog + +polaris catalogs get another_catalog +``` + +#### list + +The `list` subcommand is used to show details about all catalogs, or those that a certain principal role has access to. The principal used to perform this operation must have the `CATALOG_LIST` privilege. + +``` +input: polaris catalogs list --help +options: + list + Named arguments: + --principal-role The name of a principal role +``` + +##### Examples + +``` +polaris catalogs list + +polaris catalogs list --principal-role some_user +``` + +#### update + +The `update` subcommand is used to update a catalog. Currently, this command supports changing the properties of a catalog or updating its storage configuration. + +``` +input: polaris catalogs update --help +options: + update + Named arguments: + --default-base-location (Required) Default base location of the catalog + --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once. + --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once + Positional arguments: + catalog +``` + +##### Examples + +``` +polaris catalogs update --property tag=new_value my_catalog + +polaris catalogs update --default-base-location s3://new-bucket/my_data my_catalog +``` + +### Principals + +The `principals` command is used to manage principals within Polaris. + +`principals` supports the following subcommands: + +1. create +2. delete +3. get +4. list +5. rotate-credentials +6. update +7. access + +#### create + +The `create` subcommand is used to create a new principal. + +``` +input: polaris principals create --help +options: + create + Named arguments: + --type The type of principal to create in [SERVICE] + --property A key/value pair such as: tag=value. 
Multiple can be provided by specifying this option more than once + Positional arguments: + principal +``` + +##### Examples + +``` +polaris principals create some_user + +polaris principals create --client-id ${CLIENT_ID} --property admin=true some_admin_user +``` + +#### delete + +The `delete` subcommand is used to delete a principal. + +``` +input: polaris principals delete --help +options: + delete + Positional arguments: + principal +``` + +##### Examples + +``` +polaris principals delete some_user + +polaris principals delete some_admin_user +``` + +#### get + +The `get` subcommand retrieves details about a principal. + +``` +input: polaris principals get --help +options: + get + Positional arguments: + principal +``` + +##### Examples + +``` +polaris principals get some_user + +polaris principals get some_admin_user +``` + +#### list + +The `list` subcommand shows details about all principals. + +##### Examples + +``` +polaris principals list +``` + +#### rotate-credentials + +The `rotate-credentials` subcommand is used to update the credentials used by a principal. After this command runs successfully, the new credentials will be printed to stdout. + +``` +input: polaris principals rotate-credentials --help +options: + rotate-credentials + Positional arguments: + principal +``` + +##### Examples + +``` +polaris principals rotate-credentials some_user + +polaris principals rotate-credentials some_admin_user +``` + +#### update + +The `update` subcommand is used to update a principal. Currently, this supports rewriting the properties associated with a principal. + +``` +input: polaris principals update --help +options: + update + Named arguments: + --property A key/value pair such as: tag=value. 
Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals update --property key=value --property other_key=other_value some_user
+
+polaris principals update --property are_other_keys_removed=yes some_user
+```
+
+#### access
+
+The `access` subcommand retrieves the entity relations of a principal.
+
+```
+input: polaris principals access --help
+options:
+  access
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals access quickstart_user
+```
+
+### Principal Roles
+
+The `principal-roles` command is used to create, discover, and manage principal roles within Polaris. Additionally, this command can identify principals or catalog roles associated with a principal role, and can be used to grant a principal role to a principal.
+
+`principal-roles` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+6. grant
+7. revoke
+
+#### create
+
+The `create` subcommand is used to create a new principal role.
+
+```
+input: polaris principal-roles create --help
+options:
+  create
+    Named arguments:
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles create data_engineer
+
+polaris principal-roles create --property key=value data_analyst
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a principal role.
+
+```
+input: polaris principal-roles delete --help
+options:
+  delete
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles delete data_engineer
+
+polaris principal-roles delete data_analyst
+```
+
+#### get
+
+The `get` subcommand retrieves details about a principal role.
+
+```
+input: polaris principal-roles get --help
+options:
+  get
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles get data_engineer
+
+polaris principal-roles get data_analyst
+```
+
+#### list
+
+The `list` subcommand is used to print out all principal roles or, alternatively, to list all principal roles associated with a given principal or with a given catalog role.
+
+```
+input: polaris principal-roles list --help
+options:
+  list
+    Named arguments:
+      --catalog-role  The name of a catalog role. If provided, show only principal roles assigned to this catalog role.
+      --principal  The name of a principal. If provided, show only principal roles assigned to this principal.
+```
+
+##### Examples
+
+```
+polaris principal-roles list
+
+polaris principal-roles list --principal d.knuth
+
+polaris principal-roles list --catalog-role super_secret_data
+```
+
+#### update
+
+The `update` subcommand is used to update a principal role. Currently, this supports updating the properties tied to a principal role.
+
+```
+input: polaris principal-roles update --help
+options:
+  update
+    Named arguments:
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles update --property key=value2 data_engineer
+
+polaris principal-roles update data_analyst --property key=value3
+```
+
+#### grant
+
+The `grant` subcommand is used to grant a principal role to a principal.
+ +``` +input: polaris principal-roles grant --help +options: + grant + Named arguments: + --principal A principal to grant this principal role to + Positional arguments: + principal_role +``` + +##### Examples + +``` +polaris principal-roles grant --principal d.knuth data_engineer + +polaris principal-roles grant data_scientist --principal a.ng +``` + +#### revoke + +The `revoke` subcommand is used to revoke a principal role from a principal. + +``` +input: polaris principal-roles revoke --help +options: + revoke + Named arguments: + --principal A principal to revoke this principal role from + Positional arguments: + principal_role +``` + +##### Examples + +``` +polaris principal-roles revoke --principal former.employee data_engineer + +polaris principal-roles revoke data_scientist --principal changed.role +``` + +### Catalog Roles + +The catalog-roles command is used to create, discover, and manage catalog roles within Polaris. Additionally, this command can be used to grant a catalog role to a principal role. + +`catalog-roles` supports the following subcommands: + +1. create +2. delete +3. get +4. list +5. update +6. grant +7. revoke + +#### create + +The `create` subcommand is used to create a new catalog role. + +``` +input: polaris catalog-roles create --help +options: + create + Named arguments: + --catalog The name of an existing catalog + --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once + Positional arguments: + catalog_role +``` + +##### Examples + +``` +polaris catalog-roles create --property key=value --catalog some_catalog sales_data + +polaris catalog-roles create --catalog other_catalog sales_data +``` + +#### delete + +The `delete` subcommand is used to delete a catalog role. 
+
+```
+input: polaris catalog-roles delete --help
+options:
+  delete
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles delete --catalog some_catalog sales_data
+
+polaris catalog-roles delete --catalog other_catalog sales_data
+```
+
+#### get
+
+The `get` subcommand retrieves details about a catalog role.
+
+```
+input: polaris catalog-roles get --help
+options:
+  get
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles get --catalog some_catalog inventory_data
+
+polaris catalog-roles get --catalog other_catalog inventory_data
+```
+
+#### list
+
+The `list` subcommand is used to print all catalog roles. Alternatively, if a principal role is provided, only catalog roles associated with that principal role are shown.
+
+```
+input: polaris catalog-roles list --help
+options:
+  list
+    Named arguments:
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalog-roles list
+
+polaris catalog-roles list --principal-role data_engineer
+```
+
+#### update
+
+The `update` subcommand is used to update a catalog role. Currently, only updating properties associated with the catalog role is supported.
+
+```
+input: polaris catalog-roles update --help
+options:
+  update
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles update --property contains_pii=true --catalog some_catalog sales_data
+
+polaris catalog-roles update sales_data --catalog some_catalog --property key=value
+```
+
+#### grant
+
+The `grant` subcommand is used to grant a catalog role to a principal role.
+
+```
+input: polaris catalog-roles grant --help
+options:
+  grant
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles grant sensitive_data --catalog some_catalog --principal-role power_user
+
+polaris catalog-roles grant --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
+```
+
+#### revoke
+
+The `revoke` subcommand is used to revoke a catalog role from a principal role.
+
+```
+input: polaris catalog-roles revoke --help
+options:
+  revoke
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles revoke sensitive_data --catalog some_catalog --principal-role power_user
+
+polaris catalog-roles revoke --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
+```
+
+### Namespaces
+
+The `namespaces` command is used to manage namespaces within Polaris.
+
+`namespaces` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+
+#### create
+
+The `create` subcommand is used to create a new namespace.
+
+When creating a namespace with an explicit location, that location must reside within the parent catalog or namespace.
+
+```
+input: polaris namespaces create --help
+options:
+  create
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --location  If specified, the location at which to store the namespace and entities inside it
+      --property  A key/value pair such as: tag=value.
Multiple can be provided by specifying this option more than once + Positional arguments: + namespace +``` + +##### Examples + +``` +polaris namespaces create --catalog my_catalog outer + +polaris namespaces create --catalog my_catalog --location 's3://bucket/outer/inner_SUFFIX' outer.inner +``` + +#### delete + +The `delete` subcommand is used to delete a namespace. + +``` +input: polaris namespaces delete --help +options: + delete + Named arguments: + --catalog The name of an existing catalog + Positional arguments: + namespace +``` + +##### Examples + +``` +polaris namespaces delete outer_namespace.inner_namespace --catalog my_catalog + +polaris namespaces delete --catalog my_catalog outer_namespace +``` + +#### get + +The `get` subcommand retrieves details about a namespace. + +``` +input: polaris namespaces get --help +options: + get + Named arguments: + --catalog The name of an existing catalog + Positional arguments: + namespace +``` + +##### Examples + +``` +polaris namespaces get --catalog some_catalog a.b + +polaris namespaces get a.b.c --catalog some_catalog +``` + +#### list + +The `list` subcommand shows details about all namespaces directly within a catalog or, optionally, within some parent prefix in that catalog. + +``` +input: polaris namespaces list --help +options: + list + Named arguments: + --catalog The name of an existing catalog + --parent If specified, list namespaces inside this parent namespace +``` + +##### Examples + +``` +polaris namespaces list --catalog my_catalog + +polaris namespaces list --catalog my_catalog --parent a + +polaris namespaces list --catalog my_catalog --parent a.b +``` + +### Privileges + +The `privileges` command is used to grant various privileges to a catalog role, or to revoke those privileges. Privileges can be on the level of a catalog, a namespace, a table, or a view. For more information on privileges, please refer to the [docs]({{% ref "entities#privilege" %}}). 
+
+Note that when using the `privileges` command, the relevant catalog and catalog role must always be specified, via the `--catalog` and `--catalog-role` options.
+
+`privileges` supports the following subcommands:
+
+1. list
+2. catalog
+3. namespace
+4. table
+5. view
+
+Each of these subcommands, except `list`, supports the `grant` and `revoke` actions and requires an action to be specified.
+
+Note that each subcommand's `revoke` action always accepts the same options that the corresponding `grant` action does, but with the addition of the `--cascade` option. `--cascade` is used to revoke all other privileges that depend on the specified privilege.
+
+#### list
+
+The `list` subcommand shows details about all privileges for a catalog role.
+
+```
+input: polaris privileges list --help
+options:
+  list
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --catalog-role  The name of a catalog role
+```
+
+##### Examples
+
+```
+polaris privileges list --catalog my_catalog --catalog-role my_role
+
+polaris privileges list --catalog-role my_other_role --catalog my_catalog
+```
+
+#### catalog
+
+The `catalog` subcommand manages privileges at the catalog level. `grant` is used to grant catalog privileges to the specified catalog role, and `revoke` is used to revoke them.
+ +``` +input: polaris privileges catalog --help +options: + catalog + grant + Named arguments: + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege + revoke + Named arguments: + --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege +``` + +##### Examples + +``` +polaris privileges \ + catalog \ + grant \ + --catalog my_catalog \ + --catalog-role catalog_role \ + TABLE_CREATE + +polaris privileges \ + catalog \ + revoke \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --cascade \ + TABLE_CREATE +``` + +#### namespace + +The `namespace` subcommand manages privileges at the namespace level. + +``` +input: polaris privileges namespace --help +options: + namespace + grant + Named arguments: + --namespace A period-delimited namespace + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege + revoke + Named arguments: + --namespace A period-delimited namespace + --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege +``` + +##### Examples + +``` +polaris privileges \ + namespace \ + grant \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --namespace a.b \ + TABLE_LIST + +polaris privileges \ + namespace \ + revoke \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --namespace a.b \ + TABLE_LIST +``` + +#### table + +The `table` subcommand manages privileges at the table level. 
+
+```
+input: polaris privileges table --help
+options:
+  table
+    grant
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --table  The name of a table
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+    revoke
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --table  The name of a table
+        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+```
+
+##### Examples
+
+```
+polaris privileges \
+  table \
+  grant \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b \
+  --table t \
+  TABLE_DROP
+
+polaris privileges \
+  table \
+  revoke \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b \
+  --table t \
+  --cascade \
+  TABLE_DROP
+```
+
+#### view
+
+The `view` subcommand manages privileges at the view level.
+
+```
+input: polaris privileges view --help
+options:
+  view
+    grant
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --view  The name of a view
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+    revoke
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --view  The name of a view
+        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+```
+
+##### Examples
+
+```
+polaris privileges \
+  view \
+  grant \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b.c \
+  --view v \
+  VIEW_FULL_METADATA
+
+polaris privileges \
+  view \
+  revoke \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b.c \
+  --view v \
+  --cascade \
+  VIEW_FULL_METADATA
+```
+
+### profiles
+
+The `profiles` command is used to manage stored authentication profiles in Polaris. Profiles allow authentication credentials to be saved and reused, eliminating the need to pass credentials with every command.
+
+`profiles` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+
+#### create
+
+The `create` subcommand is used to create a new authentication profile.
+
+```
+input: polaris profiles create --help
+options:
+  create
+    Positional arguments:
+      profile
+```
+
+##### Examples
+
+```
+polaris profiles create dev
+```
+
+#### delete
+
+The `delete` subcommand removes a stored profile.
+
+```
+input: polaris profiles delete --help
+options:
+  delete
+    Positional arguments:
+      profile
+```
+
+##### Examples
+
+```
+polaris profiles delete dev
+```
+
+#### get
+
+The `get` subcommand retrieves details about a stored profile.
+ +``` +input: polaris profiles get --help +options: + get + Positional arguments: + profile +``` + +##### Examples + +``` +polaris profiles get dev +``` + +#### list + +The `list` subcommand displays all stored profiles. + +``` +input: polaris profiles list --help +options: + list +``` + +##### Examples + +``` +polaris profiles list +``` + +#### update + +The `update` subcommand modifies an existing profile. + +``` +input: polaris profiles update --help +options: + update + Positional arguments: + profile +``` + +##### Examples + +``` +polaris profiles update dev +``` + +## Examples + +This section outlines example code for a few common operations as well as for some more complex ones. + +For especially complex operations, you may wish to instead directly use the Python API. + +### Creating a principal and a catalog + +``` +polaris principals create my_user + +polaris catalogs create \ + --type internal \ + --storage-type s3 \ + --default-base-location s3://iceberg-bucket/polaris-base \ + --role-arn arn:aws:iam::111122223333:role/ExampleCorpRole \ + --allowed-location s3://iceberg-bucket/polaris-alt-location-1 \ + --allowed-location s3://iceberg-bucket/polaris-alt-location-2 \ + my_catalog +``` + +### Granting a principal the ability to manage the content of a catalog + +``` +polaris principal-roles create power_user +polaris principal-roles grant --principal my_user power_user + +polaris catalog-roles create --catalog my_catalog my_catalog_role +polaris catalog-roles grant \ + --catalog my_catalog \ + --principal-role power_user \ + my_catalog_role + +polaris privileges \ + catalog \ + --catalog my_catalog \ + --catalog-role my_catalog_role \ + grant \ + CATALOG_MANAGE_CONTENT +``` + +### Identifying the tables a given principal has been granted explicit access to read + +_Note that some other privileges, such as `CATALOG_MANAGE_CONTENT`, subsume `TABLE_READ_DATA` and would not be discovered here._ + +``` +principal_roles=$(polaris principal-roles list 
--principal my_principal)
+# Note: "${catalog}" must be set beforehand to the name of the catalog being inspected
+for principal_role in ${principal_roles}; do
+  catalog_roles=$(polaris catalog-roles list --principal-role "${principal_role}")
+  for catalog_role in ${catalog_roles}; do
+    grants=$(polaris privileges list --catalog-role "${catalog_role}" --catalog "${catalog}")
+    for grant in $(echo "${grants}" | jq -c '.[] | select(.privilege == "TABLE_READ_DATA")'); do
+      echo "${grant}"
+    done
+  done
+done
+```
+
+
diff --git a/site/content/in-dev/1.0.0/configuration.md b/site/content/in-dev/1.0.0/configuration.md
new file mode 100644
index 0000000000..95d77230f9
--- /dev/null
+++ b/site/content/in-dev/1.0.0/configuration.md
@@ -0,0 +1,187 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Configuring Polaris
+type: docs
+weight: 550
+---
+
+## Overview
+
+This page provides information on how to configure Apache Polaris (Incubating). Unless stated
+otherwise, this information is valid both for Polaris Docker images (and Kubernetes deployments)
+and for Polaris binary distributions.
+
+> Note: for production tips and best practices, refer to [Configuring Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}).
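For a first orientation, Polaris settings are plain Quarkus-style properties. The following `application.properties` fragment is illustrative only; the values are placeholders, not recommendations, and the keys are drawn from the reference table later on this page:

```properties
# Illustrative configuration fragment -- placeholder values only.
# Serve two realms instead of the default single POLARIS realm:
polaris.realm-context.realms=realm1,realm2
# Use the JDBC persistence backend (the default):
polaris.persistence.type=relational-jdbc
# Enable the token-bucket rate limiter:
polaris.rate-limiter.filter.type=token-bucket
polaris.rate-limiter.token-bucket.requests-per-second=100
```

In the official Docker images such a file is read from `/deployment/config/application.properties`; for binary distributions, from `$PWD/config/application.properties`.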
+
+First off, the Polaris server runs on Quarkus and uses its configuration mechanisms. Read the Quarkus
+[configuration guide](https://quarkus.io/guides/config) to get familiar with the basics.
+
+Quarkus aggregates configuration properties from multiple sources, applying them in a specific order
+of precedence. When a property is defined in multiple sources, the value from the source with the
+higher priority overrides those from lower-priority sources.
+
+The sources are listed below, from highest to lowest priority:
+
+1. System properties: properties set via the Java command line using `-Dproperty.name=value`.
+2. Environment variables (see below for important details).
+3. Settings in the `$PWD/config/application.properties` file.
+4. The `application.properties` files packaged in Polaris.
+5. Default values: hardcoded defaults within the application.
+
+When using environment variables, there are two naming conventions:
+
+1. If possible, just use the property name as the environment variable name. This works fine in most
+   cases, e.g. in Kubernetes deployments. For example, `polaris.realm-context.realms` can be
+   included as is in a container YAML definition:
+   ```yaml
+   env:
+   - name: "polaris.realm-context.realms"
+     value: "realm1,realm2"
+   ```
+
+2. If running from a script or shell prompt, however, stricter naming rules apply: variable names
+   can consist solely of uppercase letters, digits, and the `_` (underscore) sign. In such
+   situations, the environment variable name must be derived from the property name by using
+   uppercase letters and replacing all dots, dashes, and quotes with underscores. For example,
+   `polaris.realm-context.realms` becomes `POLARIS_REALM_CONTEXT_REALMS`. See
+   [here](https://smallrye.io/smallrye-config/Main/config/environment-variables/) for more details.
+
+> [!IMPORTANT]
+> While convenient, uppercase-only environment variables can be problematic for complex property
+> names.
In these situations, it's preferable to use system properties or a configuration file.
+
+As stated above, a configuration file can also be provided at runtime; it should be available
+(mounted) at `$PWD/config/application.properties` for the Polaris server to recognize it. In official
+Polaris Docker images, this location is `/deployment/config/application.properties`.
+
+For Kubernetes deployments, the configuration file is typically defined as a `ConfigMap`, then
+mounted in the container at `/deployment/config/application.properties`. It can be mounted in
+read-only mode, as Polaris only reads the configuration file once, at startup.
+
+## Polaris Configuration Options Reference
+
+| Configuration Property | Default Value | Description |
+|------------------------|---------------|-------------|
+| `polaris.persistence.type` | `relational-jdbc` | Define the persistence backend used by Polaris (`in-memory`, `relational-jdbc`, `eclipse-link` (deprecated)). See [Configuring Apache Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}). |
+| `polaris.persistence.relational.jdbc.max-retries` | `1` | Total number of retries JDBC persistence will attempt on connection resets or serialization failures before giving up. |
+| `polaris.persistence.relational.jdbc.max_duration_in_ms` | `5000 ms` | Max time interval (ms) since the start of a transaction when retries can be attempted. |
+| `polaris.persistence.relational.jdbc.initial_delay_in_ms` | `100 ms` | Initial delay before retrying. The delay is doubled after each retry. |
+| `polaris.persistence.eclipselink.configurationFile` | | Define the location of the `persistence.xml`. By default, the built-in `persistence.xml` is in use. |
|
| `polaris.persistence.eclipselink.persistenceUnit` | `polaris` | Define the name of the persistence unit to use, as defined in the `persistence.xml`. |
| `polaris.realm-context.type` | `default` | Define the type of realm context resolver to use. |
| `polaris.realm-context.realms` | `POLARIS` | Define the list of realms to use. |
| `polaris.realm-context.header-name` | `Polaris-Realm` | Define the name of the header carrying the realm context. |
| `polaris.features."ENFORCE_PRINCIPAL_CREDENTIAL_ROTATION_REQUIRED_CHECKING"` | `false` | Flag to enforce credential rotation checking for principals that require it. |
| `polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"` | `FILE` | Define the storage types supported by catalogs. Supported values are `S3`, `GCS`, `AZURE`, `FILE`. |
| `polaris.features.realm-overrides."my-realm"."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"` | `true` | Override a feature for a specific realm; here, the skip-credential-subscoping-indirection flag for realm `my-realm`. |
| `polaris.authentication.authenticator.type` | `default` | Define the Polaris authenticator type. |
| `polaris.authentication.token-service.type` | `default` | Define the Polaris token service type. |
| `polaris.authentication.token-broker.type` | `rsa-key-pair` | Define the Polaris token broker type; also configure the location of the key files. For RSA: if the locations of the key files are not configured, an ephemeral key pair is created on each Polaris server startup, which invalidates existing tokens after a server restart and is also incompatible with running multiple Polaris server instances. |
| `polaris.authentication.token-broker.max-token-generation` | `PT1H` | Define the maximum lifetime of tokens generated by the token broker. |
| `polaris.authentication.token-broker.rsa-key-pair.private-key-file` | | Define the location of the RSA-256 private key file; if present, the `public-key` file must be specified, too.
|
| `polaris.authentication.token-broker.rsa-key-pair.public-key-file` | | Define the location of the RSA-256 public key file; if present, the `private-key` file must be specified, too. |
| `polaris.authentication.token-broker.symmetric-key.secret` | `secret` | Define the secret of the symmetric key. |
| `polaris.authentication.token-broker.symmetric-key.file` | `/tmp/symmetric.key` | Define the location of the symmetric key file. |
| `polaris.storage.aws.access-key` | `accessKey` | Define the AWS S3 access key. If unset, the default credential provider chain will be used. |
| `polaris.storage.aws.secret-key` | `secretKey` | Define the AWS S3 secret key. If unset, the default credential provider chain will be used. |
| `polaris.storage.gcp.token` | `token` | Define the Google Cloud Storage token. If unset, the default credential provider chain will be used. |
| `polaris.storage.gcp.lifespan` | `PT1H` | Define the lifespan of the Google Cloud Storage token. If unset, the default credential provider chain will be used. |
| `polaris.log.request-id-header-name` | `Polaris-Request-Id` | Define the name of the header whose value is used as the request ID in logs. |
| `polaris.log.mdc.aid` | `polaris` | Define the log context (MDC) AID. |
| `polaris.log.mdc.sid` | `polaris-service` | Define the log context (MDC) SID. |
| `polaris.rate-limiter.filter.type` | `no-op` | Define the Polaris rate limiter. Supported values are `no-op`, `token-bucket`. |
| `polaris.rate-limiter.token-bucket.type` | `default` | Define the token bucket rate limiter implementation. |
| `polaris.rate-limiter.token-bucket.requests-per-second` | `9999` | Define the number of requests per second for the token bucket rate limiter. |
| `polaris.rate-limiter.token-bucket.window` | `PT10S` | Define the window size for the token bucket rate limiter. |
| `polaris.metrics.tags.<tag-name>=<tag-value>` | `application=Polaris` | Define arbitrary metric tags to include in every request.
|
| `polaris.metrics.realm-id-tag.api-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in API metrics. |
| `polaris.metrics.realm-id-tag.http-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in HTTP request metrics. |
| `polaris.metrics.realm-id-tag.http-metrics-max-cardinality` | `100` | The maximum cardinality of the `realm_id` tag in HTTP request metrics. |
| `polaris.tasks.max-concurrent-tasks` | `100` | Define the maximum number of concurrent tasks. |
| `polaris.tasks.max-queued-tasks` | `1000` | Define the maximum number of queued tasks. |

There are non-Polaris configuration properties that can also be useful:

| Configuration Property | Default Value | Description |
|------------------------|---------------|-------------|
| `quarkus.log.level` | `INFO` | Define the root log level. |
| `quarkus.log.category."org.apache.polaris".level` | | Define the log level for a specific category. |
| `quarkus.default-locale` | System locale | Force the use of a specific locale, for instance `en_US`. |
| `quarkus.http.port` | `8181` | Define the HTTP port number. |
| `quarkus.http.auth.basic` | `false` | Enable HTTP basic authentication. |
| `quarkus.http.limits.max-body-size` | `10240K` | Define the HTTP maximum body size. |
| `quarkus.http.cors.origins` | | Define the allowed HTTP CORS origins. |
| `quarkus.http.cors.methods` | `PATCH, POST, DELETE, GET, PUT` | Define the allowed HTTP CORS methods. |
| `quarkus.http.cors.headers` | `*` | Define the allowed HTTP CORS headers. |
| `quarkus.http.cors.exposed-headers` | `*` | Define the exposed HTTP CORS headers. |
| `quarkus.http.cors.access-control-max-age` | `PT10M` | Define the HTTP CORS access control max age. |
| `quarkus.http.cors.access-control-allow-credentials` | `true` | Define the HTTP CORS access control allow credentials flag.
|
| `quarkus.management.enabled` | `true` | Enable the management server. |
| `quarkus.management.port` | `8182` | Define the port number of the Polaris management server. |
| `quarkus.management.root-path` | | Define the root path under which the `/metrics` and `/health` endpoints are exposed. |
| `quarkus.otel.sdk.disabled` | `true` | Disable the OpenTelemetry SDK. Set to `false` to enable the OpenTelemetry layer. |

> Note: This section is only relevant for Polaris Docker images and Kubernetes deployments.

There are many other environment variables available in the official Polaris Docker
image; they come from the base image used by Polaris, [ubi9/openjdk-21-runtime]. They should be used
to fine-tune the Java runtime directly, e.g. to enable debugging or to set the heap size. These
variables are not specific to Polaris, but are inherited from the base image. If in doubt, leave
everything at its default!

[ubi9/openjdk-21-runtime]: https://catalog.redhat.com/software/containers/ubi9/openjdk-21-runtime/6501ce769a0d86945c422d5f

| Environment variable | Description |
|----------------------|-------------|
| `JAVA_OPTS` or `JAVA_OPTIONS` | **NOT RECOMMENDED**. JVM options passed to the `java` command (example: "-verbose:class"). Setting this variable overrides all options set by any of the other variables in this table. To pass extra settings, use `JAVA_OPTS_APPEND` instead.
|
| `JAVA_OPTS_APPEND` | User-specified Java options to be appended to the generated options in `JAVA_OPTS` (example: "-Dsome.property=foo"). |
| `JAVA_TOOL_OPTIONS` | This variable is defined and honored by all OpenJDK distros, see [here](https://bugs.openjdk.org/browse/JDK-4971166). Options defined here take precedence over all else; using this variable is generally not necessary, but can be useful e.g. to enforce JVM startup parameters, to set up remote debugging, or to define JVM agents. |
| `JAVA_MAX_MEM_RATIO` | Used to calculate the default maximum heap size based on the container's memory restriction. In a container without any memory constraint, this option has no effect. If there is a memory constraint, `-XX:MaxRAMPercentage` is set to the ratio of the available container memory configured here. The default is `80`, which means 80% of the available memory is used as an upper boundary. You can skip this mechanism by setting this value to `0`, in which case no `-XX:MaxRAMPercentage` option is added. |
| `JAVA_DEBUG` | If set, remote debugging will be switched on. Disabled by default (example: "true"). |
| `JAVA_DEBUG_PORT` | Port used for remote debugging. Defaults to "5005" (tip: use "*:5005" to enable debugging on all network interfaces). |
| `GC_MIN_HEAP_FREE_RATIO` | Minimum percentage of heap free after GC to avoid expansion. Default is 10. |
| `GC_MAX_HEAP_FREE_RATIO` | Maximum percentage of heap free after GC to avoid shrinking. Default is 20. |
| `GC_TIME_RATIO` | Specifies the ratio of the time spent outside garbage collection. Default is 4. |
| `GC_ADAPTIVE_SIZE_POLICY_WEIGHT` | The weighting given to the current GC time versus previous GC times. Default is 90. |
| `GC_METASPACE_SIZE` | The initial metaspace size. There is no default (example: "20"). |
| `GC_MAX_METASPACE_SIZE` | The maximum metaspace size. There is no default (example: "100"). |
| `GC_CONTAINER_OPTIONS` | Specify the Java GC to use.
The value of this variable should contain the necessary JRE command-line options to specify the required GC, which will override the default of `-XX:+UseParallelGC` (example: `-XX:+UseG1GC`). |

Here are some examples:

| Example | `docker run` option |
|---------|---------------------|
| Using another GC | `-e GC_CONTAINER_OPTIONS="-XX:+UseShenandoahGC"` lets Polaris use Shenandoah GC instead of the default parallel GC. |
| Set the Java heap size to a _fixed_ amount | `-e JAVA_OPTS_APPEND="-Xms8g -Xmx8g"` lets Polaris use a Java heap of 8g. |
| Set the maximum heap percentage | `-e JAVA_MAX_MEM_RATIO="70"` lets Polaris use 70% of the available memory. |

## Troubleshooting Configuration Issues

If you encounter issues with the configuration, you can ask Polaris to print out the configuration it
is using. To do this, set the log level for the `io.smallrye.config` category to `DEBUG`, and also
set the console appender level to `DEBUG`:

```properties
quarkus.log.console.level=DEBUG
quarkus.log.category."io.smallrye.config".level=DEBUG
```

> [!IMPORTANT]
> This will print out all configuration values, including sensitive ones like
> passwords. Don't do this in production, and don't share this output with anyone you don't trust!
diff --git a/site/content/in-dev/1.0.0/configuring-polaris-for-production.md b/site/content/in-dev/1.0.0/configuring-polaris-for-production.md
new file mode 100644
index 0000000000..fac51b40f9
--- /dev/null
+++ b/site/content/in-dev/1.0.0/configuring-polaris-for-production.md
@@ -0,0 +1,222 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Configuring Polaris for Production +linkTitle: Production Configuration +type: docs +weight: 600 +--- + +The default server configuration is intended for development and testing. When you deploy Polaris in production, +review and apply the following checklist: +- [ ] Configure OAuth2 keys +- [ ] Enforce realm header validation (`require-header=true`) +- [ ] Use a durable metastore (JDBC + PostgreSQL) +- [ ] Bootstrap valid realms in the metastore +- [ ] Disable local FILE storage + +### Configure OAuth2 + +Polaris authentication requires specifying a token broker factory type. Two implementations are +supported out of the box: + +- [rsa-key-pair] uses a pair of public and private keys; +- [symmetric-key] uses a shared secret. + +[rsa-key-pair]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java +[symmetric-key]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTSymmetricKeyFactory.java + +By default, Polaris uses `rsa-key-pair`, with randomly generated keys. + +> [!IMPORTANT] +> The default `rsa-key-pair` configuration is not suitable when deploying many replicas of Polaris, +> as each replica will have its own set of keys. 
This will cause token validation to fail when a +> request is routed to a different replica than the one that issued the token. + +It is highly recommended to configure Polaris with previously-generated RSA keys. This can be done +by setting the following properties: + +```properties +polaris.authentication.token-broker.type=rsa-key-pair +polaris.authentication.token-broker.rsa-key-pair.public-key-file=/tmp/public.key +polaris.authentication.token-broker.rsa-key-pair.private-key-file=/tmp/private.key +``` + +To generate an RSA key pair, you can use the following commands: + +```shell +openssl genrsa -out private.key 2048 +openssl rsa -in private.key -pubout -out public.key +``` + +Alternatively, you can use a symmetric key by setting the following properties: + +```properties +polaris.authentication.token-broker.type=symmetric-key +polaris.authentication.token-broker.symmetric-key.file=/tmp/symmetric.key +``` + +Note: it is also possible to set the symmetric key secret directly in the configuration file. If +possible, pass the secret as an environment variable to avoid storing sensitive information in the +configuration file: + +```properties +polaris.authentication.token-broker.symmetric-key.secret=${POLARIS_SYMMETRIC_KEY_SECRET} +``` + +Finally, you can also configure the token broker to use a maximum lifespan by setting the following +property: + +```properties +polaris.authentication.token-broker.max-token-generation=PT1H +``` + +Typically, in Kubernetes, you would define the keys as a `Secret` and mount them as files in the +container. + +### Realm Context Resolver + +By default, Polaris resolves realms based on incoming request headers. You can configure the realm +context resolver by setting the following properties in `application.properties`: + +```properties +polaris.realm-context.realms=POLARIS,MY-REALM +polaris.realm-context.header-name=Polaris-Realm +``` + +Where: + +- `realms` is a comma-separated list of allowed realms. 
This setting _must_ be correctly configured. + At least one realm must be specified. +- `header-name` is the name of the header used to resolve the realm; by default, it is + `Polaris-Realm`. + +If a request contains the specified header, Polaris will use the realm specified in the header. If +the realm is not in the list of allowed realms, Polaris will return a `404 Not Found` response. + +If a request _does not_ contain the specified header, however, by default Polaris will use the first +realm in the list as the default realm. In the above example, `POLARIS` is the default realm and +would be used if the `Polaris-Realm` header is not present in the request. + +This is not recommended for production use, as it may lead to security vulnerabilities. To avoid +this, set the following property to `true`: + +```properties +polaris.realm-context.require-header=true +``` + +This will cause Polaris to also return a `404 Not Found` response if the realm header is not present +in the request. + +### Metastore Configuration + +A metastore should be configured with an implementation that durably persists Polaris entities. By +default, Polaris uses an in-memory metastore. + +> [!IMPORTANT] +> The default in-memory metastore is not suitable for production use, as it will lose all data +> when the server is restarted; it is also unusable when multiple Polaris replicas are used. + +To enable a durable metastore, configure your system to use the Relational JDBC-backed metastore. +This implementation leverages Quarkus for datasource management and supports configuration through +environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file). 
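The JVM `-D` flag form mentioned above can be sketched as follows; all values are illustrative assumptions (including the jar name and database coordinates), not a prescribed setup:

```shell
java \
  -Dpolaris.persistence.type=relational-jdbc \
  -Dquarkus.datasource.db-kind=postgresql \
  -Dquarkus.datasource.username=polaris \
  -Dquarkus.datasource.password=s3cr3t \
  -Dquarkus.datasource.jdbc.url=jdbc:postgresql://db.example.com:5432/polaris \
  -jar polaris-server.jar
```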

Configure the metastore by setting the following environment variables:

```
POLARIS_PERSISTENCE_TYPE=relational-jdbc

QUARKUS_DATASOURCE_DB_KIND=postgresql
QUARKUS_DATASOURCE_USERNAME=
QUARKUS_DATASOURCE_PASSWORD=
QUARKUS_DATASOURCE_JDBC_URL=
```

The relational JDBC metastore is a Quarkus-managed datasource and currently supports only PostgreSQL
and H2. Please refer to the documentation here:
[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)

> [!IMPORTANT]
> Be sure to secure your metastore backend since it will be storing sensitive data and catalog
> metadata.

Note: Polaris always creates a schema named `polaris_schema` in the configured database during
bootstrap.

### Bootstrapping

Before using Polaris, you must **bootstrap** the metastore. This is a manual operation that must be
performed **only once** for each realm in order to prepare the metastore to integrate with Polaris.

By default, when bootstrapping a new realm, Polaris creates a randomized `CLIENT_ID` and
`CLIENT_SECRET` for the `root` principal and stores their hashes in the metastore backend.

This may not be convenient, as the generated credentials are not stored
in clear text in the database and cannot be retrieved later.

In order to provide your own credentials for the `root` principal (so you can request tokens via
`api/catalog/v1/oauth/tokens`), use the [Polaris Admin Tool]({{% ref "admin-tool" %}}).

You can verify the setup by requesting a token for the `root` principal:

```bash
curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d "grant_type=client_credentials" \
  -d "client_id=my-client-id" \
  -d "client_secret=my-client-secret" \
  -d "scope=PRINCIPAL_ROLE:ALL"
```

This should return an access token:

```json
{
  "access_token": "...",
  "token_type": "bearer",
  "issued_token_type": "urn:ietf:params:oauth:token-type:access_token",
  "expires_in": 3600
}
```

If you used a non-default realm name, add the appropriate request header to the `curl` command;
otherwise, Polaris will resolve the realm to the first one listed in the configuration property
`polaris.realm-context.realms`. Here is an example that sets the realm header:

```bash
curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -H "Polaris-Realm: my-realm" \
  -d "grant_type=client_credentials" \
  -d "client_id=my-client-id" \
  -d "client_secret=my-client-secret" \
  -d "scope=PRINCIPAL_ROLE:ALL"
```

### Disable FILE Storage Type

By default, Polaris allows using the local file system (`FILE`) for catalog storage. This is fine for testing,
but **not recommended for production**. To disable it, set the supported storage types like this:

```hocon
polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES" = [ "S3", "Azure" ]
```

Leave out `FILE` to prevent its use. Only include the storage types your setup needs.

### Upgrade Considerations

The [Polaris Evolution](../evolution) page discusses backward compatibility and
upgrade concerns.
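The Kubernetes `Secret` mount mentioned in the OAuth2 section above can be sketched as follows; all object names, paths, and the image tag are illustrative assumptions:

```yaml
# Create the Secret from previously generated key files, e.g.:
#   kubectl create secret generic polaris-token-broker-keys \
#     --from-file=public.key --from-file=private.key
apiVersion: v1
kind: Pod
metadata:
  name: polaris
spec:
  containers:
    - name: polaris
      image: apache/polaris:latest
      env:
        - name: "polaris.authentication.token-broker.type"
          value: "rsa-key-pair"
        - name: "polaris.authentication.token-broker.rsa-key-pair.public-key-file"
          value: "/deployment/keys/public.key"
        - name: "polaris.authentication.token-broker.rsa-key-pair.private-key-file"
          value: "/deployment/keys/private.key"
      volumeMounts:
        - name: token-broker-keys
          mountPath: /deployment/keys
          readOnly: true    # Polaris only reads the keys at startup
  volumes:
    - name: token-broker-keys
      secret:
        secretName: polaris-token-broker-keys
```

Mounting the keys this way ensures every replica signs and validates tokens with the same key pair.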

diff --git a/site/content/in-dev/1.0.0/entities.md b/site/content/in-dev/1.0.0/entities.md
new file mode 100644
index 0000000000..04d625bb94
--- /dev/null
+++ b/site/content/in-dev/1.0.0/entities.md
@@ -0,0 +1,95 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Entities
type: docs
weight: 400
---

This page documents various entities that can be managed in Apache Polaris (Incubating).

## Catalog

A catalog is a top-level entity in Polaris that may contain other entities like [namespaces](#namespace) and [tables](#table). These map directly to [Apache Iceberg catalogs](https://iceberg.apache.org/concepts/catalog/).

For information on managing catalogs with the REST API or for more information on what data can be associated with a catalog, see [the API docs]({{% github-polaris "client/python/docs/CreateCatalogRequest.md" %}}).

### Storage Type

All catalogs in Polaris are associated with a _storage type_. Valid storage types are `S3`, `Azure`, and `GCS`; the `FILE` type is additionally available for testing. Each of these types relates to a different storage provider where data within the catalog may reside.
Depending on the storage type, various other configurations may be set for a catalog including credentials to be used when accessing data inside the catalog. + +For details on how to use Storage Types in the REST API, see [the API docs]({{% github-polaris "client/python/docs/StorageConfigInfo.md" %}}). + +For usage examples of storage types, see [docs]({{% ref "command-line-interface" %}}). + +## Namespace + +A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_. + +In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on. + +For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the API docs]({{% github-polaris "client/python/docs/CreateNamespaceRequest.md" %}}). + +## Table + +Polaris tables are entities that map to [Apache Iceberg tables](https://iceberg.apache.org/docs/nightly/configuration/), [Delta tables](https://docs.databricks.com/aws/en/delta/table-properties), or [Hudi tables](https://hudi.apache.org/docs/next/configurations#TABLE_CONFIG). + +For information on managing tables with the REST API or for more information on what data can be associated with a table, see [the API docs]({{% github-polaris "client/python/docs/CreateTableRequest.md" %}}). + +## View + +Polaris views are entities that map to [Apache Iceberg views](https://iceberg.apache.org/view-spec/). + +For information on managing views with the REST API or for more information on what data can be associated with a view, see [the API docs]({{% github-polaris "client/python/docs/CreateViewRequest.md" %}}). + +## Principal + +Polaris principals are unique identities that can be used to represent users or services. 
Each principal may have one or more [principal roles](#principal-role) assigned to it for the purpose of accessing catalogs and the entities within them.

For information on managing principals with the REST API or for more information on what data can be associated with a principal, see [the API docs]({{% github-polaris "client/python/docs/CreatePrincipalRequest.md" %}}).

## Principal Role

Polaris principal roles are labels that may be granted to [principals](#principal). Each principal may have one or more principal roles, and the same principal role may be granted to multiple principals. Principal roles may be assigned based on the persona or responsibilities of a given principal, or on how that principal will need to access different entities within Polaris.

For information on managing principal roles with the REST API or for more information on what data can be associated with a principal role, see [the API docs]({{% github-polaris "client/python/docs/CreatePrincipalRoleRequest.md" %}}).

## Catalog Role

Polaris catalog roles are labels that may be granted to [catalogs](#catalog). Each catalog may have one or more catalog roles, and the same catalog role may be granted to multiple catalogs. Catalog roles may be assigned based on the nature of data that will reside in a catalog, or by the groups of users and services that might need to access that data.

Each catalog role may have multiple [privileges](#privilege) granted to it, and each catalog role can be granted to one or more [principal roles](#principal-role). This is the mechanism by which principals are granted access to entities inside a catalog such as namespaces and tables.

## Policy

A Polaris policy is a set of rules governing actions on specified resources under predefined conditions. Polaris supports policies for Iceberg table compaction, snapshot expiry, row-level access control, and custom policy definitions.

Policies can be applied at the catalog, namespace, or table level. Policy inheritance is achieved by
attaching a policy to a higher-level scope, such as a namespace or catalog; tables registered under
those entities then do not need to declare the same policy individually. If a table or a namespace
requires a different policy, users can assign one directly, overriding any policy of the same type
declared at a higher level.

## Privilege

Polaris privileges are granted to [catalog roles](#catalog-role) in order to grant principals with a given principal role some degree of access to catalogs with a given catalog role. When a privilege is granted to a catalog role, any principal roles granted that catalog role receive the privilege. In turn, any principals who are granted that principal role receive it.

A privilege can be scoped to any entity inside a catalog, including the catalog itself.

For a list of supported privileges for each privilege class, see the API docs:
* [Table Privileges]({{% github-polaris "client/python/docs/TablePrivilege.md" %}})
* [View Privileges]({{% github-polaris "client/python/docs/ViewPrivilege.md" %}})
* [Namespace Privileges]({{% github-polaris "client/python/docs/NamespacePrivilege.md" %}})
* [Catalog Privileges]({{% github-polaris "client/python/docs/CatalogPrivilege.md" %}})
diff --git a/site/content/in-dev/1.0.0/evolution.md b/site/content/in-dev/1.0.0/evolution.md
new file mode 100644
index 0000000000..ea29badc84
--- /dev/null
+++ b/site/content/in-dev/1.0.0/evolution.md
@@ -0,0 +1,115 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Polaris Evolution +type: docs +weight: 1000 +--- + +This page discusses what can be expected from Apache Polaris as the project evolves. + +## Using Polaris as a Catalog + +Polaris is primarily intended to be used as a Catalog of Tables and Views. As such, +it implements the Iceberg REST Catalog API and its own REST APIs. + +Revisions of the Iceberg REST Catalog API are controlled by the [Apache Iceberg](https://iceberg.apache.org/) +community. Polaris attempts to accurately implement this specification. Nonetheless, +optional REST Catalog features may or may not be supported immediately. In general, +there is no guarantee that Polaris releases always implement the latest version of +the Iceberg REST Catalog API. + +Any API under Polaris control that is not in an "experimental" or "beta" state +(e.g. the Management API) is maintained as a versioned REST API. New releases of Polaris +may include changes to the current version of the API. When that happens those changes +are intended to be compatible with prior versions of Polaris clients. Certain endpoints +and parameters may be deprecated. + +In case a major change is required to an API that cannot be implemented in a +backward-compatible way, new endpoints (URI paths) may be introduced. New URI "roots" may +be introduced too (e.g. `api/catalog/v2`). + +Note that those "v1", "v2", etc. URI path segments are not meant to be 1:1 with Polaris +releases or Polaris project version numbers (e.g. a "v2" path segment does not mean that +it is added in Polaris 2.0). 

Polaris servers will support deprecated API endpoints / parameters / versions / etc.
for some transition period to allow clients to migrate.

### Managing Polaris Database

Polaris stores its data in a database, which is sometimes referred to as "Metastore" or
"Persistence" in other docs.

Each Polaris release may support multiple Persistence [implementations](../metastores),
for example, "EclipseLink" (deprecated) and "JDBC" (current).

Each type of Persistence evolves individually. Within each Persistence type, Polaris
attempts to support rolling upgrades (both version X and X + 1 servers running at the
same time).

However, migrating between different Persistence types is not supported in a rolling
upgrade manner (for example, migrating from "EclipseLink" to "JDBC"). Polaris provides
[tools](https://github.com/apache/polaris-tools/) for migrating between different
catalogs, and those tools may be used to migrate between different Persistence types
as well. Service interruption (downtime) should be expected in those cases.

## Using Polaris as a Build-Time Dependency

Polaris produces several jars. These jars or custom builds of Polaris code may be used in
downstream projects according to the terms of the license included in Polaris distributions.

The minimal version of the JRE required by Polaris code (compilation target) may be updated in
any release. Different Polaris jars may have different minimal JRE version requirements.

Changes in Java classes should be expected at any time, regardless of the module name or
whether the class / method is `public` or not.

This approach is not meant to discourage the use of Polaris code in downstream projects, but
to allow more flexibility in evolving the codebase to support new catalog-level features
and improve code efficiency.
Maintainers of downstream projects are encouraged to join the Polaris
mailing lists to monitor project changes, suggest improvements, and engage with the Polaris
community in case of specific compatibility concerns.

## Semantic Versioning

Polaris strives to follow [Semantic Versioning](https://semver.org/) conventions with
respect to REST APIs (beta and experimental APIs excepted), [Polaris Policies](../policy/),
and user-facing [configuration](../configuration/).

The following are some examples of the Polaris approach to SemVer in REST APIs / configuration.
These examples are for illustration purposes and should not be considered
exhaustive.

* Polaris implementing an optional Iceberg REST Catalog feature that was unimplemented
in the previous release is not considered a major change.

* Supporting a new revision of the Iceberg REST Catalog spec in a backward-compatible way
is not considered a major change. Specifically, supporting new REST API prefixes (e.g. `v2`)
is not a major change because it does not affect older clients.

* Changing the implementation of an Iceberg REST Catalog feature / endpoint in a
non-backward-compatible way (e.g. removing or renaming a request parameter) is a major change.

* Dropping support for a configuration property with the `polaris.` name prefix is a major change.

* Dropping support for any previously defined [Policy](../policy/) type or property is a major change.

* Upgrading the Quarkus runtime to its next major version is a major change (because
Quarkus-managed configuration may change).
diff --git a/site/content/in-dev/1.0.0/generic-table.md b/site/content/in-dev/1.0.0/generic-table.md
new file mode 100644
index 0000000000..2e0e3fe8e6
--- /dev/null
+++ b/site/content/in-dev/1.0.0/generic-table.md
@@ -0,0 +1,169 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Generic Table (Beta) +type: docs +weight: 435 +--- + +Generic Table support in Apache Polaris is designed to cover non-Iceberg tables across different table formats, such as Delta and CSV. It currently provides the following capabilities: +- Create a generic table under a namespace +- Load a generic table +- Drop a generic table +- List all generic tables under a namespace + +**NOTE** Generic Table support is currently in beta. Please use it with caution and report any issues you encounter. + +## What is a Generic Table? + +A generic table in Polaris is an entity that defines the following fields: + +- **name** (required): A unique identifier for the table within a namespace +- **format** (required): The format for the generic table, e.g. "delta", "csv" +- **base-location** (optional): Table base location in URI format. For example: s3:///path/to/table + - The table base location is a location that includes all files for the table + - A table with multiple disjoint locations (i.e. containing files that are outside the configured base location) is not compliant with the current generic table support in Polaris. + - If no location is provided, clients or users are responsible for managing the location. 
+- **properties** (optional): Properties for the generic table passed on creation. + - Currently, there is no reserved property key defined. + - The property definition and interpretation are delegated to client or engine implementations. +- **doc** (optional): Comment or description for the table + +## Generic Table API Vs. Iceberg Table API + +The Generic Table API provides a separate set of endpoints for operating on generic table entities, while the Iceberg API operates on +Iceberg table entities. + +| Operations | **Iceberg Table API** | **Generic Table API** | +|--------------|-----------------------|-----------------------| +| Create Table | Create an Iceberg table | Create a generic table | +| Load Table | Load an Iceberg table. If the table to load is a generic table, a TableNotFoundException will be thrown; call the Generic Table load API instead | Load a generic table. Similarly, trying to load an Iceberg table through the Generic Table API will throw a TableNotFoundException | +| Drop Table | Drop an Iceberg table. As with load, if the table to drop is a generic table, a TableNotFoundException will be thrown | Drop a generic table. Dropping an Iceberg table through the Generic Table endpoint will throw a TableNotFoundException | +| List Table | List all Iceberg tables | List all generic tables | + +Note that generic tables share the same namespace as Iceberg tables, so a table name must be unique within a namespace. Furthermore, since +there is currently no support for updating a generic table, any update to an existing table requires a drop and re-create. 
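Because there is no update endpoint, an in-place change amounts to a drop followed by a re-create. The sketch below illustrates that sequence using the REST routes described on this page; the catalog, namespace, and table names are placeholders, and the `curl` calls are commented out because they require a running Polaris server:

```shell
# Sketch: "update" a generic table by dropping and re-creating it.
# Assumes a local Polaris server at localhost:8181; delta_catalog,
# delta_ns, and delta_table are placeholder names.
BASE="http://localhost:8181/api/catalog/polaris/v1/delta_catalog"
TABLES="$BASE/namespaces/delta_ns/generic-tables"

# Step 1: drop the existing table (uncomment against a live server):
# curl -X DELETE "$TABLES/delta_table"

# Step 2: re-create the table with the new definition:
PAYLOAD='{"name": "delta_table", "format": "delta", "doc": "updated description"}'
# curl -X POST "$TABLES" -H "Content-Type: application/json" -d "$PAYLOAD"

echo "DELETE $TABLES/delta_table"
echo "POST   $TABLES"
```

Keep in mind that the drop only removes the catalog entry; the table's files are untouched, so the re-created table can point at the same base location.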
+ +## Working with Generic Table + +There are two ways to work with Polaris Generic Tables today: +1) Communicate with Polaris directly through REST API calls using tools such as `curl`. Details are described in the sections below. +2) Use the provided Spark client if you are working with Spark. Please refer to [Polaris Spark Client]({{% ref "polaris-spark-client" %}}) for detailed instructions. + +### Create a Generic Table + +To create a generic table, you need to provide the corresponding fields as described in [What is a Generic Table](#what-is-a-generic-table). + +The REST API for creating a generic table is `POST /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables`, and the +request body looks like the following: + +```json +{ + "name": "", + "format": "", + "base-location": "", + "doc": "", + "properties": { + "": "" + } +} +``` + +Here is an example that creates a generic table named `delta_table` with format `delta` under the namespace `delta_ns` +in the catalog `delta_catalog` using curl: + +```shell +curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \ + -H "Content-Type: application/json" \ + -d '{ + "name": "delta_table", + "format": "delta", + "base-location": "s3:///path/to/table", + "doc": "delta table example", + "properties": { + "key1": "value1" + } + }' +``` + +### Load a Generic Table +The REST endpoint for loading a generic table is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`. 
+ +Here is an example that loads the table `delta_table` using curl: +```shell +curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table +``` +The response looks like the following: +```json +{ + "table": { + "name": "delta_table", + "format": "delta", + "base-location": "s3:///path/to/table", + "doc": "delta table example", + "properties": { + "key1": "value1" + } + } +} +``` + +### List Generic Tables +The REST endpoint for listing the generic tables under a given +namespace is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/`. + +The following curl command lists all tables under the namespace `delta_ns`: +```shell +curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/ +``` +Example Response: +```json +{ + "identifiers": [ + { + "namespace": ["delta_ns"], + "name": "delta_table" + } + ], + "next-page-token": null +} +``` + +### Drop a Generic Table +The drop generic table REST endpoint is `DELETE /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`. + +The following curl call drops the table `delta_table`: +```shell +curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table +``` + +### API Reference + +For the complete and up-to-date API specification, see the [Catalog API Spec](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/polaris/refs/heads/main/spec/generated/bundled-polaris-catalog-service.yaml). + +## Limitations + +Current limitations of Generic Table support: +1) Limited spec information. Currently, there is no spec for information such as schema or partitions. +2) No commit coordination or update capability is provided at the catalog service level. + +Therefore, the catalog itself is unaware of anything about the underlying table except some of the loosely defined metadata. 
+It is the responsibility of the engine (and plugins used by the engine) to determine exactly how loading or committing data +should work based on the metadata. For example, with Delta support, the Delta log serialization, deserialization, +and updates all happen on the client side. diff --git a/site/content/in-dev/1.0.0/getting-started/_index.md b/site/content/in-dev/1.0.0/getting-started/_index.md new file mode 100644 index 0000000000..d4f13e6f63 --- /dev/null +++ b/site/content/in-dev/1.0.0/getting-started/_index.md @@ -0,0 +1,25 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Deploying Polaris on Cloud Providers +type: docs +weight: 300 +--- + +We will now demonstrate how to deploy Polaris locally, as well as with all supported Cloud Providers: Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP). + +Locally, Polaris can be deployed using either Docker or a local build. On the cloud, this tutorial deploys Polaris using Docker only, but local builds can also be used. \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md b/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md new file mode 100644 index 0000000000..fd95b72b0c --- /dev/null +++ b/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md @@ -0,0 +1,57 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Deploying Polaris on Amazon Web Services (AWS) +type: docs +weight: 310 +--- + +Build and launch Polaris using the AWS Startup Script at the location provided in the command below. This script will start an [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) instance, which will be used as the backend Postgres instance holding all Polaris data. +Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. + +The requirements to run the script below are: +* There must be at least two subnets created in the VPC and region in which your EC2 instance resides. The span of subnets MUST include at least 2 availability zones (AZs) within the same region. +* Your EC2 instance must have [IMDSv1 or IMDSv2 with a 2+ hop limit](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-IMDS-new-instances.html#configure-IMDS-new-instances-instance-settings) enabled. +* The AWS identity that you will use to run this script must have the following AWS permissions: + * "ec2:DescribeInstances" + * "rds:CreateDBInstance" + * "rds:DescribeDBInstances" + * "rds:CreateDBSubnetGroup" + * "sts:AssumeRole" on the same role as the Instance Profile role of the EC2 instance on which you are running this script. Additionally, you should ensure that the Instance Profile contains a trust policy that allows the role to trust itself to be assumed. 
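The self-trust requirement in the last bullet can be sketched as a trust policy like the one below. The account ID and role name are placeholders, not values from this tutorial; substitute the Instance Profile role of your EC2 instance:

```shell
# Hypothetical trust policy letting the instance role assume itself.
# Replace 111122223333 and my-ec2-role with your account ID and role name.
cat <<'EOF' > trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/my-ec2-role" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
# Apply it with the AWS CLI (requires IAM permissions; shown for reference):
# aws iam update-assume-role-policy --role-name my-ec2-role \
#   --policy-document file://trust-policy.json
cat trust-policy.json
```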
+ +```shell +chmod +x getting-started/assets/cloud_providers/deploy-aws.sh +export ASSETS_PATH=$(pwd)/getting-started/assets/ +export CLIENT_ID=root +export CLIENT_SECRET=s3cr3t +./getting-started/assets/cloud_providers/deploy-aws.sh +``` + +## Next Steps +Congrats, you now have a running instance of Polaris! For details on how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris.md" %}}) page. + +## Cleanup Instructions +To shut down the Polaris server, run the following commands: + +```shell +export ASSETS_PATH=$(pwd)/getting-started/assets/ +docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down +``` + +To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page. \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md b/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md new file mode 100644 index 0000000000..74df725db0 --- /dev/null +++ b/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md @@ -0,0 +1,52 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Deploying Polaris on Azure +type: docs +weight: 320 +--- + +Build and launch Polaris using the Azure Startup Script at the location provided in the command below. This script will start an [Azure Database for PostgreSQL - Flexible Server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview) instance, which will be used as the backend Postgres instance holding all Polaris data. +Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. + +The requirements to run the script below are: +* Install the AZ CLI, if it is not already installed on the Azure VM. Instructions to download the AZ CLI can be found [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli). +* You must be logged into the AZ CLI. Please run `az account show` to ensure that you are logged in prior to running this script. +* Assign a System-Assigned Managed Identity to the Azure VM. + +```shell +chmod +x getting-started/assets/cloud_providers/deploy-azure.sh +export ASSETS_PATH=$(pwd)/getting-started/assets/ +export CLIENT_ID=root +export CLIENT_SECRET=s3cr3t +./getting-started/assets/cloud_providers/deploy-azure.sh +``` + +## Next Steps +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page. + +## Cleanup Instructions +To shut down the Polaris server, run the following commands: + +```shell +export ASSETS_PATH=$(pwd)/getting-started/assets/ +docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down +``` + +To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page. 
\ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md b/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md new file mode 100644 index 0000000000..9641ad7282 --- /dev/null +++ b/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md @@ -0,0 +1,52 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Deploying Polaris on Google Cloud Platform (GCP) +type: docs +weight: 330 +--- + +Build and launch Polaris using the GCP Startup Script at the location provided in the command below. This script will start a [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) instance, which will be used as the backend Postgres instance holding all Polaris data. +Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. + +The requirements to run the script below are: +* Install the `gcloud` CLI, if it is not already installed on the GCP VM. Instructions to download the `gcloud` CLI can be found [here](https://cloud.google.com/sdk/docs/install). 
+* Ensure the `Cloud SQL Admin API` has been enabled in your project and that your VM's Principal has access to the correct role: `roles/cloudsql.admin`. +* Ensure the VM's Principal has access to at least Read-only scope on Compute Engine: `compute.readonly`. + +```shell +chmod +x getting-started/assets/cloud_providers/deploy-gcp.sh +export ASSETS_PATH=$(pwd)/getting-started/assets/ +export CLIENT_ID=root +export CLIENT_SECRET=s3cr3t +./getting-started/assets/cloud_providers/deploy-gcp.sh +``` + +## Next Steps +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page. + +## Cleanup Instructions +To shut down the Polaris server, run the following commands: + +```shell +export ASSETS_PATH=$(pwd)/getting-started/assets/ +docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down +``` + +To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page. \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/install-dependencies.md b/site/content/in-dev/1.0.0/getting-started/install-dependencies.md new file mode 100644 index 0000000000..7341118868 --- /dev/null +++ b/site/content/in-dev/1.0.0/getting-started/install-dependencies.md @@ -0,0 +1,118 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Installing Dependencies +type: docs +weight: 100 +--- + +This guide serves as an introduction to several key entities that can be managed with Apache Polaris (Incubating), describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark™. + +# Prerequisites + +This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here. + +## Git + +To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/) on macOS: + +```shell +brew install git +``` + +Please refer to the [Git Documentation](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) for instructions on installing Git on other platforms. + +Then, use git to clone the Polaris repo: + +```shell +cd ~ +git clone https://github.com/apache/polaris.git +``` + +## Docker + +It is recommended to deploy Polaris inside [Docker](https://www.docker.com/) for the Quickstart workflow. Instructions for deploying the Quickstart workflow on the supported Cloud Providers (AWS, Azure, GCP) will be provided only with Docker. However, non-Docker deployment instructions for local deployments can also be followed on Cloud Providers. 
Instructions to install Docker can be found on the [Docker website](https://docs.docker.com/engine/install/). Ensure that Docker and the Docker Compose plugin are both installed. + +### Docker on macOS +Docker can be installed using [homebrew](https://brew.sh/): + +```shell +brew install --cask docker +``` + +There can be [Docker permission issues](https://github.com/apache/polaris/pull/971) related to the seccomp configuration. To resolve these issues, set the `seccomp` profile to "unconfined" when running a container. For example: + +```shell +docker run --security-opt seccomp=unconfined apache/polaris:latest +``` + +Note: Setting the seccomp profile to "unconfined" disables the default system call filtering, which may pose security risks. Use this configuration with caution, especially in production environments. + +### Docker on Amazon Linux +Docker can be installed using a modification to the CentOS instructions. For example: + +```shell +sudo dnf update -y +# Remove old version +sudo dnf remove -y docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-engine +# Install dnf plugin +sudo dnf -y install dnf-plugins-core +# Add CentOS repository +sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo +# Adjust release server version in the path as it will not match with Amazon Linux 2023 +sudo sed -i 's/$releasever/9/g' /etc/yum.repos.d/docker-ce.repo +# Install as usual +sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin +``` + +### Confirm Docker Installation + +Once installed, verify that both Docker and the Docker Compose plugin are available: + +```shell +docker version +docker compose version +``` + +Also make sure Docker is running and is able to run a sample Docker container: + +```shell +docker run hello-world +``` + +## Java + +If you plan to build Polaris from source yourself or using this tutorial's 
instructions on a Cloud Provider, you will need to satisfy a few prerequisites first. + +Polaris is built using [Gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv: + +```shell +cd ~/polaris +brew install openjdk@21 jenv +jenv add $(brew --prefix openjdk@21) +jenv local 21 +``` + +Ensure that `java --version` and `javac -version` both run successfully and report version 21. + +## jq + +Most Polaris Quickstart scripts require `jq`. Follow the instructions from the [jq](https://jqlang.org/download/) website to download this tool. \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/quickstart.md b/site/content/in-dev/1.0.0/getting-started/quickstart.md new file mode 100644 index 0000000000..a9fd43f906 --- /dev/null +++ b/site/content/in-dev/1.0.0/getting-started/quickstart.md @@ -0,0 +1,116 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Quickstart +type: docs +weight: 200 +--- + +Polaris can be deployed via a Docker image or as a standalone process. 
Before starting, be sure that you've satisfied the relevant prerequisites detailed in the previous page. + +## Common Setup +Before running Polaris, ensure you have completed the following setup steps: + +1. **Build Polaris** +```shell +cd ~/polaris +./gradlew \ + :polaris-server:assemble \ + :polaris-server:quarkusAppPartsBuild \ + :polaris-admin:assemble --rerun \ + -Dquarkus.container-image.tag=postgres-latest \ + -Dquarkus.container-image.build=true +``` +- **For standalone**: Omit the `-Dquarkus.container-image.tag` and `-Dquarkus.container-image.build` options if you do not need to build a Docker image. + +## Running Polaris with Docker + +To start using Polaris in Docker, launch Polaris along with the Postgres instance, Apache Spark, and Trino it is packaged with: + +```shell +export ASSETS_PATH=$(pwd)/getting-started/assets/ +export QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://postgres:5432/POLARIS +export QUARKUS_DATASOURCE_USERNAME=postgres +export QUARKUS_DATASOURCE_PASSWORD=postgres +export CLIENT_ID=root +export CLIENT_SECRET=s3cr3t +docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml \ + -f getting-started/jdbc/docker-compose-bootstrap-db.yml \ + -f getting-started/jdbc/docker-compose.yml up -d +``` + +You should see output for some time as Polaris, Spark, and Trino build and start up. Eventually, the output will settle into some logs relating to Spark, resembling the following: + +``` +spark-sql-1 | Spark Web UI available at http://8bc4de8ed854:4040 +spark-sql-1 | Spark master: local[*], Application Id: local-1743745174604 +spark-sql-1 | 25/04/04 05:39:38 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session. +spark-sql-1 | 25/04/04 05:39:39 WARN RESTSessionCatalog: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://polaris:8181/api/catalogv1/oauth/tokens. 
This automatic fallback will be removed in a future Iceberg release.It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537 +``` + +The Docker image pre-configures a sample catalog called `quickstart_catalog` that uses a local file system. + +## Running Polaris as a Standalone Process + +You can also start Polaris through Gradle (packaged within the Polaris repository): + +1. **Start the Server** + +Run the following command to start Polaris: + +```shell +./gradlew run +``` + +You should see output for some time as Polaris builds and starts up. Eventually, the output will settle and you should see messages that resemble the following: + +``` +INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) polaris-runtime-service on JVM (powered by Quarkus ) started in 2.656s. Listening on: http://localhost:8181. Management interface listening on http://0.0.0.0:8182. +INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) Profile prod activated. Live Coding activated. +INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) Installed features: [...] +``` + +At this point, Polaris is running. + +The Gradle-launched Polaris instance used in this tutorial stores entities only in memory. This means that any entities that you define will be destroyed when Polaris is shut down. +For more information on how to configure Polaris for production usage, see the [docs]({{% relref "../configuring-polaris-for-production" %}}). + +When Polaris is run using the `./gradlew run` command, the root principal credentials are `root` and `secret` for the `CLIENT_ID` and `CLIENT_SECRET`, respectively. 
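For example, before using the CLI against a Gradle-launched instance, you can export those fixed root credentials:

```shell
# Credentials for a Polaris instance started with `./gradlew run`
# (the Docker quickstart uses root/s3cr3t instead):
export CLIENT_ID=root
export CLIENT_SECRET=secret
echo "Using client ${CLIENT_ID}"
```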
+ +### Installing Apache Spark and Trino Locally for Testing + +#### Apache Spark + +If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. As in the [prerequisites]({{% ref "install-dependencies#git" %}}), make sure [git](https://git-scm.com/) is installed first. + +Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html). + +```shell +git clone --branch branch-3.5 https://github.com/apache/spark.git ~/spark +``` + +#### Trino +If you want to connect to Polaris with [Trino](https://trino.io/), it is recommended to set up a test instance of Trino using Docker. As in the [prerequisites]({{% ref "install-dependencies#docker" %}}), make sure [Docker](https://www.docker.com/) is installed first. + +```shell +docker run --name trino -d -p 8080:8080 trinodb/trino +``` + +## Next Steps +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% ref "using-polaris" %}}) page. \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/using-polaris.md b/site/content/in-dev/1.0.0/getting-started/using-polaris.md new file mode 100644 index 0000000000..35f0bae336 --- /dev/null +++ b/site/content/in-dev/1.0.0/getting-started/using-polaris.md @@ -0,0 +1,315 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Using Polaris +type: docs +weight: 400 +--- + +## Setup + +Ensure your `CLIENT_ID` & `CLIENT_SECRET` variables are already defined, as they were required for starting the Polaris server earlier. + +```shell +export CLIENT_ID=YOUR_CLIENT_ID +export CLIENT_SECRET=YOUR_CLIENT_SECRET +``` + +## Defining a Catalog + +In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under. With a Polaris service running, you can create a catalog like so: + +```shell +cd ~/polaris + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalogs \ + create \ + --storage-type s3 \ + --default-base-location ${DEFAULT_BASE_LOCATION} \ + --role-arn ${ROLE_ARN} \ + quickstart_catalog +``` + +This will create a new catalog called **quickstart_catalog**. If you are using one of the Getting Started locally-built Docker images, we have already created a catalog named `quickstart_catalog` for you. + +The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources. 
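As an illustration, the two variables might be set like this before running `catalogs create`. The bucket and account ID below are placeholders, not values from this tutorial; replace them with your own S3 location and IAM role:

```shell
# Hypothetical values -- substitute your own S3 location and role ARN:
export DEFAULT_BASE_LOCATION=s3://my-bucket/polaris-quickstart
export ROLE_ARN=arn:aws:iam::111122223333:role/polaris-storage-role
echo "$DEFAULT_BASE_LOCATION"
```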
+ +If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs]({{% relref "../entities#storage-type" %}}). + +Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% relref "../command-line-interface" %}}). + + +### Creating a Principal and Assigning it Privileges + +With a catalog created, we can create a [principal]({{% relref "../entities#principal" %}}) that has access to manage that catalog. For details on how to configure the Polaris CLI, see [the section above](#defining-a-catalog) or refer to the [docs]({{% relref "../command-line-interface" %}}). + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principals \ + create \ + quickstart_user + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principal-roles \ + create \ + quickstart_user_role + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalog-roles \ + create \ + --catalog quickstart_catalog \ + quickstart_catalog_role +``` + +Be sure to provide the necessary credentials, hostname, and port as before. + +When the `principals create` command completes successfully, it will return the credentials for this new principal. Export them for future use. For example: + +```shell +./polaris ... principals create example +{"clientId": "XXXX", "clientSecret": "YYYY"} +export USER_CLIENT_ID=XXXX +export USER_CLIENT_SECRET=YYYY +``` + +Now, we grant the principal the [principal role]({{% relref "../entities#principal-role" %}}) we created, and grant the [catalog role]({{% relref "../entities#catalog-role" %}}) the principal role we created. 
For more information on these entities, please refer to the linked documentation. + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principal-roles \ + grant \ + --principal quickstart_user \ + quickstart_user_role + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalog-roles \ + grant \ + --catalog quickstart_catalog \ + --principal-role quickstart_user_role \ + quickstart_catalog_role +``` + +Now, we’ve linked our principal to the catalog via roles like so: + +![Principal to Catalog](/img/quickstart/privilege-illustration-1.png "Principal to Catalog") + +In order to give this principal the ability to interact with the catalog, we must assign some [privileges]({{% relref "../entities#privilege" %}}). For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so: + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + privileges \ + catalog \ + grant \ + --catalog quickstart_catalog \ + --catalog-role quickstart_catalog_role \ + CATALOG_MANAGE_CONTENT +``` + +This grants the [catalog privileges]({{% relref "../entities#privilege" %}}) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so: + +![Principal to Catalog with Catalog Role](/img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role") + +`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace. + +## Using Iceberg & Polaris + +At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). 
Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the respective examples below.
+
+### Connecting with Spark
+
+#### Using a Local Build of Spark
+
+To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API.
+
+This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). From a local Spark clone on the `branch-3.5` branch we can run the following:
+
+_Note: the credentials provided here are those for our principal, not the root credentials._
+
+```shell
+bin/spark-sql \
+--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
+--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
+--conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
+--conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \
+--conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \
+--conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
+--conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
+--conf spark.sql.catalog.quickstart_catalog.credential="${USER_CLIENT_ID}:${USER_CLIENT_SECRET}" \
+--conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \
+--conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true \
+--conf spark.sql.catalog.quickstart_catalog.client.region=us-west-2
+```
+
+Similar to the CLI commands above, this configures Spark to use the Polaris server running at `localhost:8181`.
If your Polaris server is running elsewhere, be sure to update the configuration appropriately.
+
+Finally, note that we include the `iceberg-aws-bundle` package here. If your table is using a different filesystem, be sure to include the appropriate dependency.
+
+#### Using Spark SQL from a Docker container
+
+Refresh the Docker container with the user's credentials:
+```shell
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop spark-sql
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f spark-sql
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps spark-sql
+```
+
+Attach to the running spark-sql container:
+
+```shell
+docker attach $(docker ps -q --filter name=spark-sql)
+```
+
+#### Sample Commands
+
+Once the Spark session starts, we can create a namespace and table within the catalog:
+
+```sql
+USE quickstart_catalog;
+CREATE NAMESPACE IF NOT EXISTS quickstart_namespace;
+CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema;
+USE NAMESPACE quickstart_namespace.schema;
+CREATE TABLE IF NOT EXISTS quickstart_table (id BIGINT, data STRING) USING ICEBERG;
+```
+
+We can now use this table like any other:
+
+```
+INSERT INTO quickstart_table VALUES (1, 'some data');
+SELECT * FROM quickstart_table;
+. . .
++---+---------+
+|id |data     |
++---+---------+
+|1  |some data|
++---+---------+
+```
+
+If at any time access is revoked...
+
+```shell
+./polaris \
+  --client-id ${CLIENT_ID} \
+  --client-secret ${CLIENT_SECRET} \
+  privileges \
+  catalog \
+  revoke \
+  --catalog quickstart_catalog \
+  --catalog-role quickstart_catalog_role \
+  CATALOG_MANAGE_CONTENT
+```
+
+Spark will lose access to the table:
+
+```
+INSERT INTO quickstart_table VALUES (1, 'some data');
+
+org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
+```
+
+### Connecting with Trino
+
+Refresh the Docker container with the user's credentials:
+
+```shell
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop trino
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f trino
+docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps trino
+```
+
+Attach to the running Trino container:
+
+```shell
+docker exec -it $(docker ps -q --filter name=trino) trino
+```
+
+You may not see Trino's prompt immediately; press ENTER to see it. A few commands that you can try:
+
+```sql
+SHOW CATALOGS;
+SHOW SCHEMAS FROM iceberg;
+CREATE SCHEMA iceberg.quickstart_schema;
+CREATE TABLE iceberg.quickstart_schema.quickstart_table AS SELECT 1 x;
+SELECT * FROM iceberg.quickstart_schema.quickstart_table;
+```
+
+If at any time access is revoked...
+ +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + privileges \ + catalog \ + revoke \ + --catalog quickstart_catalog \ + --catalog-role quickstart_catalog_role \ + CATALOG_MANAGE_CONTENT +``` + +Trino will lose access to the table: + +```sql +SELECT * FROM iceberg.quickstart_schema.quickstart_table; + +org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION +``` + +### Connecting Using REST APIs + +To access Polaris from the host machine, first request an access token: + +```shell +export POLARIS_TOKEN=$(curl -s http://polaris:8181/api/catalog/v1/oauth/tokens \ + --resolve polaris:8181:127.0.0.1 \ + --user ${CLIENT_ID}:${CLIENT_SECRET} \ + -d 'grant_type=client_credentials' \ + -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token) +``` + +Then, use the access token in the Authorization header when accessing Polaris: + +```shell +curl -v http://127.0.0.1:8181/api/management/v1/principal-roles -H "Authorization: Bearer $POLARIS_TOKEN" +curl -v http://127.0.0.1:8181/api/management/v1/catalogs/quickstart_catalog -H "Authorization: Bearer $POLARIS_TOKEN" +``` + +## Next Steps +* Visit [Configuring Polaris for Production]({{% relref "../configuring-polaris-for-production" %}}). +* A Getting Started experience for using Spark with Jupyter Notebooks is documented [here](https://github.com/apache/polaris/blob/main/getting-started/spark/README.md). +* To shut down a locally-deployed Polaris server and clean up all related Docker containers, run the command listed below. Cloud Deployments have their respective termination commands on their Deployment page, while Polaris running on Gradle will terminate when the Gradle process terminates. 
+```shell +docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml -f getting-started/jdbc/docker-compose-bootstrap-db.yml -f getting-started/jdbc/docker-compose.yml down +``` + + diff --git a/site/content/in-dev/1.0.0/metastores.md b/site/content/in-dev/1.0.0/metastores.md new file mode 100644 index 0000000000..4810b124a0 --- /dev/null +++ b/site/content/in-dev/1.0.0/metastores.md @@ -0,0 +1,151 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Metastores +type: docs +weight: 700 +--- + +This page explains how to configure and use Polaris metastores with either the recommended Relational JDBC or the +deprecated EclipseLink persistence backends. + +## Relational JDBC +This implementation leverages Quarkus for datasource management and supports configuration through +environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file). 
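Quarkus also accepts these settings in `application.properties` form; each environment variable below maps to the corresponding `quarkus.datasource.*` property. A sketch with placeholder credentials:

```
polaris.persistence.type=relational-jdbc

quarkus.datasource.db-kind=postgresql
quarkus.datasource.username=<username>
quarkus.datasource.password=<password>
quarkus.datasource.jdbc.url=<jdbc-url>
```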
+
+
+```
+POLARIS_PERSISTENCE_TYPE=relational-jdbc
+
+QUARKUS_DATASOURCE_DB_KIND=postgresql
+QUARKUS_DATASOURCE_USERNAME=<username>
+QUARKUS_DATASOURCE_PASSWORD=<password>
+QUARKUS_DATASOURCE_JDBC_URL=<jdbc-url>
+```
+
+The Relational JDBC metastore currently relies on a Quarkus-managed datasource and supports only PostgreSQL and H2 databases. This limitation is similar to that of EclipseLink, primarily due to underlying schema differences. At this time, official documentation is provided exclusively for usage with PostgreSQL.
+Please refer to the documentation here:
+[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)
+
+Additionally, retries can be configured via the `polaris.persistence.relational.jdbc.*` properties; please refer to the [configuration](./configuration.md) page.
+
+## EclipseLink (Deprecated)
+> [!IMPORTANT] EclipseLink is deprecated; it is recommended to use Relational JDBC for persistence instead.
+
+Polaris includes the EclipseLink plugin by default, along with the PostgreSQL driver.
+
+Configure the `polaris.persistence` section in your Polaris configuration file
+(`application.properties`) as follows:
+
+```
+polaris.persistence.type=eclipse-link
+polaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml
+polaris.persistence.eclipselink.persistence-unit=polaris
+```
+
+Alternatively, configuration can also be done with environment variables or system properties. Refer
+to the [Quarkus Configuration Reference] for more information.
+
+The `configuration-file` option must point to an [EclipseLink configuration file]. This file, named
+`persistence.xml`, is used to set up the database connection properties, which can differ depending
+on the type of database and its configuration.
+
+> Note: The `persistence.xml` file must be located at least two directories below the root folder, e.g. `/deployments/config/persistence.xml` is OK, whereas `/deployments/persistence.xml` will cause an infinite loop.
+
+[Quarkus Configuration Reference]: https://quarkus.io/guides/config-reference
+[EclipseLink configuration file]: https://eclipse.dev/eclipselink/documentation/4.0/solutions/solutions.html#TESTINGJPA002
+
+Polaris creates and connects to a separate database for each realm. Specifically, the `{realm}` placeholder in `jakarta.persistence.jdbc.url` is substituted with the actual realm name, allowing the Polaris server to connect to different databases based on the realm.
+
+> Note: some database systems such as Postgres don't create databases automatically. Database admins need to create them manually before running the Polaris server.
+
+A single `persistence.xml` can describe multiple [persistence units](https://eclipse.dev/eclipselink/documentation/4.0/concepts/concepts.html#APPDEV001). For example, with both a `polaris-dev` and `polaris` persistence unit defined, you could use a single `persistence.xml` to easily switch between development and production databases. Use the `persistence-unit` option in the Polaris server configuration to select the persistence unit to use.
+
+### Using H2
+
+> [!IMPORTANT] H2 is an in-memory database and is not suitable for production!
+
+The default [persistence.xml] in Polaris is already configured for H2, but you can easily customize
+your H2 configuration using the persistence unit template below:
+
+[persistence.xml]: https://github.com/apache/polaris/blob/main/extension/persistence/eclipselink/src/main/resources/META-INF/persistence.xml
+
+```xml
+<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
+  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
+  <shared-cache-mode>NONE</shared-cache-mode>
+  <properties>
+    <!-- Set the H2 connection properties here, e.g.
+         jakarta.persistence.jdbc.url (with the {realm} placeholder),
+         jakarta.persistence.jdbc.user, and jakarta.persistence.jdbc.password. -->
+  </properties>
+</persistence-unit>
+```
+
+To build Polaris with the necessary H2 dependency and start the Polaris service, run the following:
+
+```shell
+./gradlew \
+  :polaris-server:assemble \
+  :polaris-server:quarkusAppPartsBuild --rerun \
+  -PeclipseLinkDeps=com.h2database:h2:2.3.232
+java -Dpolaris.persistence.type=eclipse-link \
+     -Dpolaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml \
+     -Dpolaris.persistence.eclipselink.persistence-unit=polaris \
+     -jar runtime/server/build/quarkus-app/quarkus-run.jar
+```
+
+### Using Postgres
+
+PostgreSQL is included by default in the Polaris server distribution.
+
+The following shows a sample configuration for integrating Polaris with Postgres.
+
+```xml
+<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
+  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
+  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
+  <shared-cache-mode>NONE</shared-cache-mode>
+  <properties>
+    <!-- Set the Postgres connection properties here, e.g.
+         jakarta.persistence.jdbc.url (with the {realm} placeholder),
+         jakarta.persistence.jdbc.user, and jakarta.persistence.jdbc.password. -->
+  </properties>
+</persistence-unit>
+```
+
diff --git a/site/content/in-dev/1.0.0/polaris-catalog-service.md b/site/content/in-dev/1.0.0/polaris-catalog-service.md
new file mode 100644
index 0000000000..02fed63f46
--- /dev/null
+++ b/site/content/in-dev/1.0.0/polaris-catalog-service.md
@@ -0,0 +1,26 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# +linkTitle: 'Catalog API Spec' +weight: 900 +params: + show_page_toc: false +--- + +{{< redoc-polaris "generated/bundled-polaris-catalog-service.yaml" >}} diff --git a/site/content/in-dev/1.0.0/polaris-management-service.md b/site/content/in-dev/1.0.0/polaris-management-service.md new file mode 100644 index 0000000000..0b66b9daa4 --- /dev/null +++ b/site/content/in-dev/1.0.0/polaris-management-service.md @@ -0,0 +1,27 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: 'Apache Polaris Management Service OpenAPI' +linkTitle: 'Management OpenAPI' +weight: 800 +params: + show_page_toc: false +--- + +{{< redoc-polaris "polaris-management-service.yml" >}} diff --git a/site/content/in-dev/1.0.0/polaris-spark-client.md b/site/content/in-dev/1.0.0/polaris-spark-client.md new file mode 100644 index 0000000000..a34bceeced --- /dev/null +++ b/site/content/in-dev/1.0.0/polaris-spark-client.md @@ -0,0 +1,141 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+Title: Polaris Spark Client
+type: docs
+weight: 650
+---
+
+Apache Polaris now provides Catalog support for Generic Tables (non-Iceberg tables); please check out
+the [Catalog API Spec]({{% ref "polaris-catalog-service" %}}) for the Generic Table API specs.
+
+Along with the Generic Table Catalog support, Polaris is also releasing a Spark client, which provides
+an end-to-end solution for Apache Spark to manage Delta tables using Polaris.
+
+Note that the Polaris Spark client can handle both Iceberg and Delta tables, not just Delta.
+
+This page documents how to connect Spark to a Polaris service using the Polaris Spark client.
+
+## Quick Start with Local Polaris service
+If you want to quickly try out the functionality with a local Polaris service, simply check out the Polaris repo
+and follow the instructions in the Spark plugin getting-started
+[README](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/getting-started/README.md).
+
+Check out the Polaris repo:
+```shell
+cd ~
+git clone https://github.com/apache/polaris.git
+```
+
+## Start Spark against a deployed Polaris service
+Before starting, ensure that the deployed Polaris service supports Generic Tables and that Spark 3.5 (version 3.5.3 or later) is installed.
+Spark 3.5.5 is recommended, and you can follow the instructions below to get a Spark 3.5.5 distribution.
+```shell
+cd ~
+wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz
+mkdir spark-3.5
+tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1
+cd spark-3.5
+```
+
+### Connecting with Spark using the Polaris Spark client
+The following CLI command can be used to start Spark with a connection to the deployed Polaris service using
+a released Polaris Spark client:
+
+```shell
+bin/spark-shell \
+--packages <polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
+--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
+--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
+--conf spark.sql.catalog.<spark-catalog-name>.warehouse=<polaris-catalog-name> \
+--conf spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
+--conf spark.sql.catalog.<spark-catalog-name>=org.apache.polaris.spark.SparkCatalog \
+--conf spark.sql.catalog.<spark-catalog-name>.uri=<polaris-service-uri> \
+--conf spark.sql.catalog.<spark-catalog-name>.credential='<client-id>:<client-secret>' \
+--conf spark.sql.catalog.<spark-catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
+--conf spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled=true
+```
+Assuming the released Polaris Spark client you want to use is `org.apache.polaris:polaris-spark-3.5_2.12:1.0.0`,
+replace the `<polaris-spark-client-package>` field with that release.
+
+`<spark-catalog-name>` is the catalog name you will use with Spark, and `<polaris-catalog-name>` is the catalog name used
+by the Polaris service; for simplicity, you can use the same name.
+
+Replace `<polaris-service-uri>` with the URI of the deployed Polaris service. For example, with a locally deployed
+Polaris service, the URI would be `http://localhost:8181/api/catalog`.
+
+For the `<client-id>` and `<client-secret>` values, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}})
+for more details.
+
+You can also establish the connection by programmatically initializing a SparkSession; the following is an example with PySpark:
+```python
+from pyspark.sql import SparkSession
+
+spark = SparkSession.builder \
+    .config("spark.jars.packages", "<polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1") \
+    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
+    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension") \
+    .config("spark.sql.catalog.<spark-catalog-name>", "org.apache.polaris.spark.SparkCatalog") \
+    .config("spark.sql.catalog.<spark-catalog-name>.uri", "<polaris-service-uri>") \
+    .config("spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled", "true") \
+    .config("spark.sql.catalog.<spark-catalog-name>.credential", "<client-id>:<client-secret>") \
+    .config("spark.sql.catalog.<spark-catalog-name>.warehouse", "<polaris-catalog-name>") \
+    .config("spark.sql.catalog.polaris.scope", 'PRINCIPAL_ROLE:ALL') \
+    .config("spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation", 'vended-credentials') \
+    .getOrCreate()
+```
+As with the CLI command, make sure the corresponding fields are replaced correctly.
+
+### Create tables with Spark
+After Spark is started, you can use it to create and access Iceberg and Delta tables. For example:
+```python
+spark.sql("USE polaris")
+spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS")
+spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC")
+spark.sql("USE NAMESPACE DELTA_NS.PUBLIC")
+spark.sql("""CREATE TABLE IF NOT EXISTS PEOPLE (
+    id int, name string)
+USING delta LOCATION 'file:///tmp/var/delta_tables/people';
+""")
+```
+
+## Connecting with Spark using local Polaris Spark client jar
+If you would like to use a version of the Spark client that is currently not yet released, you can
+build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin
+[README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions.
+
+## Limitations
+The Polaris Spark client has the following functional limitations:
+1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `DataFrame`
+   is also not supported, since it relies on the CTAS support.
+2) Creating a Delta table without an explicit location is not supported.
+3) Renaming a Delta table is not supported.
+4) ALTER TABLE ... SET LOCATION is not supported for Delta tables.
+5) Tables in other non-Iceberg formats, such as CSV, are not supported.
+
+## Iceberg Spark Client compatibility with Polaris Spark Client
+The Polaris Spark client today depends on a specific Iceberg client version; the version dependency is described
+in the following table:
+
+| Spark Client Version | Iceberg Spark Client Version |
+|----------------------|------------------------------|
+| 1.0.0                | 1.9.0                        |
+
+The Iceberg dependency is automatically downloaded when the Polaris package is downloaded, so there is no need to
+add the Iceberg Spark client to the `packages` configuration.
diff --git a/site/content/in-dev/1.0.0/policy.md b/site/content/in-dev/1.0.0/policy.md
new file mode 100644
index 0000000000..3f49353884
--- /dev/null
+++ b/site/content/in-dev/1.0.0/policy.md
@@ -0,0 +1,197 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.
See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Policy +type: docs +weight: 425 +--- + +The Polaris Policy framework empowers organizations to centrally define, manage, and enforce fine-grained governance, lifecycle, and operational rules across all data resources in the catalog. + +With the policy API, you can: +- Create and manage policies +- Attach policies to specific resources (catalogs, namespaces, tables, or views) +- Check applicable policies for any given resource + +## What is a Policy? + +A policy in Apache Polaris is a structured entity that defines rules governing actions on specified resources under +predefined conditions. Each policy contains: + +- **Name**: A unique identifier within a namespace +- **Type**: Determines the semantics and expected format of the policy content +- **Description**: Explains the purpose of the policy +- **Content**: Contains the actual rules defining the policy behavior +- **Version**: An automatically tracked revision number +- **Inheritable**: Whether the policy can be inherited by child resources, decided by its type + +### Policy Types + +Polaris supports several predefined system policy types (prefixed with `system.`): + +| Policy Type | Purpose | JSON-Schema | Applies To | +|-------------|-------------------------------------------------------|-------------|------------| +| **`system.data-compaction`** | Defines rules for data file compaction operations | [`data-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/data-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | +| **`system.metadata-compaction`** | Defines rules for metadata file compaction operations | [`metadata-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/metadata-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | +| **`system.orphan-file-removal`** | Defines rules for 
removing orphaned files | [`orphan-file-removal/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/orphan-file-removal/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | +| **`system.snapshot-expiry`** | Defines rules for snapshot expiration | [`snapshot-expiry/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/snapshot-expiry/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | + +Support for additional predefined system policy types and custom policy type definitions is in progress. +For more details, please refer to the [roadmap](https://github.com/apache/polaris/discussions/1028). + +### Policy Inheritance + +The entity hierarchy in Polaris is structured as follows: + +``` + Catalog + | + Namespace + | + +-----------+----------+ + | | | +Iceberg Iceberg Generic + Table View Table +``` + +Policies can be attached at any level, and inheritance flows from catalog down to namespace, then to tables and views. + +Policies can be inheritable or non-inheritable: + +- **Inheritable policies**: Apply to the target resource and all its applicable child resources +- **Non-inheritable policies**: Apply only to the specific target resource + +The inheritance follows an override mechanism: +1. Table-level policies override namespace and catalog policies +2. Namespace-level policies override parent namespace and catalog policies + +> **Important:** Because an override completely replaces the same policy type at higher levels, +> **only one instance of a given policy type can be attached to (and therefore affect) a resource**. 
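The override behavior described above can be sketched in a few lines of Python. This is an illustration of the resolution rules only, not Polaris code; the function and field names here are hypothetical.

```python
# Illustration only: resolve the effective inheritable policies for a table by
# walking from the most specific level (table) to the least (catalog) and
# keeping the first policy seen for each policy type.
def effective_policies(levels):
    """levels: lists of policies ordered table -> namespace -> catalog."""
    resolved = {}
    for policies in levels:
        for policy in policies:
            if policy["inheritable"] and policy["type"] not in resolved:
                resolved[policy["type"]] = policy
    return resolved

table_level = [{"type": "system.data-compaction", "inheritable": True, "level": "table"}]
catalog_level = [
    {"type": "system.data-compaction", "inheritable": True, "level": "catalog"},
    {"type": "system.snapshot-expiry", "inheritable": True, "level": "catalog"},
]

# The table-level compaction policy overrides the catalog-level one, while the
# snapshot-expiry policy is inherited from the catalog.
effective = effective_policies([table_level, [], catalog_level])
```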
+ +## Working with Policies + +### Creating a Policy + +To create a policy, you need to provide a name, type, and optionally a description and content: + +```json +POST /polaris/v1/{prefix}/namespaces/{namespace}/policies +{ + "name": "compaction-policy", + "type": "system.data-compaction", + "description": "Policy for optimizing table storage", + "content": "{\"version\": \"2025-02-03\", \"enable\": true, \"config\": {\"target_file_size_bytes\": 134217728}}" +} +``` + +The policy content is validated against a schema specific to its type. Here are a few policy content examples: +- Data Compaction Policy +```json +{ + "version": "2025-02-03", + "enable": true, + "config": { + "target_file_size_bytes": 134217728, + "compaction_strategy": "bin-pack", + "max-concurrent-file-group-rewrites": 5 + } +} +``` +- Orphan File Removal Policy +```json +{ + "version": "2025-02-03", + "enable": true, + "max_orphan_file_age_in_days": 30, + "locations": ["s3://my-bucket/my-table-location"], + "config": { + "prefix_mismatch_mode": "ignore" + } +} +``` + +### Attaching Policies to Resources + +Policies can be attached to different resource levels: + +1. **Catalog level**: Applies to the entire catalog +2. **Namespace level**: Applies to a specific namespace +3. **Table-like level**: Applies to individual tables or views + +Example of attaching a policy to a table: + +```json +PUT /polaris/v1/{prefix}/namespaces/{namespace}/policies/{policy-name}/mappings +{ + "target": { + "type": "table-like", + "path": ["NS1", "NS2", "test_table_1"] + } +} +``` + +For inheritable policies, only one policy of a given type can be attached to a resource. For non-inheritable policies, +multiple policies of the same type can be attached. + +### Retrieving Applicable Policies +A user can view applicable policies on a resource (e.g., table, namespace, or catalog) as long as they have +read permission on that resource. 
+ +Here is an example to find all policies that apply to a specific resource (including inherited policies): +``` +GET /polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions +``` + +**Sample response:** +```json +{ + "policies": [ + { + "name": "snapshot-expiry-policy", + "type": "system.snapshot-expiry", + "appliedAt": "namespace", + "content": { + "version": "2025-02-03", + "enable": true, + "config": { + "min_snapshot_to_keep": 1, + "max_snapshot_age_days": 2, + "max_ref_age_days": 3 + } + } + }, + { + "name": "compaction-policy", + "type": "system.data-compaction", + "appliedAt": "catalog", + "content": { + "version": "2025-02-03", + "enable": true, + "config": { + "target_file_size_bytes": 134217728 + } + } + } + ] +} +``` + +### API Reference + +For the complete and up-to-date API specification, see the [policy-api.yaml](https://github.com/apache/polaris/blob/main/spec/polaris-catalog-apis/policy-apis.yaml). \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/realm.md b/site/content/in-dev/1.0.0/realm.md new file mode 100644 index 0000000000..9da5e7e25b --- /dev/null +++ b/site/content/in-dev/1.0.0/realm.md @@ -0,0 +1,53 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Realm +type: docs +weight: 350 +--- + +This page explains what a realm is and what it is used for in Polaris. + +### What is it? + +A realm in Polaris serves as a logical partitioning mechanism within the catalog system. This isolation allows for multitenancy, enabling different teams, environments, or organizations to operate independently within the same Polaris deployment. + +### Key Characteristics + +**Isolation:** Each realm encapsulates its own set of resources, ensuring that operations and policies in one realm do not affect others. + +**Authentication Context:** When configuring Polaris, principal credentials are associated with a specific realm. This allows security concerns to be separated across realms. + +**Configuration Scope:** Realm identifiers are used in various configurations, such as connection strings, feature configurations, etc. + +An example of this is: + +`jdbc:postgresql://localhost:5432/{realm}` + +This ensures that each realm's data is stored separately. + +### How is it used in the system? + +**RealmContext:** A key concept used to identify and resolve the context in which operations are performed. In `DefaultRealmContextResolver`, for example, the realm is resolved from request headers, and operations are performed based on the resolved realm identifier. + +**Authentication and Authorization:** For example, in `BasePolarisAuthenticator`, `RealmContext` provides context about the current security domain, which is used to retrieve the correct `PolarisMetastoreManager` that manages all Polaris entities and associated grant records metadata for +authorization.
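Header-based realm resolution of the kind `DefaultRealmContextResolver` performs can be illustrated with a short sketch. The header name `Polaris-Realm`, the set of known realms, and the error behavior below are assumptions for illustration only; in Polaris these are configurable and the actual defaults may differ.

```python
# Illustrative sketch of resolving a realm identifier from request headers,
# in the spirit of DefaultRealmContextResolver. The header name, known-realm
# set, and failure mode are assumptions, not Polaris defaults.

KNOWN_REALMS = {"realm-a", "realm-b"}

def resolve_realm(headers: dict[str, str], header_name: str = "Polaris-Realm") -> str:
    """Pick the realm for a request; reject identifiers we don't know about."""
    realm = headers.get(header_name)
    if realm is None:
        raise ValueError(f"missing {header_name} header")
    if realm not in KNOWN_REALMS:
        raise ValueError(f"unknown realm: {realm}")
    return realm

# The resolved realm can then key isolated resources, e.g. one database per realm:
realm = resolve_realm({"Polaris-Realm": "realm-a"})
print(f"jdbc:postgresql://localhost:5432/{realm}")  # jdbc:postgresql://localhost:5432/realm-a
```

The last line mirrors the connection-string example above: every request carries its realm, and the realm in turn selects the isolated backing store.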
+ +**Isolation:** In methods like `createEntityManagerFactory(@Nonnull RealmContext realmContext)` from the `PolarisEclipseLinkPersistenceUnit` interface, the realm context influences how resources are created or managed based on the security policies of that realm. +An example of this is the way a realm name can be used to build a database connection URL so that you have one database instance per realm, when applicable; alternatively, isolation can be more granular and applied at the primary-key level (within the same database instance). \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/telemetry.md b/site/content/in-dev/1.0.0/telemetry.md new file mode 100644 index 0000000000..8df97f505d --- /dev/null +++ b/site/content/in-dev/1.0.0/telemetry.md @@ -0,0 +1,192 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.  See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership.  The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License.  You may obtain a copy of the License at +# +#   http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied.  See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Telemetry +type: docs +weight: 450 +--- + +## Metrics + +Metrics are published using [Micrometer]; they are available from Polaris's management interface +(port 8282 by default) under the path `/q/metrics`. For example, if the server is running on +localhost, the metrics can be accessed via http://localhost:8282/q/metrics.
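The `/q/metrics` endpoint serves plain-text Prometheus exposition data, so a minimal client only needs to split metric names, label blocks, and values. The sketch below parses a fabricated sample scrape offline rather than contacting a live server; the metric names shown are examples, not a guaranteed part of Polaris's output.

```python
# Minimal parser for Prometheus text-format metrics such as those served
# at /q/metrics. The sample scrape below is fabricated for illustration.
sample = """\
# HELP jvm_threads_live_threads The current number of live threads
http_server_requests_seconds_count{application="Polaris",uri="/q/metrics"} 42.0
jvm_threads_live_threads{application="Polaris"} 31.0
"""

def parse_metrics(text: str) -> dict[str, float]:
    """Map each metric line (name plus label block) to its numeric value."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip comments and blank lines
            continue
        name, _, value = line.rpartition(" ")  # value is the last token
        out[name] = float(value)
    return out

metrics = parse_metrics(sample)
print(metrics['jvm_threads_live_threads{application="Polaris"}'])  # 31.0
```

In practice you would fetch the text from the management port first (e.g. with an HTTP client against `http://localhost:8282/q/metrics`) and feed the response body to the same parser.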
+ +[Micrometer]: https://quarkus.io/guides/telemetry-micrometer + +Metrics can be scraped by Prometheus or any compatible metrics scraping server. See +[Prometheus](https://prometheus.io) for more information. + +Additional tags can be added to the metrics by setting the `polaris.metrics.tags.*` property. Each +tag is a key-value pair, where the key is the tag name and the value is the tag value. For example, +to add a tag `environment=prod` to all metrics, set `polaris.metrics.tags.environment=prod`. Multiple +tags can be added, for example: + +```properties +polaris.metrics.tags.service=polaris +polaris.metrics.tags.environment=prod +polaris.metrics.tags.region=us-west-2 +``` + +Note that by default Polaris adds one tag: `application=Polaris`. You can override this tag by +setting the `polaris.metrics.tags.application` property. + +### Realm ID Tag + +Polaris can add the realm ID as a tag to all API and HTTP request metrics. This is disabled by +default to prevent high-cardinality issues, but can be enabled by setting the following properties: + +```properties +polaris.metrics.realm-id-tag.enable-in-api-metrics=true +polaris.metrics.realm-id-tag.enable-in-http-metrics=true +``` + +You should be particularly careful when enabling the realm ID tag in HTTP request metrics, as these +metrics typically have a much higher cardinality than API request metrics. + +To prevent the number of tags from growing indefinitely and causing performance issues or +crashing the server, the number of unique realm IDs in HTTP request metrics is limited to 100 by +default. If the number of unique realm IDs exceeds this value, a warning will be logged and no more +HTTP request metrics will be recorded. This threshold can be changed by setting the +`polaris.metrics.realm-id-tag.http-metrics-max-cardinality` property. + +## Traces + +Traces are published using [OpenTelemetry].
+ +[OpenTelemetry]: https://quarkus.io/guides/opentelemetry-tracing + +By default, OpenTelemetry is disabled in Polaris because there is no collector endpoint that +is a reasonable default in all cases. + +To enable OpenTelemetry and publish traces for Polaris, set `quarkus.otel.sdk.disabled=false` +and configure a valid collector endpoint URL with `http://` or `https://` as the server property +`quarkus.otel.exporter.otlp.traces.endpoint`. + +_If these properties are not set, the server will not publish traces._ + +The collector must speak the OpenTelemetry protocol (OTLP) and the port must be its gRPC port +(by default 4317), e.g. `http://otlp-collector:4317`. + +By default, Polaris adds a few attributes to the [OpenTelemetry Resource] to identify the server, +notably: + +- `service.name`: set to `Apache Polaris Server (incubating)`; +- `service.version`: set to the Polaris version. + +[OpenTelemetry Resource]: https://opentelemetry.io/docs/languages/js/resources/ + +You can override the default resource attributes or add additional ones by setting the +`quarkus.otel.resource.attributes` property. + +This property expects a comma-separated list of key-value pairs, where the key is the attribute name +and the value is the attribute value. For example, to change the service name to `Polaris` and add +an attribute `deployment.environment=dev`, set the following property: + +```properties +quarkus.otel.resource.attributes=service.name=Polaris,deployment.environment=dev +``` + +The alternative syntax below can also be used: + +```properties +quarkus.otel.resource.attributes[0]=service.name=Polaris +quarkus.otel.resource.attributes[1]=deployment.environment=dev +``` + +Finally, two additional span attributes are added to all request parent spans: + +- `polaris.request.id`: The unique identifier of the request, if set by the caller through the +  `Polaris-Request-Id` header. +- `polaris.realm`: The unique identifier of the realm.
Always set (unless the request failed because + of a realm resolution error). + +### Troubleshooting Traces + +If the server is unable to publish traces, first check for a log message like the following: + +``` +SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans. +The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317 +``` + +This means that the server is unable to connect to the collector. Check that the collector is +running and that the URL is correct. + +## Logging + +Polaris relies on [Quarkus](https://quarkus.io/guides/logging) for logging. + +By default, logs are written to the console and to a file located in the `./logs` directory. The log +file is rotated daily and compressed. The maximum size of the log file is 10MB, and the maximum +number of backup files is 14. + +JSON logging can be enabled by setting the `quarkus.log.console.json` and `quarkus.log.file.json` +properties to `true`. By default, JSON logging is disabled. + +The log level can be set for the entire application or for specific packages. The default log level +is `INFO`. To set the log level for the entire application, use the `quarkus.log.level` property. + +To set the log level for a specific package, use the `quarkus.log.category."package-name".level` property, +where `package-name` is the name of the package. For example, the package `io.smallrye.config` has a +useful logger that helps debug configuration issues, but it needs to be set to the `DEBUG` level. +This can be done by setting the following property: + +```properties +quarkus.log.category."io.smallrye.config".level=DEBUG +``` + +The log message format for both console and file output is highly configurable.
The default format +is: + +``` +%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n +``` + +Refer to the [Logging format](https://quarkus.io/guides/logging#logging-format) guide for more +information on placeholders and how to customize the log message format. + +### MDC Logging + +Polaris uses Mapped Diagnostic Context (MDC) to enrich log messages with additional context. The +following MDC keys are available: + +- `requestId`: The unique identifier of the request, if set by the caller through the + `Polaris-Request-Id` header. +- `realmId`: The unique identifier of the realm. Always set. +- `traceId`: The unique identifier of the trace. Present if tracing is enabled and the message is + originating from a traced context. +- `parentId`: The unique identifier of the parent span. Present if tracing is enabled and the + message is originating from a traced context. +- `spanId`: The unique identifier of the span. Present if tracing is enabled and the message is + originating from a traced context. +- `sampled`: Whether the trace has been sampled. Present if tracing is enabled and the message is + originating from a traced context. + +Other MDC keys can be added by setting the `polaris.log.mdc.*` property. Each property is a +key-value pair, where the key is the MDC key name and the value is the MDC key value. For example, +to add the MDC keys `environment=prod` and `region=us-west-2` to all log messages, set the following +properties: + +```properties +polaris.log.mdc.environment=prod +polaris.log.mdc.region=us-west-2 +``` + +MDC context is propagated across threads, including in `TaskExecutor` threads. 
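MDC itself is a Java/SLF4J facility, but the idea — request-scoped key/value pairs that every log line picks up automatically — can be sketched with Python's `contextvars` and a logging filter. This is an analogue for illustration only, not how Polaris is implemented; the key names mirror the MDC keys listed above.

```python
# A rough Python analogue of MDC: request-scoped key/value pairs that a
# logging filter injects into every record. Polaris does this in Java via
# SLF4J's MDC; the mechanics here are only an illustration.
import contextvars
import logging

mdc = contextvars.ContextVar("mdc", default={})

class MdcFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        ctx = mdc.get()
        # Copy the context keys onto the record so the formatter can use them.
        record.requestId = ctx.get("requestId", "-")
        record.realmId = ctx.get("realmId", "-")
        return True

logger = logging.getLogger("polaris-demo")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s [%(requestId)s,%(realmId)s] %(message)s"))
handler.addFilter(MdcFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Set the per-request context once; every log line in scope then carries it.
mdc.set({"requestId": "req-123", "realmId": "realm-a"})
logger.info("table created")  # INFO [req-123,realm-a] table created
```

Because `contextvars` contexts can be copied into worker threads, the same pattern gives the cross-thread propagation described above for `TaskExecutor` threads.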
\ No newline at end of file diff --git a/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-aws.md b/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-aws.md index 8aa3b34a78..919fc971c8 100644 --- a/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-aws.md +++ b/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-aws.md @@ -44,7 +44,7 @@ export CLIENT_SECRET=s3cr3t ``` ## Next Steps -Congrats, you now have a running instance of1 Polaris! For details on how to use Polaris, check out the [Using Polaris]({{% ref "using-polaris" %}}) page. +Congrats, you now have a running instance of Polaris! For details on how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page. ## Cleanup Instructions To shut down the Polaris server, run the following commands: diff --git a/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-azure.md b/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-azure.md index ff1f2c6472..74df725db0 100644 --- a/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-azure.md +++ b/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-azure.md @@ -39,7 +39,7 @@ export CLIENT_SECRET=s3cr3t ``` ## Next Steps -Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% ref "using-polaris" %}}) page. +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page.
## Cleanup Instructions To shut down the Polaris server, run the following commands: diff --git a/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-gcp.md b/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-gcp.md index cbf15a8761..9641ad7282 100644 --- a/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-gcp.md +++ b/site/content/in-dev/unreleased/getting-started/deploying-polaris/quickstart-deploy-gcp.md @@ -39,7 +39,7 @@ export CLIENT_SECRET=s3cr3t ``` ## Next Steps -Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% ref "using-polaris" %}}) page. +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page. ## Cleanup Instructions To shut down the Polaris server, run the following commands: diff --git a/site/hugo.yaml b/site/hugo.yaml index c6a299d026..5a33b0406a 100644 --- a/site/hugo.yaml +++ b/site/hugo.yaml @@ -108,6 +108,9 @@ menu: - name: "0.9.0" url: "/in-dev/0.9.0/" parent: "releases" + - name: "1.0.0" + url: "/in-dev/1.0.0/" + parent: "releases" - name: "In Development" url: "/in-dev/unreleased/" parent: "releases" From a81c26846654a1bd8ee0bb01e46f2b80798ccd65 Mon Sep 17 00:00:00 2001 From: Yufei Gu Date: Tue, 8 Jul 2025 16:39:44 -0700 Subject: [PATCH 2/3] Resolve comments --- site/content/in-dev/1.0.0/_index.md | 186 --- site/content/in-dev/1.0.0/access-control.md | 212 --- site/content/in-dev/1.0.0/admin-tool.md | 142 -- .../in-dev/1.0.0/command-line-interface.md | 1224 ----------------- site/content/in-dev/1.0.0/configuration.md | 187 --- .../configuring-polaris-for-production.md | 222 --- site/content/in-dev/1.0.0/entities.md | 95 -- site/content/in-dev/1.0.0/evolution.md | 115 -- site/content/in-dev/1.0.0/generic-table.md | 169 --- 
.../in-dev/1.0.0/getting-started/_index.md | 25 - .../deploying-polaris/_index.md | 27 - .../quickstart-deploy-aws.md | 57 - .../quickstart-deploy-azure.md | 52 - .../quickstart-deploy-gcp.md | 52 - .../getting-started/install-dependencies.md | 118 -- .../1.0.0/getting-started/quickstart.md | 116 -- .../1.0.0/getting-started/using-polaris.md | 315 ----- site/content/in-dev/1.0.0/metastores.md | 151 -- .../in-dev/1.0.0/polaris-catalog-service.md | 26 - .../1.0.0/polaris-management-service.md | 27 - .../in-dev/1.0.0/polaris-spark-client.md | 141 -- site/content/in-dev/1.0.0/policy.md | 197 --- site/content/in-dev/1.0.0/realm.md | 53 - site/content/in-dev/1.0.0/telemetry.md | 192 --- site/hugo.yaml | 2 +- 25 files changed, 1 insertion(+), 4102 deletions(-) delete mode 100644 site/content/in-dev/1.0.0/_index.md delete mode 100644 site/content/in-dev/1.0.0/access-control.md delete mode 100644 site/content/in-dev/1.0.0/admin-tool.md delete mode 100644 site/content/in-dev/1.0.0/command-line-interface.md delete mode 100644 site/content/in-dev/1.0.0/configuration.md delete mode 100644 site/content/in-dev/1.0.0/configuring-polaris-for-production.md delete mode 100644 site/content/in-dev/1.0.0/entities.md delete mode 100644 site/content/in-dev/1.0.0/evolution.md delete mode 100644 site/content/in-dev/1.0.0/generic-table.md delete mode 100644 site/content/in-dev/1.0.0/getting-started/_index.md delete mode 100644 site/content/in-dev/1.0.0/getting-started/deploying-polaris/_index.md delete mode 100644 site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md delete mode 100644 site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md delete mode 100644 site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md delete mode 100644 site/content/in-dev/1.0.0/getting-started/install-dependencies.md delete mode 100644 site/content/in-dev/1.0.0/getting-started/quickstart.md delete mode 100644 
site/content/in-dev/1.0.0/getting-started/using-polaris.md delete mode 100644 site/content/in-dev/1.0.0/metastores.md delete mode 100644 site/content/in-dev/1.0.0/polaris-catalog-service.md delete mode 100644 site/content/in-dev/1.0.0/polaris-management-service.md delete mode 100644 site/content/in-dev/1.0.0/polaris-spark-client.md delete mode 100644 site/content/in-dev/1.0.0/policy.md delete mode 100644 site/content/in-dev/1.0.0/realm.md delete mode 100644 site/content/in-dev/1.0.0/telemetry.md diff --git a/site/content/in-dev/1.0.0/_index.md b/site/content/in-dev/1.0.0/_index.md deleted file mode 100644 index b82c9366c2..0000000000 --- a/site/content/in-dev/1.0.0/_index.md +++ /dev/null @@ -1,186 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -linkTitle: '1.0.0' -title: 'Overview' -type: docs -weight: 200 -params: - top_hidden: true - show_page_toc: false -cascade: - type: docs - params: - show_page_toc: true -# This file will NOT be copied into a new release's versioned docs folder. ---- - -{{< alert title="Warning" color="warning" >}} -These pages refer to the current state of the main branch, which is still under active development. - -Functionalities can be changed, removed or added without prior notice. 
-{{< /alert >}} - -Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol. - -With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines. - -![Conceptual diagram of Apache Polaris (Incubating).](/img/overview.svg "Apache Polaris (Incubating) overview") - -## Key concepts - -This section introduces key concepts associated with using Apache Polaris (Incubating). - -In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables -or namespaces have been created yet for Catalog2 or Catalog3. - -![Diagram that shows an example Apache Polaris (Incubating) structure.](/img/sample-catalog-structure.svg "Sample Apache Polaris (Incubating) structure") - -### Catalog - -In Polaris, you can create one or more catalog resources to organize Iceberg tables. - -Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a -query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks: - -- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's - current metadata file. - -- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of - the table. - -To learn more about Iceberg catalogs, see the [Apache Iceberg™ documentation](https://iceberg.apache.org/concepts/catalog/). - -#### Catalog types - -A catalog can be one of the following two types: - -- Internal: The catalog is managed by Polaris. 
Tables from this catalog can be read and written in Polaris. - -- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from - this catalog are synced to Polaris. These tables are read-only in Polaris. - -A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS. - -### Namespace - -You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create -nested namespaces. Iceberg tables belong to namespaces. - -> **Important** -> -> For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met: -> -> - The directory only contains the data files that belong to a single table. -> - The directory hierarchy matches the namespace hierarchy for the catalog. -> -> For example, if a catalog includes the following items: -> -> - Top-level namespace namespace1 -> - Nested namespace namespace1a -> - A customers table, which is grouped under nested namespace namespace1a -> - An orders table, which is grouped under nested namespace namespace1a -> -> The directory hierarchy for the catalog must follow this structure: -> -> - /namespace1/namespace1a/customers/ -> - /namespace1/namespace1a/orders/ - -### Storage configuration - -A storage configuration stores a generated identity and access management (IAM) entity for your cloud storage and is created -when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the -catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris -Catalog. - -When you create a catalog, you supply the following information about your cloud storage: - -| Cloud storage provider | Information | -| -----------------------| ----------- | -| Amazon S3 |
  • Default base location for your Amazon S3 bucket
  • Locations for your Amazon S3 bucket
  • S3 role ARN
  • External ID (optional)
| -| Google Cloud Storage (GCS) |
  • Default base location for your GCS bucket
  • Locations for your GCS bucket
| -| Azure |
  • Default base location for your Microsoft Azure container
  • Locations for your Microsoft Azure container
  • Azure tenant ID
| - -## Example workflow - -In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1. - -1. Bob uses Apache Spark™ to create the Table1 table under the - Namespace1 namespace in the Catalog1 catalog and insert values into - Table1. - - Bob can create Table1 and insert data into it because he is using a - service connection with a service principal that has - the privileges to perform these actions. - -2. Alice uses Snowflake to read data from Table1. - - Alice can read data from Table1 because she is using a service - connection with a service principal with a catalog integration that - has the privileges to perform this action. Alice - creates an unmanaged table in Snowflake to read data from Table1. - -![Diagram that shows an example workflow for Apache Polaris (Incubating)](/img/example-workflow.svg "Example workflow for Apache Polaris (Incubating)") - -## Security and access control - -### Credential vending - -To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query -execution. These credentials allow the query engine to run the query without requiring access to your cloud storage for -Iceberg tables. This process is called credential vending. - -As of now, the following limitation is known regarding Apache Iceberg support: - -- **remove_orphan_files:** Apache Spark can't use credential vending - for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details. - -### Identity and access management (IAM) - -Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg -metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your -storage location. 
- -### Access control - -Polaris enforces the access control that you configure across all tables registered with the service and governs security for all -queries from query engines in a consistent manner. - -Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs, -namespaces, and tables. - -Polaris RBAC uses two different role types to delegate privileges: - -- **Principal roles:** Granted to Polaris service principals and - analogous to roles in other access control systems that you grant to - service principals. - -- **Catalog roles:** Configured with certain privileges on Polaris - catalog resources and granted to principal roles. - -For more information, see [Access control]({{% ref "access-control" %}}). - -## Legal Notices - -Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. - - - diff --git a/site/content/in-dev/1.0.0/access-control.md b/site/content/in-dev/1.0.0/access-control.md deleted file mode 100644 index f8c21ab781..0000000000 --- a/site/content/in-dev/1.0.0/access-control.md +++ /dev/null @@ -1,212 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. 
See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Access Control -type: docs -weight: 500 ---- - -This section provides information about how access control works for Apache Polaris (Incubating). - -Polaris uses a role-based access control (RBAC) model in which the Polaris administrator assigns access privileges to catalog roles -and then grants access to resources to service principals by assigning catalog roles to principal roles. - -These are the key concepts to understanding access control in Polaris: - -- **Securable object** -- **Principal role** -- **Catalog role** -- **Privilege** - -## Securable object - -A securable object is an object to which access can be granted. Polaris -has the following securable objects: - -- Catalog -- Namespace -- Iceberg table -- View - -## Principal role - -A principal role is a resource in Polaris that you can use to logically group Polaris service principals together and grant privileges on -securable objects. - -Polaris supports a many-to-one relationship between service principals and principal roles. For example, to grant the same privileges to -multiple service principals, you can grant a single principal role to those service principals. A service principal can be granted one -principal role. When registering a service connection, the Polaris administrator specifies the principal role that is granted to the -service principal. - -You don't grant privileges directly to a principal role. Instead, you configure object permissions at the catalog role level, and then grant -catalog roles to a principal role. - -The following table shows examples of principal roles that you might configure in Polaris: - -| Principal role name | Description | -| -----------------------| ----------- | -| Data_engineer | A role that is granted to multiple service principals for running data engineering jobs. 
| -| Data_scientist | A role that is granted to multiple service principals for running data science or AI jobs. | - -## Catalog role - -A catalog role belongs to a particular catalog resource in Polaris and specifies a set of permissions for actions on the catalog or objects -in the catalog, such as catalog namespaces or tables. You can create one or more catalog roles for a catalog. - -You grant privileges to a catalog role and then grant the catalog role to a principal role to bestow the privileges to one or more service -principals. - -> **Note** -> -> If you update the privileges bestowed to a service principal, the updates won't take effect for up to one hour. This means that if you -> revoke or grant some privileges for a catalog, the updated privileges won't take effect on any service principal with access to that catalog -> for up to one hour. - -Polaris also supports a many-to-many relationship between catalog roles and principal roles. You can grant the same catalog role to one or more -principal roles. Likewise, a principal role can be granted to one or more catalog roles. - -The following table displays examples of catalog roles that you might -configure in Polaris: - -| Example Catalog role | Description| -| -----------------------|-----------| -| Catalog administrators | A role that has been granted multiple privileges to emulate full access to the catalog.
Principal roles that have been granted this role are permitted to create, alter, read, write, and drop tables in the catalog. | -| Catalog readers | A role that has been granted read-only privileges to tables in the catalog.
Principal roles that have been granted this role are allowed to read from tables in the catalog. | -| Catalog contributor | A role that has been granted read and write access privileges to all tables that belong to the catalog.
Principal roles that have been granted this role are allowed to perform read and write operations on tables in the catalog. | - -## RBAC model - -The following diagram illustrates the RBAC model used by Polaris. For each catalog, the Polaris administrator assigns access -privileges to catalog roles and then grants service principals access to resources by assigning catalog roles to principal roles. Polaris -supports a many-to-one relationship between service principals and principal roles. - -![Diagram that shows the RBAC model for Apache Polaris.](/img/rbac-model.svg "Apache Polaris RBAC model") - -## Access control privileges - -This section describes the privileges that are available in the Polaris access control model. Privileges are granted to catalog roles, catalog -roles are granted to principal roles, and principal roles are granted to service principals to specify the operations that service principals can -perform on objects in Polaris. - -> **Important** -> -> You can only grant privileges at the catalog level. Fine-grained access controls are not available. For example, you can grant read -> privileges to all tables in a catalog but not to an individual table in the catalog. - -To grant the full set of privileges (drop, list, read, write, etc.) on an object, you can use the *full privilege* option. - -### Table privileges - -| Privilege | Description | -| --------- | ----------- | -| TABLE_CREATE | Enables registering a table with the catalog. | -| TABLE_DROP | Enables dropping a table from the catalog. | -| TABLE_LIST | Enables listing any table in the catalog. | -| TABLE_READ_PROPERTIES | Enables reading properties of the table. | -| TABLE_WRITE_PROPERTIES | Enables configuring properties for the table. | -| TABLE_READ_DATA | Enables reading data from the table by receiving short-lived read-only storage credentials from the catalog. 
| -| TABLE_WRITE_DATA | Enables writing data to the table by receiving short-lived read+write storage credentials from the catalog. | -| TABLE_FULL_METADATA | Grants all table privileges, except TABLE_READ_DATA and TABLE_WRITE_DATA, which need to be granted individually. | -| TABLE_ATTACH_POLICY | Enables attaching policy to a table. | -| TABLE_DETACH_POLICY | Enables detaching policy from a table. | - -### View privileges - -| Privilege | Description | -| --------- | ----------- | -| VIEW_CREATE | Enables registering a view with the catalog. | -| VIEW_DROP | Enables dropping a view from the catalog. | -| VIEW_LIST | Enables listing any views in the catalog. | -| VIEW_READ_PROPERTIES | Enables reading all the view properties. | -| VIEW_WRITE_PROPERTIES | Enables configuring view properties. | -| VIEW_FULL_METADATA | Grants all view privileges. | - -### Namespace privileges - -| Privilege | Description | -| --------- | ----------- | -| NAMESPACE_CREATE | Enables creating a namespace in a catalog. | -| NAMESPACE_DROP | Enables dropping the namespace from the catalog. | -| NAMESPACE_LIST | Enables listing any object in the namespace, including nested namespaces and tables. | -| NAMESPACE_READ_PROPERTIES | Enables reading all the namespace properties. | -| NAMESPACE_WRITE_PROPERTIES | Enables configuring namespace properties. | -| NAMESPACE_FULL_METADATA | Grants all namespace privileges. | -| NAMESPACE_ATTACH_POLICY | Enables attaching policy to a namespace. | -| NAMESPACE_DETACH_POLICY | Enables detaching policy from a namespace. | - -### Catalog privileges - -| Privilege | Description | -| -----------------------| ----------- | -| CATALOG_MANAGE_ACCESS | Includes the ability to grant or revoke privileges on objects in a catalog to catalog roles, and the ability to grant or revoke catalog roles to or from principal roles. | -| CATALOG_MANAGE_CONTENT | Enables full management of content for the catalog. This privilege encompasses the following privileges:
  • CATALOG_MANAGE_METADATA
  • TABLE_FULL_METADATA
  • NAMESPACE_FULL_METADATA
  • VIEW_FULL_METADATA
  • TABLE_WRITE_DATA
  • TABLE_READ_DATA
  • CATALOG_READ_PROPERTIES
  • CATALOG_WRITE_PROPERTIES
| -| CATALOG_MANAGE_METADATA | Enables full management of the catalog, catalog roles, namespaces, and tables. | -| CATALOG_READ_PROPERTIES | Enables listing catalogs and reading properties of the catalog. | -| CATALOG_WRITE_PROPERTIES | Enables configuring catalog properties. | -| CATALOG_ATTACH_POLICY | Enables attaching policy to a catalog. | -| CATALOG_DETACH_POLICY | Enables detaching policy from a catalog. | - -### Policy privileges - -| Privilege | Description | -| -----------------------| ----------- | -| POLICY_CREATE | Enables creating a policy under specified namespace. | -| POLICY_READ | Enables reading policy content and metadata. | -| POLICY_WRITE | Enables updating the policy details such as its content or description. | -| POLICY_LIST | Enables listing any policy from the catalog. | -| POLICY_DROP | Enables dropping a policy if it is not attached to any resource entity. | -| POLICY_FULL_METADATA | Grants all policy privileges. | -| POLICY_ATTACH | Enables policy to be attached to entities. | -| POLICY_DETACH | Enables policy to be detached from entities. | - -## RBAC example - -The following diagram illustrates how RBAC works in Polaris and -includes the following users: - -- **Alice:** A service admin who signs up for Polaris. Alice can - create service principals. She can also create catalogs and - namespaces and configure access control for Polaris resources. - -- **Bob:** A data engineer who uses Apache Spark™ to - interact with Polaris. - - - Alice has created a service principal for Bob. It has been - granted the Data_engineer principal role, which in turn has been - granted the following catalog roles: Catalog contributor and - Data administrator (for both the Silver and Gold zone catalogs - in the following diagram). - - - The Catalog contributor role grants permission to create - namespaces and tables in the Bronze zone catalog. 
- - - The Data administrator roles grant full administrative rights to - the Silver zone catalog and Gold zone catalog. - -- **Mark:** A data scientist who trains models with data managed - by Polaris. - - - Alice has created a service principal for Mark. It has been - granted the Data_scientist principal role, which in turn has - been granted the catalog role named Catalog reader. - - - The Catalog reader role grants read-only access for a catalog - named Gold zone catalog. - -![Diagram that shows an example of how RBAC works in Apache Polaris.](/img/rbac-example.svg "Apache Polaris RBAC example") diff --git a/site/content/in-dev/1.0.0/admin-tool.md b/site/content/in-dev/1.0.0/admin-tool.md deleted file mode 100644 index 14f37b6f0f..0000000000 --- a/site/content/in-dev/1.0.0/admin-tool.md +++ /dev/null @@ -1,142 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements.  See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership.  The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License.  You may obtain a copy of the License at -# -#   http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied.  See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Admin Tool -type: docs -weight: 300 ---- - -Polaris includes a tool for administrators to manage the metastore. - -The tool must be built with the necessary JDBC drivers to access the metastore database.
For -example, to build the tool with support for Postgres, run the following: - -```shell -./gradlew \ -  :polaris-admin:assemble \ -  :polaris-admin:quarkusAppPartsBuild --rerun \ -  -Dquarkus.container-image.build=true -``` - -The above command will generate: - -- One standalone JAR in `runtime/admin/build/polaris-admin-*-runner.jar` -- Two distribution archives in `runtime/admin/build/distributions` -- Two Docker images named `apache/polaris-admin-tool:latest` and `apache/polaris-admin-tool:` - -## Usage - -Please make sure the admin tool and the Polaris server are the same version before using the tool. -To run the standalone JAR, use the following command: - -```shell -java -jar runtime/admin/build/polaris-admin-*-runner.jar --help -``` - -To run the Docker image, use the following command: - -```shell -docker run apache/polaris-admin-tool:latest --help -``` - -The basic usage of the Polaris Admin Tool is outlined below: - -``` -Usage: polaris-admin-runner.jar [-hV] [COMMAND] -Polaris Admin Tool -  -h, --help      Show this help message and exit. -  -V, --version   Print version information and exit. -Commands: -  help       Display help information about the specified command. -  bootstrap  Bootstraps realms and principal credentials. -  purge      Purge principal credentials. -``` - -## Configuration - -The Polaris Admin Tool must be executed with the same configuration as the Polaris server. The -configuration can be done via environment variables or system properties. - -At a minimum, it is necessary to configure the Polaris Admin Tool to connect to the same database -used by the Polaris server. - -See the [metastore documentation]({{% ref "metastores" %}}) for more information on configuring the -database connection. - -Note: during bootstrap, Polaris always creates the schema 'polaris_schema' in the configured database.
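For example, if the Polaris server stores its metastore in a local Postgres database, the connection could be supplied to the admin tool through Quarkus's standard datasource environment variables. This is an illustrative sketch only; the URL and credentials below are placeholders that must match your server's actual configuration:

```shell
# Illustrative placeholders: point the admin tool at the same database
# the Polaris server uses (standard Quarkus datasource settings).
export QUARKUS_DATASOURCE_JDBC_URL="jdbc:postgresql://localhost:5432/polaris"
export QUARKUS_DATASOURCE_USERNAME="polaris"
export QUARKUS_DATASOURCE_PASSWORD="polaris"

# With these exported, run the tool as shown above, e.g.:
#   java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap --help
```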
- -## Bootstrapping Realms and Principal Credentials - -The `bootstrap` command is used to bootstrap realms and create the necessary principal credentials -for the Polaris server. This command is idempotent and can be run multiple times without causing any -issues. If a realm is already bootstrapped, running the `bootstrap` command again will not have any -effect on that realm. - -```shell -java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap --help -``` - -The basic usage of the `bootstrap` command is outlined below: - -``` -Usage: polaris-admin-runner.jar bootstrap [-hV] [-c=]... -r= [-r=]... -Bootstraps realms and root principal credentials. - -c, --credential= - Root principal credentials to bootstrap. Must be of the form - 'realm,clientId,clientSecret'. - -h, --help Show this help message and exit. - -r, --realm= The name of a realm to bootstrap. - -V, --version Print version information and exit. -``` - -For example, to bootstrap the `realm1` realm and create its root principal credential with the -client ID `admin` and client secret `admin`, you can run the following command: - -```shell -java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap -r realm1 -c realm1,admin,admin -``` - -## Purging Realms and Principal Credentials - -The `purge` command is used to remove realms and principal credentials from the Polaris server. - -> Warning: Running the `purge` command will remove all data associated with the specified realms! - This includes all entities (catalogs, namespaces, tables, views, roles), all principal - credentials, grants, and any other data associated with the realms. - -```shell -java -jar runtime/admin/build/polaris-admin-*-runner.jar purge --help -``` - -The basic usage of the `purge` command is outlined below: - -``` -Usage: polaris-admin-runner.jar purge [-hV] -r= [-r=]... -Purge realms and all associated entities. - -h, --help Show this help message and exit. - -r, --realm= The name of a realm to purge. 
- -V, --version Print version information and exit. -``` - -For example, to purge the `realm1` realm, you can run the following command: - -```shell -java -jar runtime/admin/build/polaris-admin-*-runner.jar purge -r realm1 -``` \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/command-line-interface.md b/site/content/in-dev/1.0.0/command-line-interface.md deleted file mode 100644 index f20210e2c6..0000000000 --- a/site/content/in-dev/1.0.0/command-line-interface.md +++ /dev/null @@ -1,1224 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Command Line Interface -type: docs -weight: 300 ---- - -In order to help administrators quickly set up and manage their Polaris server, Polaris provides a simple command-line interface (CLI) for common tasks. - -The basic syntax of the Polaris CLI is outlined below: - -``` -polaris [options] COMMAND ... - -options: ---host ---port ---base-url ---client-id ---client-secret ---access-token ---profile -``` - -`COMMAND` must be one of the following: -1. catalogs -2. principals -3. principal-roles -4. catalog-roles -5. namespaces -6. privileges -7. 
profiles - -Each _command_ supports several _subcommands_, and some _subcommands_ have _actions_ that come after the subcommand in turn. Finally, _arguments_ follow to form a full invocation. Within a set of named arguments at the end of an invocation, ordering is generally not important. Many invocations also have a required positional argument of the type that the _command_ refers to. Again, the ordering of this positional argument relative to named arguments is not important. - -Some example full invocations: - -``` -polaris principals list -polaris catalogs delete some_catalog_name -polaris catalogs update --property foo=bar some_other_catalog -polaris catalogs update another_catalog --property k=v -polaris privileges namespace grant --namespace some.schema --catalog fourth_catalog --catalog-role some_catalog_role TABLE_READ_DATA -polaris profiles list -``` - -### Authentication - -As outlined above, the Polaris CLI may take credentials using the `--client-id` and `--client-secret` options. For example: - -``` -polaris --client-id 4b5ed1ca908c3cc2 --client-secret 07ea8e4edefb9a9e57c247e8d1a4f51c principals ... -``` - -If `--client-id` and `--client-secret` are not provided, the Polaris CLI will try to read the client ID and client secret from environment variables called `CLIENT_ID` and `CLIENT_SECRET` respectively. If these flags are not provided and the environment variables are not set, the CLI will fail. - -Alternatively, the `--access-token` option can be used instead of `--client-id` and `--client-secret`, but the two authentication methods cannot be used simultaneously. - -Additionally, the `--profile` option can be used to specify a saved profile instead of providing authentication details directly. If `--profile` is not provided, the CLI will check the `CLIENT_PROFILE` environment variable. Profiles store authentication details and connection settings, simplifying repeated CLI usage.
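The flag-then-environment fallback described above can be sketched as a small shell function. This is only an illustration of the precedence rules, not the CLI's actual implementation (`resolve_client_id` is a hypothetical name):

```shell
# Sketch of credential resolution: an explicit --client-id value wins,
# then the CLIENT_ID environment variable; with neither, resolution fails.
resolve_client_id() {
  flag_value="$1"  # the value passed via --client-id; may be empty
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"
  elif [ -n "${CLIENT_ID:-}" ]; then
    printf '%s\n' "$CLIENT_ID"
  else
    echo "error: no client ID given via --client-id or CLIENT_ID" >&2
    return 1
  fi
}
```

The client secret follows the same precedence through `CLIENT_SECRET`.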
- -If the `--host` and `--port` options are not provided, the CLI will default to communicating with `localhost:8181`. - -Alternatively, the `--base-url` option can be used instead of `--host` and `--port`, but both options cannot be used simultaneously. This allows specifying arbitrary Polaris URLs, including HTTPS ones, that have additional base prefixes before the `/api/*/v1` subpaths. - -### PATH - -These examples assume the Polaris CLI is on the PATH and so can be invoked just by the command `polaris`. You can add the CLI to your PATH environment variable with a command like the following: - -``` -export PATH="$HOME/polaris:$PATH" -``` - -Alternatively, you can run the CLI by providing a path to it, such as with the following invocation: - -``` -~/polaris principals list -``` - -## Commands - -Each of the commands `catalogs`, `principals`, `principal-roles`, `catalog-roles`, and `privileges` is used to manage a different type of entity within Polaris. - -In addition to these, the `profiles` command is available for managing stored authentication profiles, allowing login credentials to be configured for reuse. This provides an alternative to passing authentication details with every command. - -To find details on the options that can be provided to a particular command or subcommand ad-hoc, you may wish to use the `--help` flag. For example: - -``` -polaris catalogs --help -polaris principals create --help -polaris profiles --help -``` - -### catalogs - -The `catalogs` command is used to create, discover, and otherwise manage catalogs within Polaris. - -`catalogs` supports the following subcommands: - -1. create -2. delete -3. get -4. list -5. update - -#### create - -The `create` subcommand is used to create a catalog. - -``` -input: polaris catalogs create --help -options: -  create -    Named arguments: -      --type  The type of catalog to create in [INTERNAL, EXTERNAL]. INTERNAL by default.
- --storage-type (Required) The type of storage to use for the catalog - --default-base-location (Required) Default base location of the catalog - --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once. - --role-arn (Required for S3) A role ARN to use when connecting to S3 - --external-id (Only for S3) The external ID to use when connecting to S3 - --tenant-id (Required for Azure) A tenant ID to use when connecting to Azure Storage - --multi-tenant-app-name (Only for Azure) The app name to use when connecting to Azure Storage - --consent-url (Only for Azure) A consent URL granting permissions for the Azure Storage location - --service-account (Only for GCS) The service account to use when connecting to GCS - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs create \ - --storage-type s3 \ - --default-base-location s3://example-bucket/my_data \ - --role-arn ${ROLE_ARN} \ - my_catalog - -polaris catalogs create \ - --storage-type s3 \ - --default-base-location s3://example-bucket/my_other_data \ - --allowed-location s3://example-bucket/second_location \ - --allowed-location s3://other-bucket/third_location \ - --role-arn ${ROLE_ARN} \ - my_other_catalog - -polaris catalogs create \ - --storage-type file \ - --default-base-location file:///example/tmp \ - quickstart_catalog -``` - -#### delete - -The `delete` subcommand is used to delete a catalog. - -``` -input: polaris catalogs delete --help -options: - delete - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs delete some_catalog -``` - -#### get - -The `get` subcommand is used to retrieve details about a catalog. 
- -``` -input: polaris catalogs get --help -options: - get - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs get some_catalog - -polaris catalogs get another_catalog -``` - -#### list - -The `list` subcommand is used to show details about all catalogs, or those that a certain principal role has access to. The principal used to perform this operation must have the `CATALOG_LIST` privilege. - -``` -input: polaris catalogs list --help -options: - list - Named arguments: - --principal-role The name of a principal role -``` - -##### Examples - -``` -polaris catalogs list - -polaris catalogs list --principal-role some_user -``` - -#### update - -The `update` subcommand is used to update a catalog. Currently, this command supports changing the properties of a catalog or updating its storage configuration. - -``` -input: polaris catalogs update --help -options: - update - Named arguments: - --default-base-location (Required) Default base location of the catalog - --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once. - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalogs update --property tag=new_value my_catalog - -polaris catalogs update --default-base-location s3://new-bucket/my_data my_catalog -``` - -### Principals - -The `principals` command is used to manage principals within Polaris. - -`principals` supports the following subcommands: - -1. create -2. delete -3. get -4. list -5. rotate-credentials -6. update -7. access - -#### create - -The `create` subcommand is used to create a new principal. - -``` -input: polaris principals create --help -options: - create - Named arguments: - --type The type of principal to create in [SERVICE] - --property A key/value pair such as: tag=value. 
Multiple can be provided by specifying this option more than once - Positional arguments: - principal -``` - -##### Examples - -``` -polaris principals create some_user - -polaris principals create --client-id ${CLIENT_ID} --property admin=true some_admin_user -``` - -#### delete - -The `delete` subcommand is used to delete a principal. - -``` -input: polaris principals delete --help -options: - delete - Positional arguments: - principal -``` - -##### Examples - -``` -polaris principals delete some_user - -polaris principals delete some_admin_user -``` - -#### get - -The `get` subcommand retrieves details about a principal. - -``` -input: polaris principals get --help -options: - get - Positional arguments: - principal -``` - -##### Examples - -``` -polaris principals get some_user - -polaris principals get some_admin_user -``` - -#### list - -The `list` subcommand shows details about all principals. - -##### Examples - -``` -polaris principals list -``` - -#### rotate-credentials - -The `rotate-credentials` subcommand is used to update the credentials used by a principal. After this command runs successfully, the new credentials will be printed to stdout. - -``` -input: polaris principals rotate-credentials --help -options: - rotate-credentials - Positional arguments: - principal -``` - -##### Examples - -``` -polaris principals rotate-credentials some_user - -polaris principals rotate-credentials some_admin_user -``` - -#### update - -The `update` subcommand is used to update a principal. Currently, this supports rewriting the properties associated with a principal. - -``` -input: polaris principals update --help -options: - update - Named arguments: - --property A key/value pair such as: tag=value. 
Multiple can be provided by specifying this option more than once -    Positional arguments: -      principal -``` - -##### Examples - -``` -polaris principals update --property key=value --property other_key=other_value some_user - -polaris principals update --property are_other_keys_removed=yes some_user -``` - -#### access - -The `access` subcommand retrieves the entities that a principal has access to. - -``` -input: polaris principals access --help -options: -  access -    Positional arguments: -      principal -``` - -##### Examples - -``` -polaris principals access quickstart_user -``` - -### Principal Roles - -The `principal-roles` command is used to create, discover, and manage principal roles within Polaris. Additionally, this command can identify principals or catalog roles associated with a principal role, and can be used to grant a principal role to a principal. - -`principal-roles` supports the following subcommands: - -1. create -2. delete -3. get -4. list -5. update -6. grant -7. revoke - -#### create - -The `create` subcommand is used to create a new principal role. - -``` -input: polaris principal-roles create --help -options: -  create -    Named arguments: -      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once -    Positional arguments: -      principal_role -``` - -##### Examples - -``` -polaris principal-roles create data_engineer - -polaris principal-roles create --property key=value data_analyst -``` - -#### delete - -The `delete` subcommand is used to delete a principal role. - -``` -input: polaris principal-roles delete --help -options: -  delete -    Positional arguments: -      principal_role -``` - -##### Examples - -``` -polaris principal-roles delete data_engineer - -polaris principal-roles delete data_analyst -``` - -#### get - -The `get` subcommand retrieves details about a principal role.
- -``` -input: polaris principal-roles get --help -options: -  get -    Positional arguments: -      principal_role -``` - -##### Examples - -``` -polaris principal-roles get data_engineer - -polaris principal-roles get data_analyst -``` - -#### list - -The `list` subcommand is used to print out all principal roles or, alternatively, to list all principal roles associated with a given principal or with a given catalog role. - -``` -input: polaris principal-roles list --help -options: -  list -    Named arguments: -      --catalog-role  The name of a catalog role. If provided, show only principal roles assigned to this catalog role. -      --principal  The name of a principal. If provided, show only principal roles assigned to this principal. -``` - -##### Examples - -``` -polaris principal-roles list - -polaris principal-roles list --principal d.knuth - -polaris principal-roles list --catalog-role super_secret_data -``` - -#### update - -The `update` subcommand is used to update a principal role. Currently, this supports updating the properties tied to a principal role. - -``` -input: polaris principal-roles update --help -options: -  update -    Named arguments: -      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once -    Positional arguments: -      principal_role -``` - -##### Examples - -``` -polaris principal-roles update --property key=value2 data_engineer - -polaris principal-roles update data_analyst --property key=value3 -``` - -#### grant - -The `grant` subcommand is used to grant a principal role to a principal.
- -``` -input: polaris principal-roles grant --help -options: - grant - Named arguments: - --principal A principal to grant this principal role to - Positional arguments: - principal_role -``` - -##### Examples - -``` -polaris principal-roles grant --principal d.knuth data_engineer - -polaris principal-roles grant data_scientist --principal a.ng -``` - -#### revoke - -The `revoke` subcommand is used to revoke a principal role from a principal. - -``` -input: polaris principal-roles revoke --help -options: - revoke - Named arguments: - --principal A principal to revoke this principal role from - Positional arguments: - principal_role -``` - -##### Examples - -``` -polaris principal-roles revoke --principal former.employee data_engineer - -polaris principal-roles revoke data_scientist --principal changed.role -``` - -### Catalog Roles - -The catalog-roles command is used to create, discover, and manage catalog roles within Polaris. Additionally, this command can be used to grant a catalog role to a principal role. - -`catalog-roles` supports the following subcommands: - -1. create -2. delete -3. get -4. list -5. update -6. grant -7. revoke - -#### create - -The `create` subcommand is used to create a new catalog role. - -``` -input: polaris catalog-roles create --help -options: - create - Named arguments: - --catalog The name of an existing catalog - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog_role -``` - -##### Examples - -``` -polaris catalog-roles create --property key=value --catalog some_catalog sales_data - -polaris catalog-roles create --catalog other_catalog sales_data -``` - -#### delete - -The `delete` subcommand is used to delete a catalog role. 
- -``` -input: polaris catalog-roles delete --help -options: - delete - Named arguments: - --catalog The name of an existing catalog - Positional arguments: - catalog_role -``` - -##### Examples - -``` -polaris catalog-roles delete --catalog some_catalog sales_data - -polaris catalog-roles delete --catalog other_catalog sales_data -``` - -#### get - -The `get` subcommand retrieves details about a catalog role. - -``` -input: polaris catalog-roles get --help -options: - get - Named arguments: - --catalog The name of an existing catalog - Positional arguments: - catalog_role -``` - -##### Examples - -``` -polaris catalog-roles get --catalog some_catalog inventory_data - -polaris catalog-roles get --catalog other_catalog inventory_data -``` - -#### list - -The `list` subcommand is used to print all catalog roles. Alternatively, if a principal role is provided, only catalog roles associated with that principal are shown. - -``` -input: polaris catalog-roles list --help -options: - list - Named arguments: - --principal-role The name of a principal role - Positional arguments: - catalog -``` - -##### Examples - -``` -polaris catalog-roles list - -polaris catalog-roles list --principal-role data_engineer -``` - -#### update - -The `update` subcommand is used to update a catalog role. Currently, only updating properties associated with the catalog role is supported. - -``` -input: polaris catalog-roles update --help -options: - update - Named arguments: - --catalog The name of an existing catalog - --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once - Positional arguments: - catalog_role -``` - -##### Examples - -``` -polaris catalog-roles update --property contains_pii=true --catalog some_catalog sales_data - -polaris catalog-roles update sales_data --catalog some_catalog --property key=value -``` - -#### grant - -The `grant` subcommand is used to grant a catalog role to a principal role. 
- -``` -input: polaris catalog-roles grant --help -options: -  grant -    Named arguments: -      --catalog  The name of an existing catalog -      --principal-role  The name of a principal role -    Positional arguments: -      catalog_role -``` - -##### Examples - -``` -polaris catalog-roles grant sensitive_data --catalog some_catalog --principal-role power_user - -polaris catalog-roles grant --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role -``` - -#### revoke - -The `revoke` subcommand is used to revoke a catalog role from a principal role. - -``` -input: polaris catalog-roles revoke --help -options: -  revoke -    Named arguments: -      --catalog  The name of an existing catalog -      --principal-role  The name of a principal role -    Positional arguments: -      catalog_role -``` - -##### Examples - -``` -polaris catalog-roles revoke sensitive_data --catalog some_catalog --principal-role power_user - -polaris catalog-roles revoke --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role -``` - -### Namespaces - -The `namespaces` command is used to manage namespaces within Polaris. - -`namespaces` supports the following subcommands: - -1. create -2. delete -3. get -4. list - -#### create - -The `create` subcommand is used to create a new namespace. - -When creating a namespace with an explicit location, that location must reside within the parent catalog or namespace. - -``` -input: polaris namespaces create --help -options: -  create -    Named arguments: -      --catalog  The name of an existing catalog -      --location  If specified, the location at which to store the namespace and entities inside it -      --property  A key/value pair such as: tag=value.
Multiple can be provided by specifying this option more than once - Positional arguments: - namespace -``` - -##### Examples - -``` -polaris namespaces create --catalog my_catalog outer - -polaris namespaces create --catalog my_catalog --location 's3://bucket/outer/inner_SUFFIX' outer.inner -``` - -#### delete - -The `delete` subcommand is used to delete a namespace. - -``` -input: polaris namespaces delete --help -options: - delete - Named arguments: - --catalog The name of an existing catalog - Positional arguments: - namespace -``` - -##### Examples - -``` -polaris namespaces delete outer_namespace.inner_namespace --catalog my_catalog - -polaris namespaces delete --catalog my_catalog outer_namespace -``` - -#### get - -The `get` subcommand retrieves details about a namespace. - -``` -input: polaris namespaces get --help -options: - get - Named arguments: - --catalog The name of an existing catalog - Positional arguments: - namespace -``` - -##### Examples - -``` -polaris namespaces get --catalog some_catalog a.b - -polaris namespaces get a.b.c --catalog some_catalog -``` - -#### list - -The `list` subcommand shows details about all namespaces directly within a catalog or, optionally, within some parent prefix in that catalog. - -``` -input: polaris namespaces list --help -options: - list - Named arguments: - --catalog The name of an existing catalog - --parent If specified, list namespaces inside this parent namespace -``` - -##### Examples - -``` -polaris namespaces list --catalog my_catalog - -polaris namespaces list --catalog my_catalog --parent a - -polaris namespaces list --catalog my_catalog --parent a.b -``` - -### Privileges - -The `privileges` command is used to grant various privileges to a catalog role, or to revoke those privileges. Privileges can be on the level of a catalog, a namespace, a table, or a view. For more information on privileges, please refer to the [docs]({{% ref "entities#privilege" %}}). 
- -Note that when using the `privileges` command, the user specifies the relevant catalog and catalog role before selecting a subcommand. - -`privileges` supports the following subcommands: - -1. list -2. catalog -3. namespace -4. table -5. view - -Each of these subcommands, except `list`, supports the `grant` and `revoke` actions and requires an action to be specified. - -Note that each subcommand's `revoke` action always accepts the same options that the corresponding `grant` action does, but with the addition of the `cascade` option. `cascade` is used to revoke all other privileges that depend on the specified privilege. - -#### list - -The `list` subcommand shows details about all privileges for a catalog role. - -``` -input: polaris privileges list --help -options: - list - Named arguments: - --catalog The name of an existing catalog - --catalog-role The name of a catalog role -``` - -##### Examples - -``` -polaris privileges list --catalog my_catalog --catalog-role my_role - -polaris privileges list --catalog-role my_other_role --catalog my_catalog -``` - -#### catalog - -The `catalog` subcommand manages privileges at the catalog level. `grant` is used to grant catalog privileges to the specified catalog role, and `revoke` is used to revoke them.
- -``` -input: polaris privileges catalog --help -options: - catalog - grant - Named arguments: - --catalog The name of an existing catalog - --catalog-role The name of a catalog role - Positional arguments: - privilege - revoke - Named arguments: - --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege - --catalog The name of an existing catalog - --catalog-role The name of a catalog role - Positional arguments: - privilege -``` - -##### Examples - -``` -polaris privileges \ - catalog \ - grant \ - --catalog my_catalog \ - --catalog-role catalog_role \ - TABLE_CREATE - -polaris privileges \ - catalog \ - revoke \ - --catalog my_catalog \ - --catalog-role catalog_role \ - --cascade \ - TABLE_CREATE -``` - -#### namespace - -The `namespace` subcommand manages privileges at the namespace level. - -``` -input: polaris privileges namespace --help -options: - namespace - grant - Named arguments: - --namespace A period-delimited namespace - --catalog The name of an existing catalog - --catalog-role The name of a catalog role - Positional arguments: - privilege - revoke - Named arguments: - --namespace A period-delimited namespace - --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege - --catalog The name of an existing catalog - --catalog-role The name of a catalog role - Positional arguments: - privilege -``` - -##### Examples - -``` -polaris privileges \ - namespace \ - grant \ - --catalog my_catalog \ - --catalog-role catalog_role \ - --namespace a.b \ - TABLE_LIST - -polaris privileges \ - namespace \ - revoke \ - --catalog my_catalog \ - --catalog-role catalog_role \ - --namespace a.b \ - TABLE_LIST -``` - -#### table - -The `table` subcommand manages privileges at the table level. 
- -``` -input: polaris privileges table --help -options: - table - grant - Named arguments: - --namespace A period-delimited namespace - --table The name of a table - --catalog The name of an existing catalog - --catalog-role The name of a catalog role - Positional arguments: - privilege - revoke - Named arguments: - --namespace A period-delimited namespace - --table The name of a table - --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege - --catalog The name of an existing catalog - --catalog-role The name of a catalog role - Positional arguments: - privilege -``` - -##### Examples - -``` -polaris privileges \ - table \ - grant \ - --catalog my_catalog \ - --catalog-role catalog_role \ - --namespace a.b \ - --table t \ - TABLE_DROP - -polaris privileges \ - table \ - revoke \ - --catalog my_catalog \ - --catalog-role catalog_role \ - --namespace a.b \ - --table t \ - --cascade \ - TABLE_DROP -``` - -#### view - -The `view` subcommand manages privileges at the view level.
- -``` -input: polaris privileges view --help -options: - view - grant - Named arguments: - --namespace A period-delimited namespace - --view The name of a view - --catalog The name of an existing catalog - --catalog-role The name of a catalog role - Positional arguments: - privilege - revoke - Named arguments: - --namespace A period-delimited namespace - --view The name of a view - --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege - --catalog The name of an existing catalog - --catalog-role The name of a catalog role - Positional arguments: - privilege -``` - -##### Examples - -``` -polaris privileges \ - view \ - grant \ - --catalog my_catalog \ - --catalog-role catalog_role \ - --namespace a.b.c \ - --view v \ - VIEW_FULL_METADATA - -polaris privileges \ - view \ - revoke \ - --catalog my_catalog \ - --catalog-role catalog_role \ - --namespace a.b.c \ - --view v \ - --cascade \ - VIEW_FULL_METADATA -``` - -### profiles - -The `profiles` command is used to manage stored authentication profiles in Polaris. Profiles allow authentication credentials to be saved and reused, eliminating the need to pass credentials with every command. - -`profiles` supports the following subcommands: - -1. create -2. delete -3. get -4. list -5. update - -#### create - -The `create` subcommand is used to create a new authentication profile. - -``` -input: polaris profiles create --help -options: - create - Positional arguments: - profile -``` - -##### Examples - -``` -polaris profiles create dev -``` - -#### delete - -The `delete` subcommand removes a stored profile. - -``` -input: polaris profiles delete --help -options: - delete - Positional arguments: - profile -``` - -##### Examples - -``` -polaris profiles delete dev -``` - -#### get - -The `get` subcommand retrieves details about a stored profile.
- -``` -input: polaris profiles get --help -options: - get - Positional arguments: - profile -``` - -##### Examples - -``` -polaris profiles get dev -``` - -#### list - -The `list` subcommand displays all stored profiles. - -``` -input: polaris profiles list --help -options: - list -``` - -##### Examples - -``` -polaris profiles list -``` - -#### update - -The `update` subcommand modifies an existing profile. - -``` -input: polaris profiles update --help -options: - update - Positional arguments: - profile -``` - -##### Examples - -``` -polaris profiles update dev -``` - -## Examples - -This section outlines example code for a few common operations as well as for some more complex ones. - -For especially complex operations, you may wish to instead directly use the Python API. - -### Creating a principal and a catalog - -``` -polaris principals create my_user - -polaris catalogs create \ - --type internal \ - --storage-type s3 \ - --default-base-location s3://iceberg-bucket/polaris-base \ - --role-arn arn:aws:iam::111122223333:role/ExampleCorpRole \ - --allowed-location s3://iceberg-bucket/polaris-alt-location-1 \ - --allowed-location s3://iceberg-bucket/polaris-alt-location-2 \ - my_catalog -``` - -### Granting a principal the ability to manage the content of a catalog - -``` -polaris principal-roles create power_user -polaris principal-roles grant --principal my_user power_user - -polaris catalog-roles create --catalog my_catalog my_catalog_role -polaris catalog-roles grant \ - --catalog my_catalog \ - --principal-role power_user \ - my_catalog_role - -polaris privileges \ - catalog \ - --catalog my_catalog \ - --catalog-role my_catalog_role \ - grant \ - CATALOG_MANAGE_CONTENT -``` - -### Identifying the tables a given principal has been granted explicit access to read - -_Note that some other privileges, such as `CATALOG_MANAGE_CONTENT`, subsume `TABLE_READ_DATA` and would not be discovered here._ - -``` -principal_roles=$(polaris principal-roles list 
--principal my_principal) -for principal_role in ${principal_roles}; do - catalog_roles=$(polaris catalog-roles list --principal-role "${principal_role}") - for catalog_role in ${catalog_roles}; do - grants=$(polaris privileges list --catalog-role "${catalog_role}" --catalog "${catalog}") - for grant in $(echo "${grants}" | jq -c '.[] | select(.privilege == "TABLE_READ_DATA")'); do - echo "${grant}" - done - done -done -``` - - diff --git a/site/content/in-dev/1.0.0/configuration.md b/site/content/in-dev/1.0.0/configuration.md deleted file mode 100644 index 95d77230f9..0000000000 --- a/site/content/in-dev/1.0.0/configuration.md +++ /dev/null @@ -1,187 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Configuring Polaris -type: docs -weight: 550 ---- - -## Overview - -This page provides information on how to configure Apache Polaris (Incubating). Unless stated -otherwise, this information is valid both for Polaris Docker images (and Kubernetes deployments) as -well as for Polaris binary distributions. - -> Note: for Production tips and best practices, refer to [Configuring Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}).
- -The Polaris server runs on Quarkus and uses its configuration mechanisms. Read the Quarkus -[configuration guide](https://quarkus.io/guides/config) to get familiar with the basics. - -Quarkus aggregates configuration properties from multiple sources, applying them in a specific order -of precedence. When a property is defined in multiple sources, the value from the source with the -higher priority overrides those from lower-priority sources. - -The sources are listed below, from highest to lowest priority: - -1. System properties: properties set via the Java command line using `-Dproperty.name=value`. -2. Environment variables (see below for important details). -3. Settings in the `$PWD/config/application.properties` file. -4. The `application.properties` files packaged in Polaris. -5. Default values: hardcoded defaults within the application. - -When using environment variables, there are two naming conventions: - -1. If possible, just use the property name as the environment variable name. This works fine in most - cases, e.g. in Kubernetes deployments. For example, `polaris.realm-context.realms` can be - included as is in a container YAML definition: - ```yaml - env: - - name: "polaris.realm-context.realms" - value: "realm1,realm2" - ``` - -2. If running from a script or shell prompt, however, stricter naming rules apply: variable names - can consist solely of uppercase letters, digits, and the `_` (underscore) sign. In such - situations, the environment variable name must be derived from the property name, by using - uppercase letters, and replacing all dots, dashes and quotes by underscores. For example, - `polaris.realm-context.realms` becomes `POLARIS_REALM_CONTEXT_REALMS`. See - [here](https://smallrye.io/smallrye-config/Main/config/environment-variables/) for more details. - -> [!IMPORTANT] -> While convenient, uppercase-only environment variables can be problematic for complex property -> names.
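The derivation rule in item 2 above is purely mechanical, so it can be sketched as a small helper. This is an illustration only, not part of Polaris or SmallRye; the function name is made up:

```python
def property_to_env_var(name: str) -> str:
    # Derive a SmallRye-style environment variable name from a property name:
    # uppercase every character, and replace dots, dashes and quotes
    # with underscores.
    return "".join("_" if ch in '.-"' else ch.upper() for ch in name)

print(property_to_env_var("polaris.realm-context.realms"))
# POLARIS_REALM_CONTEXT_REALMS
```

Note that this mapping is lossy (dots and dashes both become underscores), which is one reason the uppercase form can be ambiguous for complex property names.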
In these situations, it's preferable to use system properties or a configuration file. - -As stated above, a configuration file can also be provided at runtime; it should be available -(mounted) at `$PWD/config/application.properties` for the Polaris server to recognize it. In the official -Polaris Docker images, this location is `/deployment/config/application.properties`. - -For Kubernetes deployments, the configuration file is typically defined as a `ConfigMap`, then -mounted in the container at `/deployment/config/application.properties`. It can be mounted in -read-only mode, as Polaris only reads the configuration file once, at startup. - -## Polaris Configuration Options Reference - -| Configuration Property | Default Value | Description | -|---|---|---| -| `polaris.persistence.type` | `relational-jdbc` | Define the persistence backend used by Polaris (`in-memory`, `relational-jdbc`, `eclipse-link` (deprecated)). See [Configuring Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}). | -| `polaris.persistence.relational.jdbc.max-retries` | `1` | Total number of retries JDBC persistence will attempt on connection resets or serialization failures before giving up. | -| `polaris.persistence.relational.jdbc.max_duaration_in_ms` | `5000 ms` | Max time interval (ms) since the start of a transaction during which retries can be attempted. | -| `polaris.persistence.relational.jdbc.initial_delay_in_ms` | `100 ms` | Initial delay before retrying. The delay is doubled after each retry. | -| `polaris.persistence.eclipselink.configurationFile` | | Define the location of the `persistence.xml`. By default, it's the built-in `persistence.xml` in use.
| -| `polaris.persistence.eclipselink.persistenceUnit` | `polaris` | Define the name of the persistence unit to use, as defined in the `persistence.xml`. | -| `polaris.realm-context.type` | `default` | Define the type of the Polaris realm to use. | -| `polaris.realm-context.realms` | `POLARIS` | Define the list of realms to use. | -| `polaris.realm-context.header-name` | `Polaris-Realm` | Define the header name defining the realm context. | -| `polaris.features."ENFORCE_PRINCIPAL_CREDENTIAL_ROTATION_REQUIRED_CHECKING"` | `false` | Flag to enforce credential rotation checking. | -| `polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"` | `FILE` | Define the supported catalog storage types. Supported values are `S3`, `GCS`, `AZURE`, `FILE`. | -| `polaris.features.realm-overrides."my-realm"."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"` | `true` | "Override" realm features, here the skip credential subscoping indirection flag. | -| `polaris.authentication.authenticator.type` | `default` | Define the Polaris authenticator type. | -| `polaris.authentication.token-service.type` | `default` | Define the Polaris token service type. | -| `polaris.authentication.token-broker.type` | `rsa-key-pair` | Define the Polaris token broker type. Also configure the location of the key files. For RSA: if the locations of the key files are not configured, an ephemeral key pair will be created on each Polaris server instance startup, which breaks existing tokens after server restarts and is also incompatible with running multiple Polaris server instances. | -| `polaris.authentication.token-broker.max-token-generation` | `PT1H` | Define the maximum validity period of tokens generated by the token broker. | -| `polaris.authentication.token-broker.rsa-key-pair.private-key-file` | | Define the location of the RSA-256 private key file; if present, the `public-key` file must be specified, too.
| -| `polaris.authentication.token-broker.rsa-key-pair.public-key-file` | | Define the location of the RSA-256 public key file; if present, the `private-key` file must be specified, too. | -| `polaris.authentication.token-broker.symmetric-key.secret` | `secret` | Define the secret of the symmetric key. | -| `polaris.authentication.token-broker.symmetric-key.file` | `/tmp/symmetric.key` | Define the location of the symmetric key file. | -| `polaris.storage.aws.access-key` | `accessKey` | Define the AWS S3 access key. If unset, the default credential provider chain will be used. | -| `polaris.storage.aws.secret-key` | `secretKey` | Define the AWS S3 secret key. If unset, the default credential provider chain will be used. | -| `polaris.storage.gcp.token` | `token` | Define the Google Cloud Storage token. If unset, the default credential provider chain will be used. | -| `polaris.storage.gcp.lifespan` | `PT1H` | Define the Google Cloud Storage token lifespan. If unset, the default credential provider chain will be used. | -| `polaris.log.request-id-header-name` | `Polaris-Request-Id` | Define the header name used to match the request ID in the log. | -| `polaris.log.mdc.aid` | `polaris` | Define the log context (e.g. MDC) AID. | -| `polaris.log.mdc.sid` | `polaris-service` | Define the log context (e.g. MDC) SID. | -| `polaris.rate-limiter.filter.type` | `no-op` | Define the Polaris rate limiter. Supported values are `no-op`, `token-bucket`. | -| `polaris.rate-limiter.token-bucket.type` | `default` | Define the token bucket rate limiter. | -| `polaris.rate-limiter.token-bucket.requests-per-second` | `9999` | Define the number of requests per second for the token bucket rate limiter. | -| `polaris.rate-limiter.token-bucket.window` | `PT10S` | Define the window size for the token bucket rate limiter. | -| `polaris.metrics.tags.<tag-name>=<tag-value>` | `application=Polaris` | Define arbitrary metric tags to include in every request.
| -| `polaris.metrics.realm-id-tag.api-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in API metrics. | -| `polaris.metrics.realm-id-tag.http-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in HTTP request metrics. | -| `polaris.metrics.realm-id-tag.http-metrics-max-cardinality` | `100` | The maximum cardinality for the `realm_id` tag in HTTP request metrics. | -| `polaris.tasks.max-concurrent-tasks` | `100` | Define the max number of concurrent tasks. | -| `polaris.tasks.max-queued-tasks` | `1000` | Define the max number of tasks in the queue. | - -There are also non-Polaris configuration properties that can be useful: - -| Configuration Property | Default Value | Description | -|---|---|---| -| `quarkus.log.level` | `INFO` | Define the root log level. | -| `quarkus.log.category."org.apache.polaris".level` | | Define the log level for a specific category. | -| `quarkus.default-locale` | System locale | Force the use of a specific locale, for instance `en_US`. | -| `quarkus.http.port` | `8181` | Define the HTTP port number. | -| `quarkus.http.auth.basic` | `false` | Enable HTTP basic authentication. | -| `quarkus.http.limits.max-body-size` | `10240K` | Define the HTTP max body size limit. | -| `quarkus.http.cors.origins` | | Define the HTTP CORS origins. | -| `quarkus.http.cors.methods` | `PATCH, POST, DELETE, GET, PUT` | Define the HTTP CORS covered methods. | -| `quarkus.http.cors.headers` | `*` | Define the HTTP CORS covered headers. | -| `quarkus.http.cors.exposed-headers` | `*` | Define the HTTP CORS exposed headers. | -| `quarkus.http.cors.access-control-max-age` | `PT10M` | Define the HTTP CORS access control max age. | -| `quarkus.http.cors.access-control-allow-credentials` | `true` | Define the HTTP CORS access control allow credentials flag.
| -| `quarkus.management.enabled` | `true` | Enable the management server. | -| `quarkus.management.port` | `8182` | Define the port number of the Polaris management server. | -| `quarkus.management.root-path` | | Define the root path under which the `/metrics` and `/health` endpoints are exposed. | -| `quarkus.otel.sdk.disabled` | `true` | Whether the OpenTelemetry SDK is disabled; set to `false` to enable telemetry. | - -> Note: This section is only relevant for Polaris Docker images and Kubernetes deployments. - -There are many other environment variables available in the official Polaris Docker -image; they come from the base image used by Polaris, [ubi9/openjdk-21-runtime]. They should be used -to fine-tune the Java runtime directly, e.g. to enable debugging or to set the heap size. These -variables are not specific to Polaris, but are inherited from the base image. If in doubt, leave -everything at its default! - -[ubi9/openjdk-21-runtime]: https://catalog.redhat.com/software/containers/ubi9/openjdk-21-runtime/6501ce769a0d86945c422d5f - -| Environment variable | Description | -|---|---| -| `JAVA_OPTS` or `JAVA_OPTIONS` | **NOT RECOMMENDED**. JVM options passed to the `java` command (example: "-verbose:class"). Setting this variable will override all options set by any of the other variables in this table. To pass extra settings, use `JAVA_OPTS_APPEND` instead.
| -| `JAVA_OPTS_APPEND` | User-specified Java options to be appended to the generated options in `JAVA_OPTS` (example: "-Dsome.property=foo"). | -| `JAVA_TOOL_OPTIONS` | This variable is defined and honored by all OpenJDK distros, see [here](https://bugs.openjdk.org/browse/JDK-4971166). Options defined here take precedence over all else; using this variable is generally not necessary, but can be useful e.g. to enforce JVM startup parameters, to set up remote debug, or to define JVM agents. | -| `JAVA_MAX_MEM_RATIO` | Is used to calculate a default maximal heap memory based on a container's restriction. If used in a container without any memory constraints for the container then this option has no effect. If there is a memory constraint then `-XX:MaxRAMPercentage` is set to a ratio of the container available memory as set here. The default is `80` which means 80% of the available memory is used as an upper boundary. You can skip this mechanism by setting this value to `0` in which case no `-XX:MaxRAMPercentage` option is added. | -| `JAVA_DEBUG` | If set, remote debugging will be switched on. Disabled by default (example: "true"). | -| `JAVA_DEBUG_PORT` | Port used for remote debugging. Defaults to "5005" (tip: use "*:5005" to enable debugging on all network interfaces). | -| `GC_MIN_HEAP_FREE_RATIO` | Minimum percentage of heap free after GC to avoid expansion. Default is 10. | -| `GC_MAX_HEAP_FREE_RATIO` | Maximum percentage of heap free after GC to avoid shrinking. Default is 20. | -| `GC_TIME_RATIO` | Specifies the ratio of the time spent outside the garbage collection. Default is 4. | -| `GC_ADAPTIVE_SIZE_POLICY_WEIGHT` | The weighting given to the current GC time versus previous GC times. Default is 90. | -| `GC_METASPACE_SIZE` | The initial metaspace size. There is no default (example: "20"). | -| `GC_MAX_METASPACE_SIZE` | The maximum metaspace size. There is no default (example: "100"). | -| `GC_CONTAINER_OPTIONS` | Specify the Java GC to use.
The value of this variable should contain the necessary JRE command-line options to specify the required GC, which will override the default of `-XX:+UseParallelGC` (example: `-XX:+UseG1GC`). | - -Here are some examples: - -| Example | `docker run` option | -|---|---| -| Using another GC | `-e GC_CONTAINER_OPTIONS="-XX:+UseShenandoahGC"` lets Polaris use Shenandoah GC instead of the default parallel GC. | -| Set the Java heap size to a _fixed_ amount | `-e JAVA_OPTS_APPEND="-Xms8g -Xmx8g"` lets Polaris use a Java heap of 8g. | -| Set the maximum heap percentage | `-e JAVA_MAX_MEM_RATIO="70"` lets Polaris use 70% of the available memory. | - - -## Troubleshooting Configuration Issues - -If you encounter issues with the configuration, you can ask Polaris to print out the configuration it -is using. To do this, set the log level for the `io.smallrye.config` category to `DEBUG`, and also -set the console appender level to `DEBUG`: - -```properties -quarkus.log.console.level=DEBUG -quarkus.log.category."io.smallrye.config".level=DEBUG -``` - -> [!IMPORTANT] -> This will print out all configuration values, including sensitive ones like -> passwords. Don't do this in production, and don't share this output with anyone you don't trust! diff --git a/site/content/in-dev/1.0.0/configuring-polaris-for-production.md b/site/content/in-dev/1.0.0/configuring-polaris-for-production.md deleted file mode 100644 index fac51b40f9..0000000000 --- a/site/content/in-dev/1.0.0/configuring-polaris-for-production.md +++ /dev/null @@ -1,222 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership.
The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Configuring Polaris for Production -linkTitle: Production Configuration -type: docs -weight: 600 ---- - -The default server configuration is intended for development and testing. When you deploy Polaris in production, -review and apply the following checklist: -- [ ] Configure OAuth2 keys -- [ ] Enforce realm header validation (`require-header=true`) -- [ ] Use a durable metastore (JDBC + PostgreSQL) -- [ ] Bootstrap valid realms in the metastore -- [ ] Disable local FILE storage - -### Configure OAuth2 - -Polaris authentication requires specifying a token broker factory type. Two implementations are -supported out of the box: - -- [rsa-key-pair] uses a pair of public and private keys; -- [symmetric-key] uses a shared secret. - -[rsa-key-pair]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java -[symmetric-key]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTSymmetricKeyFactory.java - -By default, Polaris uses `rsa-key-pair`, with randomly generated keys. - -> [!IMPORTANT] -> The default `rsa-key-pair` configuration is not suitable when deploying many replicas of Polaris, -> as each replica will have its own set of keys. 
This will cause token validation to fail when a -> request is routed to a different replica than the one that issued the token. - -It is highly recommended to configure Polaris with previously-generated RSA keys. This can be done -by setting the following properties: - -```properties -polaris.authentication.token-broker.type=rsa-key-pair -polaris.authentication.token-broker.rsa-key-pair.public-key-file=/tmp/public.key -polaris.authentication.token-broker.rsa-key-pair.private-key-file=/tmp/private.key -``` - -To generate an RSA key pair, you can use the following commands: - -```shell -openssl genrsa -out private.key 2048 -openssl rsa -in private.key -pubout -out public.key -``` - -Alternatively, you can use a symmetric key by setting the following properties: - -```properties -polaris.authentication.token-broker.type=symmetric-key -polaris.authentication.token-broker.symmetric-key.file=/tmp/symmetric.key -``` - -Note: it is also possible to set the symmetric key secret directly in the configuration file. If -possible, pass the secret as an environment variable to avoid storing sensitive information in the -configuration file: - -```properties -polaris.authentication.token-broker.symmetric-key.secret=${POLARIS_SYMMETRIC_KEY_SECRET} -``` - -Finally, you can also configure the token broker to use a maximum lifespan by setting the following -property: - -```properties -polaris.authentication.token-broker.max-token-generation=PT1H -``` - -Typically, in Kubernetes, you would define the keys as a `Secret` and mount them as files in the -container. - -### Realm Context Resolver - -By default, Polaris resolves realms based on incoming request headers. You can configure the realm -context resolver by setting the following properties in `application.properties`: - -```properties -polaris.realm-context.realms=POLARIS,MY-REALM -polaris.realm-context.header-name=Polaris-Realm -``` - -Where: - -- `realms` is a comma-separated list of allowed realms. 
This setting _must_ be correctly configured. - At least one realm must be specified. -- `header-name` is the name of the header used to resolve the realm; by default, it is - `Polaris-Realm`. - -If a request contains the specified header, Polaris will use the realm specified in the header. If -the realm is not in the list of allowed realms, Polaris will return a `404 Not Found` response. - -If a request _does not_ contain the specified header, however, by default Polaris will use the first -realm in the list as the default realm. In the above example, `POLARIS` is the default realm and -would be used if the `Polaris-Realm` header is not present in the request. - -This is not recommended for production use, as it may lead to security vulnerabilities. To avoid -this, set the following property to `true`: - -```properties -polaris.realm-context.require-header=true -``` - -This will cause Polaris to also return a `404 Not Found` response if the realm header is not present -in the request. - -### Metastore Configuration - -A metastore should be configured with an implementation that durably persists Polaris entities. By -default, Polaris uses an in-memory metastore. - -> [!IMPORTANT] -> The default in-memory metastore is not suitable for production use, as it will lose all data -> when the server is restarted; it is also unusable when multiple Polaris replicas are used. - -To enable a durable metastore, configure your system to use the Relational JDBC-backed metastore. -This implementation leverages Quarkus for datasource management and supports configuration through -environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file). 
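Since every environment variable maps to a Quarkus property, the datasource settings shown below can equivalently be expressed in `application.properties`. This is a sketch only; the username, password and JDBC URL are placeholder values to adapt to your environment:

```properties
# Equivalent application.properties form of the datasource environment
# variables (placeholder values; adjust host, database and credentials).
polaris.persistence.type=relational-jdbc

quarkus.datasource.db-kind=postgresql
quarkus.datasource.username=polaris
quarkus.datasource.password=changeme
quarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/polaris
```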
- -Configure the metastore by setting the following ENV variables: - -``` -POLARIS_PERSISTENCE_TYPE=relational-jdbc - -QUARKUS_DATASOURCE_DB_KIND=postgresql -QUARKUS_DATASOURCE_USERNAME= -QUARKUS_DATASOURCE_PASSWORD= -QUARKUS_DATASOURCE_JDBC_URL= -``` - - -The relational JDBC metastore is a Quarkus-managed datasource and currently supports only PostgreSQL and H2. -Please refer to the documentation here: -[Configure data sources in Quarkus](https://quarkus.io/guides/datasource) - -> [!IMPORTANT] -> Be sure to secure your metastore backend since it will be storing sensitive data and catalog -> metadata. - -Note: Polaris will always create the schema `polaris_schema` in the configured database during bootstrap. - -### Bootstrapping - -Before using Polaris, you must **bootstrap** the metastore. This is a manual operation that must be -performed **only once** for each realm in order to prepare the metastore to integrate with Polaris. - -By default, when bootstrapping a new realm, Polaris will create randomized `CLIENT_ID` and -`CLIENT_SECRET` values for the `root` principal and store their hashes in the metastore backend. - -This may not be convenient, as the generated credentials are not stored -in clear text in the database and therefore cannot be retrieved later.
- -In order to provide your own credentials for `root` principal (so you can request tokens via -`api/catalog/v1/oauth/tokens`), use the [Polaris Admin Tool]({{% ref "admin-tool" %}}) - -You can verify the setup by attempting a token issue for the `root` principal: - -```bash -curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \ - -d "grant_type=client_credentials" \ - -d "client_id=my-client-id" \ - -d "client_secret=my-client-secret" \ - -d "scope=PRINCIPAL_ROLE:ALL" -``` - -Which should return an access token: - -```json -{ - "access_token": "...", - "token_type": "bearer", - "issued_token_type": "urn:ietf:params:oauth:token-type:access_token", - "expires_in": 3600 -} -``` - -If you used a non-default realm name, add the appropriate request header to the `curl` command, -otherwise Polaris will resolve the realm to the first one in the configuration -`polaris.realm-context.realms`. Here is an example to set realm header: - -```bash -curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \ - -H "Polaris-Realm: my-realm" \ - -d "grant_type=client_credentials" \ - -d "client_id=my-client-id" \ - -d "client_secret=my-client-secret" \ - -d "scope=PRINCIPAL_ROLE:ALL" -``` - -### Disable FILE Storage Type -By default, Polaris allows using the local file system (`FILE`) for catalog storage. This is fine for testing, -but **not recommended for production**. To disable it, set the supported storage types like this: -```hocon -polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES" = [ "S3", "Azure" ] -``` -Leave out `FILE` to prevent its use. Only include the storage types your setup needs. - -### Upgrade Considerations - -The [Polaris Evolution](../evolution) page discusses backward compatibility and -upgrade concerns. 
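
### Example Production Configuration

Pulling the settings discussed on this page together, a minimal production-oriented configuration might look like the following sketch. The key paths, realm name, and storage type list are illustrative assumptions only; adapt them to your environment:

```properties
# Use pre-generated RSA keys so tokens survive restarts and work across replicas.
polaris.authentication.token-broker.type=rsa-key-pair
polaris.authentication.token-broker.rsa-key-pair.public-key-file=/etc/polaris/keys/public.key
polaris.authentication.token-broker.rsa-key-pair.private-key-file=/etc/polaris/keys/private.key
# Require an explicit realm header on every request.
polaris.realm-context.realms=my-realm
polaris.realm-context.require-header=true
# Leave FILE out of the supported storage types in production.
polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"=["S3"]
```

The Relational JDBC metastore settings (`POLARIS_PERSISTENCE_TYPE` and the `QUARKUS_DATASOURCE_*` variables) are typically passed as environment variables, as shown earlier on this page.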
- diff --git a/site/content/in-dev/1.0.0/entities.md b/site/content/in-dev/1.0.0/entities.md deleted file mode 100644 index 04d625bb94..0000000000 --- a/site/content/in-dev/1.0.0/entities.md +++ /dev/null @@ -1,95 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Entities -type: docs -weight: 400 ---- - -This page documents various entities that can be managed in Apache Polaris (Incubating). - -## Catalog - -A catalog is a top-level entity in Polaris that may contain other entities like [namespaces](#namespace) and [tables](#table). These map directly to [Apache Iceberg catalogs](https://iceberg.apache.org/concepts/catalog/). - -For information on managing catalogs with the REST API or for more information on what data can be associated with a catalog, see [the API docs]({{% github-polaris "client/python/docs/CreateCatalogRequest.md" %}}). - -### Storage Type - -All catalogs in Polaris are associated with a _storage type_. Valid Storage Types are `S3`, `Azure`, and `GCS`. The `FILE` type is also additionally available for testing. Each of these types relates to a different storage provider where data within the catalog may reside. 
Depending on the storage type, various other configurations may be set for a catalog including credentials to be used when accessing data inside the catalog. - -For details on how to use Storage Types in the REST API, see [the API docs]({{% github-polaris "client/python/docs/StorageConfigInfo.md" %}}). - -For usage examples of storage types, see [docs]({{% ref "command-line-interface" %}}). - -## Namespace - -A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_. - -In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on. - -For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the API docs]({{% github-polaris "client/python/docs/CreateNamespaceRequest.md" %}}). - -## Table - -Polaris tables are entities that map to [Apache Iceberg tables](https://iceberg.apache.org/docs/nightly/configuration/), [Delta tables](https://docs.databricks.com/aws/en/delta/table-properties), or [Hudi tables](https://hudi.apache.org/docs/next/configurations#TABLE_CONFIG). - -For information on managing tables with the REST API or for more information on what data can be associated with a table, see [the API docs]({{% github-polaris "client/python/docs/CreateTableRequest.md" %}}). - -## View - -Polaris views are entities that map to [Apache Iceberg views](https://iceberg.apache.org/view-spec/). - -For information on managing views with the REST API or for more information on what data can be associated with a view, see [the API docs]({{% github-polaris "client/python/docs/CreateViewRequest.md" %}}). - -## Principal - -Polaris principals are unique identities that can be used to represent users or services. 
Each principal may have one or more [principal roles](#principal-role) assigned to it for the purpose of accessing catalogs and the entities within them. - -For information on managing principals with the REST API or for more information on what data can be associated with a principal, see [the API docs]({{% github-polaris "client/python/docs/CreatePrincipalRequest.md" %}}). - -## Principal Role - -Polaris principal roles are labels that may be granted to [principals](#principal). Each principal may have one or more principal roles, and the same principal role may be granted to multiple principals. Principal roles may be assigned based on the persona or responsibilities of a given principal, or on how that principal will need to access different entities within Polaris. - -For information on managing principal roles with the REST API or for more information on what data can be associated with a principal role, see [the API docs]({{% github-polaris "client/python/docs/CreatePrincipalRoleRequest.md" %}}). - -## Catalog Role - -Polaris catalog roles are labels that may be granted to [catalogs](#catalog). Each catalog may have one or more catalog roles, and the same catalog role may be granted to multiple catalogs. Catalog roles may be assigned based on the nature of data that will reside in a catalog, or by the groups of users and services that might need to access that data. - -Each catalog role may have multiple [privileges](#privilege) granted to it, and each catalog role can be granted to one or more [principal roles](#principal-role). This is the mechanism by which principals are granted access to entities inside a catalog such as namespaces and tables. - -## Policy - -A Polaris policy is a set of rules governing actions on specified resources under predefined conditions. Polaris supports policies for Iceberg table compaction, snapshot expiry, row-level access control, and custom policy definitions.
- -Policies can be applied at the catalog level, the namespace level, or the table level. Policy inheritance can be achieved by attaching a policy to a higher-level scope, such as a namespace or catalog. As a result, tables registered under those entities do not need to have the same policy declared individually. If a table or a namespace requires a different policy, a user can assign a different policy to it directly, overriding the policy of the same type declared at a higher level. - -## Privilege - -Polaris privileges are granted to [catalog roles](#catalog-role) in order to grant principals with a given principal role some degree of access to catalogs with a given catalog role. When a privilege is granted to a catalog role, any principal roles granted that catalog role receive the privilege. In turn, any principals who are granted that principal role receive it. - -A privilege can be scoped to any entity inside a catalog, including the catalog itself. - -For a list of supported privileges for each privilege class, see the API docs: -* [Table Privileges]({{% github-polaris "client/python/docs/TablePrivilege.md" %}}) -* [View Privileges]({{% github-polaris "client/python/docs/ViewPrivilege.md" %}}) -* [Namespace Privileges]({{% github-polaris "client/python/docs/NamespacePrivilege.md" %}}) -* [Catalog Privileges]({{% github-polaris "client/python/docs/CatalogPrivilege.md" %}}) diff --git a/site/content/in-dev/1.0.0/evolution.md b/site/content/in-dev/1.0.0/evolution.md deleted file mode 100644 index ea29badc84..0000000000 --- a/site/content/in-dev/1.0.0/evolution.md +++ /dev/null @@ -1,115 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License.
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Polaris Evolution -type: docs -weight: 1000 ---- - -This page discusses what can be expected from Apache Polaris as the project evolves. - -## Using Polaris as a Catalog - -Polaris is primarily intended to be used as a Catalog of Tables and Views. As such, -it implements the Iceberg REST Catalog API and its own REST APIs. - -Revisions of the Iceberg REST Catalog API are controlled by the [Apache Iceberg](https://iceberg.apache.org/) -community. Polaris attempts to accurately implement this specification. Nonetheless, -optional REST Catalog features may or may not be supported immediately. In general, -there is no guarantee that Polaris releases always implement the latest version of -the Iceberg REST Catalog API. - -Any API under Polaris control that is not in an "experimental" or "beta" state -(e.g. the Management API) is maintained as a versioned REST API. New releases of Polaris -may include changes to the current version of the API. When that happens those changes -are intended to be compatible with prior versions of Polaris clients. Certain endpoints -and parameters may be deprecated. - -In case a major change is required to an API that cannot be implemented in a -backward-compatible way, new endpoints (URI paths) may be introduced. New URI "roots" may -be introduced too (e.g. `api/catalog/v2`). - -Note that those "v1", "v2", etc. URI path segments are not meant to be 1:1 with Polaris -releases or Polaris project version numbers (e.g. a "v2" path segment does not mean that -it is added in Polaris 2.0). 
- -Polaris servers will support deprecated API endpoints / parameters / versions / etc. -for some transition period to allow clients to migrate. - -### Managing Polaris Database - -Polaris stores its data in a database, which is sometimes referred to as "Metastore" or -"Persistence" in other docs. - -Each Polaris release may support multiple Persistence [implementations](../metastores), -for example, "EclipseLink" (deprecated) and "JDBC" (current). - -Each type of Persistence evolves individually. Within each Persistence type, Polaris -attempts to support rolling upgrades (both version X and X + 1 servers running at the -same time). - -However, migrating between different Persistence types is not supported in a rolling -upgrade manner (for example, migrating from "EclipseLink" to "JDBC"). Polaris provides -[tools](https://github.com/apache/polaris-tools/) for migrating between different -catalogs and those tools may be used to migrate between different Persistence types -as well. Service interruption (downtime) should be expected in those cases. - -## Using Polaris as a Build-Time Dependency - -Polaris produces several jars. These jars or custom builds of Polaris code may be used in -downstream projects according to the terms of the license included into Polaris distributions. - -The minimal version of the JRE required by Polaris code (compilation target) may be updated in -any release. Different Polaris jars may have different minimal JRE version requirements. - -Changes in Java class should be expected at any time regardless of the module name or -whether the class / method is `public` or not. - -This approach is not meant to discourage the use of Polaris code in downstream projects, but -to allow more flexibility in evolving the codebase to support new catalog-level features -and improve code efficiency. 
Maintainers of downstream projects are encouraged to join Polaris -mailing lists to monitor project changes, suggest improvements, and engage with the Polaris -community in case of specific compatibility concerns. - -## Semantic Versioning - -Polaris strives to follow [Semantic Versioning](https://semver.org/) conventions both with -respect to REST APIs (beta and experimental APIs excepted), [Polaris Policies](../policy/) -and user-facing [configuration](../configuration/). - -The following are some examples of Polaris approach to SemVer in REST APIs / configuration. -These examples are for illustration purposes and should not be considered to be -exhaustive. - -* Polaris implementing an optional Iceberg REST Catalog feature that was unimplemented -in the previous release is not considered a major change. - -* Supporting a new revision of the Iceberg REST Catalog spec in a backward-compatible way -is not considered a major change. Specifically, supporting new REST API prefixes (e.g. `v2`) -is not a major change because it does not affect older clients. - -* Changing the implementation of an Iceberg REST Catalog feature / endpoint in a non-backward -compatible way (e.g. removing or renaming a request parameter) is a major change. - -* Dropping support for a configuration property with the `polaris.` name prefix is a major change. - -* Dropping support for any previously defined [Policy](../policy/) type or property is a major change. - -* Upgrading Quarkus Runtime to its next major version is a major change (because -Quarkus-managed configuration may change). diff --git a/site/content/in-dev/1.0.0/generic-table.md b/site/content/in-dev/1.0.0/generic-table.md deleted file mode 100644 index 2e0e3fe8e6..0000000000 --- a/site/content/in-dev/1.0.0/generic-table.md +++ /dev/null @@ -1,169 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. 
See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Generic Table (Beta) -type: docs -weight: 435 ---- - -The Generic Table in Apache Polaris is designed to provide support for non-Iceberg tables across different table formats, such as Delta and CSV. It currently provides the following capabilities: -- Create a generic table under a namespace -- Load a generic table -- Drop a generic table -- List all generic tables under a namespace - -**NOTE** Generic table support is currently a beta release. Please use it with caution and report any issues encountered. - -## What is a Generic Table? - -A generic table in Polaris is an entity that defines the following fields: - -- **name** (required): A unique identifier for the table within a namespace -- **format** (required): The format for the generic table, i.e. "delta", "csv" -- **base-location** (optional): Table base location in URI format. For example: s3:///path/to/table - - The table base location is a location that includes all files for the table - - A table with multiple disjoint locations (i.e. containing files that are outside the configured base location) is not compliant with the current generic table support in Polaris. - - If no location is provided, clients or users are responsible for managing the location.
-- **properties** (optional): Properties for the generic table passed on creation. - - Currently, there is no reserved property key defined. - - The property definition and interpretation are delegated to client or engine implementations. -- **doc** (optional): Comment or description for the table - -## Generic Table API Vs. Iceberg Table API - -Generic Table provides a different set of APIs to operate on generic table entities, while the Iceberg APIs operate on -Iceberg table entities. - -| Operations | **Iceberg Table API** | **Generic Table API** | -|--------------|-----------------------|-----------------------| -| Create Table | Create an Iceberg table | Create a generic table | -| Load Table | Load an Iceberg table. If the table to load is a generic table, you need to call the Generic Table loadTable API, otherwise a TableNotFoundException will be thrown | Load a generic table. Similarly, trying to load an Iceberg table through the Generic Table API will throw a TableNotFoundException. | -| Drop Table | Drop an Iceberg table. Similar to load table, if the table to drop is a generic table, a TableNotFoundException will be thrown. | Drop a generic table. Dropping an Iceberg table through the Generic Table endpoint will throw a TableNotFoundException. | -| List Table | List all Iceberg tables | List all generic tables | - -Note that generic tables share the same namespace with Iceberg tables, so a table name has to be unique under a given namespace. Furthermore, since -there is currently no support for updating a generic table, any update to an existing table requires a drop and re-create.
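
Since there is no update endpoint yet, the drop-and-re-create workflow can be sketched with `curl` as follows. The catalog, namespace, and table names are illustrative, and the authentication header is omitted here for brevity:

```shell
# Drop the existing generic table...
curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table

# ...then re-create it with the updated definition.
curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \
  -H "Content-Type: application/json" \
  -d '{
    "name": "delta_table",
    "format": "delta",
    "base-location": "s3:///path/to/table",
    "doc": "updated description",
    "properties": {"key1": "value2"}
  }'
```

The individual endpoints used here are described in detail in the following sections.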
- -## Working with Generic Table - -There are two ways to work with Polaris Generic Tables today: -1) Directly communicate with Polaris through REST API calls using tools such as `curl`. Details are described in the sections below. -2) Use the Spark client provided if you are working with Spark. Please refer to [Polaris Spark Client]({{% ref "polaris-spark-client" %}}) for detailed instructions. - -### Create a Generic Table - -To create a generic table, you need to provide the corresponding fields as described in [What is a Generic Table](#what-is-a-generic-table). - -The REST API for creating a generic table is `POST /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables`, and the -request body looks like the following: - -```json -{ - "name": "", - "format": "", - "base-location": "", - "doc": "", - "properties": { - "": "" - } -} -``` - -Here is an example to create a generic table with name `delta_table` and format as `delta` under a namespace `delta_ns` -for catalog `delta_catalog` using curl: - -```shell -curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \ - -H "Content-Type: application/json" \ - -d '{ - "name": "delta_table", - "format": "delta", - "base-location": "s3:///path/to/table", - "doc": "delta table example", - "properties": { - "key1": "value1" - } - }' -``` - -### Load a Generic Table -The REST endpoint for loading a generic table is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.
- -Here is an example to load the table `delta_table` using curl: -```shell -curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table -``` -And the response looks like the following: -```json -{ - "table": { - "name": "delta_table", - "format": "delta", - "base-location": "s3:///path/to/table", - "doc": "delta table example", - "properties": { - "key1": "value1" - } - } -} -``` - -### List Generic Tables -The REST endpoint for listing the generic tables under a given -namespace is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/`. - -The following curl command lists all tables under the namespace `delta_ns`: -```shell -curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/ -``` -Example Response: -```json -{ - "identifiers": [ - { - "namespace": ["delta_ns"], - "name": "delta_table" - } - ], - "next-page-token": null -} -``` - -### Drop a Generic Table -The drop generic table REST endpoint is `DELETE /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}` - -The following curl call drops the table `delta_table`: -```shell -curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table -``` - -### API Reference - -For the complete and up-to-date API specification, see the [Catalog API Spec](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/polaris/refs/heads/main/spec/generated/bundled-polaris-catalog-service.yaml). - -## Limitations - -Current limitations of Generic Table support: -1) Limited spec information. Currently, there is no spec for information like schema, partitions, etc. -2) No commit coordination or update capability provided at the catalog service level. - -Therefore, the catalog itself is unaware of anything about the underlying table except some of the loosely defined metadata.
-It is the responsibility of the engine (and plugins used by the engine) to determine exactly what loading or committing data -should look like, based on the metadata. For example, with Delta support, the Delta log serialization, deserialization, -and updates all happen on the client side. diff --git a/site/content/in-dev/1.0.0/getting-started/_index.md b/site/content/in-dev/1.0.0/getting-started/_index.md deleted file mode 100644 index d4f13e6f63..0000000000 --- a/site/content/in-dev/1.0.0/getting-started/_index.md +++ /dev/null @@ -1,25 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements.
See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Deploying Polaris on Cloud Providers -type: docs -weight: 300 ---- - -We will now demonstrate how to deploy Polaris locally, as well as with all supported Cloud Providers: Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP). - -Locally, Polaris can be deployed using either Docker or a local build. On the cloud, this tutorial will deploy Polaris using Docker only, but local builds can also be executed. \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md b/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md deleted file mode 100644 index fd95b72b0c..0000000000 --- a/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md +++ /dev/null @@ -1,57 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License.
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Deploying Polaris on Amazon Web Services (AWS) -type: docs -weight: 310 ---- - -Build and launch Polaris using the AWS Startup Script at the location provided in the command below. This script will start an [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) instance, which will be used as the backend Postgres instance holding all Polaris data. -Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. - -The requirements to run the script below are: -* There must be at least two subnets created in the VPC and region in which your EC2 instance resides. The span of subnets MUST include at least 2 availability zones (AZs) within the same region. -* Your EC2 instance must be enabled with [IMDSv1 or IMDSv2 with 2+ hop limit](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-IMDS-new-instances.html#configure-IMDS-new-instances-instance-settings). -* The AWS identity that you will use to run this script must have the following AWS permissions: - * "ec2:DescribeInstances" - * "rds:CreateDBInstance" - * "rds:DescribeDBInstances" - * "rds:CreateDBSubnetGroup" - * "sts:AssumeRole" on the same role as the Instance Profile role of the EC2 instance on which you are running this script. Additionally, you should ensure that the Instance Profile contains a trust policy that allows the role to trust itself to be assumed.
- -```shell -chmod +x getting-started/assets/cloud_providers/deploy-aws.sh -export ASSETS_PATH=$(pwd)/getting-started/assets/ -export CLIENT_ID=root -export CLIENT_SECRET=s3cr3t -./getting-started/assets/cloud_providers/deploy-aws.sh -``` - -## Next Steps -Congrats, you now have a running instance of Polaris! For details on how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris.md" %}}) page. - -## Cleanup Instructions -To shut down the Polaris server, run the following commands: - -```shell -export ASSETS_PATH=$(pwd)/getting-started/assets/ -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down -``` - -To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page. \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md b/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md deleted file mode 100644 index 74df725db0..0000000000 --- a/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied.
See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Deploying Polaris on Azure -type: docs -weight: 320 ---- - -Build and launch Polaris using the Azure Startup Script at the location provided in the command below. This script will start an [Azure Database for PostgreSQL - Flexible Server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview) instance, which will be used as the backend Postgres instance holding all Polaris data. -Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. - -The requirements to run the script below are: -* Install the AZ CLI, if it is not already installed on the Azure VM. Instructions to download the AZ CLI can be found [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli). -* You must be logged into the AZ CLI. Please run `az account show` to ensure that you are logged in prior to running this script. -* Assign a System-Assigned Managed Identity to the Azure VM. - -```shell -chmod +x getting-started/assets/cloud_providers/deploy-azure.sh -export ASSETS_PATH=$(pwd)/getting-started/assets/ -export CLIENT_ID=root -export CLIENT_SECRET=s3cr3t -./getting-started/assets/cloud_providers/deploy-azure.sh -``` - -## Next Steps -Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page. - -## Cleanup Instructions -To shut down the Polaris server, run the following commands: - -```shell -export ASSETS_PATH=$(pwd)/getting-started/assets/ -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down -``` - -To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.
\ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md b/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md deleted file mode 100644 index 9641ad7282..0000000000 --- a/site/content/in-dev/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Deploying Polaris on Google Cloud Platform (GCP) -type: docs -weight: 330 ---- - -Build and launch Polaris using the GCP Startup Script at the location provided in the command below. This script will start a [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) instance, which will be used as the backend Postgres instance holding all Polaris data. -Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. - -The requirements to run the script below are: -* Install the `gcloud` CLI, if it is not already installed on the GCP VM. Instructions to download the `gcloud` CLI can be found [here](https://cloud.google.com/sdk/docs/install).
-* Ensure the `Cloud SQL Admin API` has been enabled in your project and that your VM's Principal has access to the correct role: `roles/cloudsql.admin`. -* Ensure the VM's Principal has access to at least Read-only scope on Compute Engine: `compute.readonly`. - -```shell -chmod +x getting-started/assets/cloud_providers/deploy-gcp.sh -export ASSETS_PATH=$(pwd)/getting-started/assets/ -export CLIENT_ID=root -export CLIENT_SECRET=s3cr3t -./getting-started/assets/cloud_providers/deploy-gcp.sh -``` - -## Next Steps -Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page. - -## Cleanup Instructions -To shut down the Polaris server, run the following commands: - -```shell -export ASSETS_PATH=$(pwd)/getting-started/assets/ -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down -``` - -To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page. \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/install-dependencies.md b/site/content/in-dev/1.0.0/getting-started/install-dependencies.md deleted file mode 100644 index 7341118868..0000000000 --- a/site/content/in-dev/1.0.0/getting-started/install-dependencies.md +++ /dev/null @@ -1,118 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. 
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Installing Dependencies -type: docs -weight: 100 ---- - -This guide serves as an introduction to several key entities that can be managed with Apache Polaris (Incubating), describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark™. - -# Prerequisites - -This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here. - -## Git - -To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/) on MacOS: - -```shell -brew install git -``` - -Please follow the [Git Documentation](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) for instructions on installing Git on other platforms. - -Then, use git to clone the Polaris repo: - -```shell -cd ~ -git clone https://github.com/apache/polaris.git -``` - -## Docker - -It is recommended to deploy Polaris inside [Docker](https://www.docker.com/) for the Quickstart workflow. Instructions for deploying the Quickstart workflow on the supported Cloud Providers (AWS, Azure, GCP) will be provided only with Docker. However, non-Docker deployment instructions for local deployments can also be followed on Cloud Providers.
- -Instructions to install Docker can be found on the [Docker website](https://docs.docker.com/engine/install/). Ensure that Docker and the Docker Compose plugin are both installed. - -### Docker on MacOS -Docker can be installed using [homebrew](https://brew.sh/): - -```shell -brew install --cask docker -``` - -There could be [Docker permission issues](https://github.com/apache/polaris/pull/971) related to seccomp configuration. To resolve these issues, set the `seccomp` profile to "unconfined" when running a container. For example: - -```shell -docker run --security-opt seccomp=unconfined apache/polaris:latest -``` - -Note: Setting the seccomp profile to "unconfined" disables the default system call filtering, which may pose security risks. Use this configuration with caution, especially in production environments. - -### Docker on Amazon Linux -Docker can be installed using a modification to the CentOS instructions. For example: - -```shell -sudo dnf update -y -# Remove old version -sudo dnf remove -y docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-engine -# Install dnf plugin -sudo dnf -y install dnf-plugins-core -# Add CentOS repository -sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo -# Adjust release server version in the path as it will not match with Amazon Linux 2023 -sudo sed -i 's/$releasever/9/g' /etc/yum.repos.d/docker-ce.repo -# Install as usual -sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -``` - -### Confirm Docker Installation - -Once installed, make sure that both Docker and the Docker Compose plugin are installed: - -```shell -docker version -docker compose version -``` - -Also make sure Docker is running and is able to run a sample Docker container: - -```shell -docker run hello-world -``` - -## Java - -If you plan to build Polaris from source yourself or using this tutorial's
instructions on a Cloud Provider, you will need to satisfy a few prerequisites first. - -Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv: - -```shell -cd ~/polaris -brew install openjdk@21 jenv -jenv add $(brew --prefix openjdk@21) -jenv local 21 -``` - -Ensure that `java --version` and `javac -version` both run successfully. - -## jq - -Most Polaris Quickstart scripts require `jq`. Follow the instructions from the [jq](https://jqlang.org/download/) website to download this tool. \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/quickstart.md b/site/content/in-dev/1.0.0/getting-started/quickstart.md deleted file mode 100644 index a9fd43f906..0000000000 --- a/site/content/in-dev/1.0.0/getting-started/quickstart.md +++ /dev/null @@ -1,116 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Quickstart -type: docs -weight: 200 ---- - -Polaris can be deployed via a Docker image or as a standalone process.
Before starting, be sure that you've satisfied the relevant prerequisites detailed in the previous page. - -## Common Setup -Before running Polaris, ensure you have completed the following setup steps: - -1. **Build Polaris** -```shell -cd ~/polaris -./gradlew \ - :polaris-server:assemble \ - :polaris-server:quarkusAppPartsBuild \ - :polaris-admin:assemble --rerun \ - -Dquarkus.container-image.tag=postgres-latest \ - -Dquarkus.container-image.build=true -``` -- **For standalone**: Omit the `-Dquarkus.container-image.tag` and `-Dquarkus.container-image.build` options if you do not need to build a Docker image. - -## Running Polaris with Docker - -To start using Polaris in Docker, launch Polaris, which is packaged with a Postgres instance, Apache Spark, and Trino: - -```shell -export ASSETS_PATH=$(pwd)/getting-started/assets/ -export QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://postgres:5432/POLARIS -export QUARKUS_DATASOURCE_USERNAME=postgres -export QUARKUS_DATASOURCE_PASSWORD=postgres -export CLIENT_ID=root -export CLIENT_SECRET=s3cr3t -docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml \ - -f getting-started/jdbc/docker-compose-bootstrap-db.yml \ - -f getting-started/jdbc/docker-compose.yml up -d -``` - -You should see output for some time as Polaris, Spark, and Trino build and start up. Eventually, the build output will quiet down and you will see some logs relating to Spark, resembling the following: - -``` -spark-sql-1 | Spark Web UI available at http://8bc4de8ed854:4040 -spark-sql-1 | Spark master: local[*], Application Id: local-1743745174604 -spark-sql-1 | 25/04/04 05:39:38 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session. -spark-sql-1 | 25/04/04 05:39:39 WARN RESTSessionCatalog: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://polaris:8181/api/catalog/v1/oauth/tokens.
This automatic fallback will be removed in a future Iceberg release. It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537 -``` - -The Docker image pre-configures a sample catalog called `quickstart_catalog` that uses a local file system. - -## Running Polaris as a Standalone Process - -You can also start Polaris through Gradle (packaged within the Polaris repository): - -1. **Start the Server** - -Run the following command to start Polaris: - -```shell -./gradlew run -``` - -You should see output for some time as Polaris builds and starts up. Eventually, the logs will quiet down and you should see messages that resemble the following: - -``` -INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) polaris-runtime-service on JVM (powered by Quarkus ) started in 2.656s. Listening on: http://localhost:8181. Management interface listening on http://0.0.0.0:8182. -INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) Profile prod activated. Live Coding activated. -INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) Installed features: [...] -``` - -At this point, Polaris is running. - -When using a Gradle-launched Polaris instance in this tutorial, we'll launch an instance of Polaris that stores entities only in-memory. This means that any entities that you define will be destroyed when Polaris is shut down. -For more information on how to configure Polaris for production usage, see the [docs]({{% relref "../configuring-polaris-for-production" %}}). - -When Polaris is run using the `./gradlew run` command, the root principal credentials are `root` and `secret` for the `CLIENT_ID` and `CLIENT_SECRET`, respectively.
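These root credentials are exchanged for a bearer token via the catalog's OAuth endpoint, which is the same request the REST API examples later in this guide issue with `curl --user`. As a rough sketch of what that request looks like when assembled programmatically (the endpoint path and scope are taken from this guide's curl examples; the helper name is ours):

```python
import base64

def build_token_request(client_id, client_secret, host="localhost", port=8181):
    """Assemble the OAuth2 client-credentials request that exchanges
    Polaris principal credentials for a bearer token."""
    url = f"http://{host}:{port}/api/catalog/v1/oauth/tokens"
    # curl's --user flag sends HTTP Basic auth; replicate that here
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = "grant_type=client_credentials&scope=PRINCIPAL_ROLE:ALL"
    return url, headers, body

# root/secret are the defaults for a Gradle-launched instance, as noted above
url, headers, body = build_token_request("root", "secret")
```

Sending this request (for example with `urllib.request`) returns a JSON document whose `access_token` field is then passed as a `Bearer` header, as shown in the REST API section of this guide.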
- -### Installing Apache Spark and Trino Locally for Testing - -#### Apache Spark - -If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. As in the [prerequisites]({{% ref "install-dependencies#git" %}}), make sure [git](https://git-scm.com/) is installed first. - -Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html). - -```shell -git clone --branch branch-3.5 https://github.com/apache/spark.git ~/spark -``` - -#### Trino -If you want to connect to Polaris with [Trino](https://trino.io/), it is recommended to set up a test instance of Trino using Docker. As in the [prerequisites]({{% ref "install-dependencies#docker" %}}), make sure [Docker](https://www.docker.com/) is installed first. - -```shell -docker run --name trino -d -p 8080:8080 trinodb/trino -``` - -## Next Steps -Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% ref "using-polaris" %}}) page. \ No newline at end of file diff --git a/site/content/in-dev/1.0.0/getting-started/using-polaris.md b/site/content/in-dev/1.0.0/getting-started/using-polaris.md deleted file mode 100644 index 35f0bae336..0000000000 --- a/site/content/in-dev/1.0.0/getting-started/using-polaris.md +++ /dev/null @@ -1,315 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License.
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -Title: Using Polaris -type: docs -weight: 400 ---- - -## Setup - -Ensure your `CLIENT_ID` & `CLIENT_SECRET` variables are already defined, as they were required for starting the Polaris server earlier. - -```shell -export CLIENT_ID=YOUR_CLIENT_ID -export CLIENT_SECRET=YOUR_CLIENT_SECRET -``` - -## Defining a Catalog - -In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under. With a Polaris service running, you can create a catalog like so: - -```shell -cd ~/polaris - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalogs \ - create \ - --storage-type s3 \ - --default-base-location ${DEFAULT_BASE_LOCATION} \ - --role-arn ${ROLE_ARN} \ - quickstart_catalog -``` - -This will create a new catalog called **quickstart_catalog**. If you are using one of the Getting Started locally-built Docker images, we have already created a catalog named `quickstart_catalog` for you. - -The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources. 
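For example, the two variables referenced by the `catalogs create` command above might be set like this beforehand (the bucket name, account ID, and role name are placeholders — substitute your own):

```shell
# Hypothetical values -- replace with your own S3 bucket and IAM role
export DEFAULT_BASE_LOCATION="s3://my-polaris-bucket/quickstart"
export ROLE_ARN="arn:aws:iam::123456789012:role/polaris-quickstart-role"
```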
- -If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs]({{% relref "../entities#storage-type" %}}). - -Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% relref "../command-line-interface" %}}). - - -### Creating a Principal and Assigning it Privileges - -With a catalog created, we can create a [principal]({{% relref "../entities#principal" %}}) that has access to manage that catalog. For details on how to configure the Polaris CLI, see [the section above](#defining-a-catalog) or refer to the [docs]({{% relref "../command-line-interface" %}}). - -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principals \ - create \ - quickstart_user - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principal-roles \ - create \ - quickstart_user_role - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalog-roles \ - create \ - --catalog quickstart_catalog \ - quickstart_catalog_role -``` - -Be sure to provide the necessary credentials, hostname, and port as before. - -When the `principals create` command completes successfully, it will return the credentials for this new principal. Export them for future use. For example: - -```shell -./polaris ... principals create example -{"clientId": "XXXX", "clientSecret": "YYYY"} -export USER_CLIENT_ID=XXXX -export USER_CLIENT_SECRET=YYYY -``` - -Now, we grant the principal the [principal role]({{% relref "../entities#principal-role" %}}) we created, and grant the [catalog role]({{% relref "../entities#catalog-role" %}}) the principal role we created. 
For more information on these entities, please refer to the linked documentation. - -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - principal-roles \ - grant \ - --principal quickstart_user \ - quickstart_user_role - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalog-roles \ - grant \ - --catalog quickstart_catalog \ - --principal-role quickstart_user_role \ - quickstart_catalog_role -``` - -Now, we’ve linked our principal to the catalog via roles like so: - -![Principal to Catalog](/img/quickstart/privilege-illustration-1.png "Principal to Catalog") - -In order to give this principal the ability to interact with the catalog, we must assign some [privileges]({{% relref "../entities#privilege" %}}). For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so: - -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - privileges \ - catalog \ - grant \ - --catalog quickstart_catalog \ - --catalog-role quickstart_catalog_role \ - CATALOG_MANAGE_CONTENT -``` - -This grants the [catalog privileges]({{% relref "../entities#privilege" %}}) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so: - -![Principal to Catalog with Catalog Role](/img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role") - -`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace. - -## Using Iceberg & Polaris - -At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). 
Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the respective examples below. - -### Connecting with Spark - -#### Using a Local Build of Spark - -To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API. - -This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). From a local Spark clone on the `branch-3.5` branch we can run the following: - -_Note: the credentials provided here are those for our principal, not the root credentials._ - -```shell -bin/spark-sql \ ---packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \ ---conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ ---conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \ ---conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \ ---conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \ ---conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \ ---conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \ ---conf spark.sql.catalog.quickstart_catalog.credential='${USER_CLIENT_ID}:${USER_CLIENT_SECRET}' \ ---conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \ ---conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true \ ---conf spark.sql.catalog.quickstart_catalog.client.region=us-west-2 -``` - -Similar to the CLI commands above, this configures Spark to use the Polaris running at `localhost:8181`. 
If your Polaris server is running elsewhere, be sure to update the configuration appropriately. - -Finally, note that we include the `iceberg-aws-bundle` package here. If your table is using a different filesystem, be sure to include the appropriate dependency. - -#### Using Spark SQL from a Docker container - -Refresh the Docker container with the user's credentials: -```shell -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop spark-sql -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f spark-sql -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps spark-sql -``` - -Attach to the running spark-sql container: - -```shell -docker attach $(docker ps -q --filter name=spark-sql) -``` - -#### Sample Commands - -Once the Spark session starts, we can create a namespace and table within the catalog: - -```sql -USE quickstart_catalog; -CREATE NAMESPACE IF NOT EXISTS quickstart_namespace; -CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema; -USE NAMESPACE quickstart_namespace.schema; -CREATE TABLE IF NOT EXISTS quickstart_table (id BIGINT, data STRING) USING ICEBERG; -``` - -We can now use this table like any other: - -``` -INSERT INTO quickstart_table VALUES (1, 'some data'); -SELECT * FROM quickstart_table; -. . . -+---+---------+ -|id |data | -+---+---------+ -|1 |some data| -+---+---------+ -``` - -If at any time access is revoked...
- -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - privileges \ - catalog \ - revoke \ - --catalog quickstart_catalog \ - --catalog-role quickstart_catalog_role \ - CATALOG_MANAGE_CONTENT -``` - -Spark will lose access to the table: - -``` -INSERT INTO quickstart_table VALUES (1, 'some data'); - -org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION -``` - -### Connecting with Trino - -Refresh the Docker container with the user's credentials: - -```shell -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop trino -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f trino -docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps trino -``` - -Attach to the running Trino container: - -```shell -docker exec -it $(docker ps -q --filter name=trino) trino -``` - -You may not see Trino's prompt immediately; press ENTER to bring it up. A few commands that you can try: - -```sql -SHOW CATALOGS; -SHOW SCHEMAS FROM iceberg; -CREATE SCHEMA iceberg.quickstart_schema; -CREATE TABLE iceberg.quickstart_schema.quickstart_table AS SELECT 1 x; -SELECT * FROM iceberg.quickstart_schema.quickstart_table; -``` - -If at any time access is revoked...
- -```shell -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - privileges \ - catalog \ - revoke \ - --catalog quickstart_catalog \ - --catalog-role quickstart_catalog_role \ - CATALOG_MANAGE_CONTENT -``` - -Trino will lose access to the table: - -```sql -SELECT * FROM iceberg.quickstart_schema.quickstart_table; - -org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION -``` - -### Connecting Using REST APIs - -To access Polaris from the host machine, first request an access token: - -```shell -export POLARIS_TOKEN=$(curl -s http://polaris:8181/api/catalog/v1/oauth/tokens \ - --resolve polaris:8181:127.0.0.1 \ - --user ${CLIENT_ID}:${CLIENT_SECRET} \ - -d 'grant_type=client_credentials' \ - -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token) -``` - -Then, use the access token in the Authorization header when accessing Polaris: - -```shell -curl -v http://127.0.0.1:8181/api/management/v1/principal-roles -H "Authorization: Bearer $POLARIS_TOKEN" -curl -v http://127.0.0.1:8181/api/management/v1/catalogs/quickstart_catalog -H "Authorization: Bearer $POLARIS_TOKEN" -``` - -## Next Steps -* Visit [Configuring Polaris for Production]({{% relref "../configuring-polaris-for-production" %}}). -* A Getting Started experience for using Spark with Jupyter Notebooks is documented [here](https://github.com/apache/polaris/blob/main/getting-started/spark/README.md). -* To shut down a locally-deployed Polaris server and clean up all related Docker containers, run the command listed below. Cloud Deployments have their respective termination commands on their Deployment page, while Polaris running on Gradle will terminate when the Gradle process terminates. 
-```shell -docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml -f getting-started/jdbc/docker-compose-bootstrap-db.yml -f getting-started/jdbc/docker-compose.yml down -``` - - diff --git a/site/content/in-dev/1.0.0/metastores.md b/site/content/in-dev/1.0.0/metastores.md deleted file mode 100644 index 4810b124a0..0000000000 --- a/site/content/in-dev/1.0.0/metastores.md +++ /dev/null @@ -1,151 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Metastores -type: docs -weight: 700 ---- - -This page explains how to configure and use Polaris metastores with either the recommended Relational JDBC or the -deprecated EclipseLink persistence backends. - -## Relational JDBC -This implementation leverages Quarkus for datasource management and supports configuration through -environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file). 
- - -``` -POLARIS_PERSISTENCE_TYPE=relational-jdbc - -QUARKUS_DATASOURCE_DB_KIND=postgresql -QUARKUS_DATASOURCE_USERNAME= -QUARKUS_DATASOURCE_PASSWORD= -QUARKUS_DATASOURCE_JDBC_URL= -``` - -The Relational JDBC metastore currently relies on a Quarkus-managed datasource and supports only PostgreSQL and H2 databases. This limitation is similar to that of EclipseLink, primarily due to underlying schema differences. At this time, official documentation is provided exclusively for usage with PostgreSQL. -Please refer to the documentation here: -[Configure data sources in Quarkus](https://quarkus.io/guides/datasource) - -Additionally, retries can be configured via the `polaris.persistence.relational.jdbc.*` properties; please refer to the [configuration](./configuration.md) page. - -## EclipseLink (Deprecated) -> [!IMPORTANT] EclipseLink is deprecated; it is recommended to use Relational JDBC for persistence instead. - -Polaris includes the EclipseLink plugin by default, with the PostgreSQL driver. - -Configure the `polaris.persistence` section in your Polaris configuration file -(`application.properties`) as follows: - -``` -polaris.persistence.type=eclipse-link -polaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml -polaris.persistence.eclipselink.persistence-unit=polaris -``` - -Alternatively, configuration can also be done with environment variables or system properties. Refer -to the [Quarkus Configuration Reference] for more information. - -The `configuration-file` option must point to an [EclipseLink configuration file]. This file, named -`persistence.xml`, is used to set up the database connection properties, which can differ depending -on the type of database and its configuration. - -> Note: You have to locate the `persistence.xml` at least two folders down from the root folder, e.g. `/deployments/config/persistence.xml` is OK, whereas `/deployments/persistence.xml` will cause an infinite loop.
-[Quarkus Configuration Reference]: https://quarkus.io/guides/config-reference -[EclipseLink configuration file]: https://eclipse.dev/eclipselink/documentation/4.0/solutions/solutions.html#TESTINGJPA002 - -Polaris creates and connects to a separate database for each realm. Specifically, the `{realm}` placeholder in `jakarta.persistence.jdbc.url` is substituted with the actual realm name, allowing the Polaris server to connect to different databases based on the realm. - -> Note: some database systems such as Postgres don't create databases automatically. Database admins need to create them manually before running Polaris server. - -A single `persistence.xml` can describe multiple [persistence units](https://eclipse.dev/eclipselink/documentation/4.0/concepts/concepts.html#APPDEV001). For example, with both a `polaris-dev` and `polaris` persistence unit defined, you could use a single `persistence.xml` to easily switch between development and production databases. Use the `persistence-unit` option in the Polaris server configuration to easily switch between persistence units. - -### Using H2 - -> [!IMPORTANT] H2 is an in-memory database and is not suitable for production! 
-
-The default [persistence.xml] in Polaris is already configured for H2, but you can easily customize
-your H2 configuration using the persistence unit template below:
-
-[persistence.xml]: https://github.com/apache/polaris/blob/main/extension/persistence/eclipselink/src/main/resources/META-INF/persistence.xml
-
-```xml
-<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
-  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
-  <shared-cache-mode>NONE</shared-cache-mode>
-  <properties>
-    <!-- H2 connection properties (JDBC URL, driver, user, password) go here;
-         see the linked default persistence.xml for a complete example -->
-  </properties>
-</persistence-unit>
-```
-
-To build Polaris with the necessary H2 dependency and start the Polaris service, run the following:
-
-```shell
-./gradlew \
-  :polaris-server:assemble \
-  :polaris-server:quarkusAppPartsBuild --rerun \
-  -PeclipseLinkDeps=com.h2database:h2:2.3.232
-java -Dpolaris.persistence.type=eclipse-link \
-  -Dpolaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml \
-  -Dpolaris.persistence.eclipselink.persistence-unit=polaris \
-  -jar runtime/server/build/quarkus-app/quarkus-run.jar
-```
-
-### Using Postgres
-
-PostgreSQL is included by default in the Polaris server distribution.
-
-The following shows a sample configuration for integrating Polaris with Postgres.
-
-```xml
-<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
-  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
-  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
-  <shared-cache-mode>NONE</shared-cache-mode>
-  <properties>
-    <property name="jakarta.persistence.jdbc.url" value="jdbc:postgresql://localhost:5432/{realm}"/>
-    <!-- user, password, and any additional connection properties go here -->
-  </properties>
-</persistence-unit>
-```
-
diff --git a/site/content/in-dev/1.0.0/polaris-catalog-service.md b/site/content/in-dev/1.0.0/polaris-catalog-service.md
deleted file mode 100644
index 02fed63f46..0000000000
--- a/site/content/in-dev/1.0.0/polaris-catalog-service.md
+++ /dev/null
@@ -1,26 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-# -linkTitle: 'Catalog API Spec' -weight: 900 -params: - show_page_toc: false ---- - -{{< redoc-polaris "generated/bundled-polaris-catalog-service.yaml" >}} diff --git a/site/content/in-dev/1.0.0/polaris-management-service.md b/site/content/in-dev/1.0.0/polaris-management-service.md deleted file mode 100644 index 0b66b9daa4..0000000000 --- a/site/content/in-dev/1.0.0/polaris-management-service.md +++ /dev/null @@ -1,27 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: 'Apache Polaris Management Service OpenAPI' -linkTitle: 'Management OpenAPI' -weight: 800 -params: - show_page_toc: false ---- - -{{< redoc-polaris "polaris-management-service.yml" >}} diff --git a/site/content/in-dev/1.0.0/polaris-spark-client.md b/site/content/in-dev/1.0.0/polaris-spark-client.md deleted file mode 100644 index a34bceeced..0000000000 --- a/site/content/in-dev/1.0.0/polaris-spark-client.md +++ /dev/null @@ -1,141 +0,0 @@ ---- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. 
The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Polaris Spark Client
-type: docs
-weight: 650
----
-
-Apache Polaris now provides Catalog support for Generic Tables (non-Iceberg tables); see the
-[Catalog API Spec]({{% ref "polaris-catalog-service" %}}) for the Generic Table API specs.
-
-Along with the Generic Table Catalog support, Polaris is also releasing a Spark client, which provides
-an end-to-end solution for Apache Spark to manage Delta tables using Polaris.
-
-Note that the Polaris Spark client can handle both Iceberg and Delta tables, not just Delta.
-
-This page documents how to connect Spark to a Polaris service using the Polaris Spark client.
-
-## Quick Start with Local Polaris service
-If you want to quickly try out the functionality with a local Polaris service, simply check out the Polaris repo
-and follow the instructions in the Spark plugin getting-started
-[README](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/getting-started/README.md).
-
-Check out the Polaris repo:
-```shell
-cd ~
-git clone https://github.com/apache/polaris.git
-```
-
-## Start Spark against a deployed Polaris service
-Before starting, ensure that the deployed Polaris service supports Generic Tables, and that Spark 3.5 (version 3.5.3 or later) is installed.
-Spark 3.5.5 is recommended, and you can follow the instructions below to get a Spark 3.5.5 distribution.
-```shell
-cd ~
-wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz
-mkdir spark-3.5
-tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1
-cd spark-3.5
-```
-
-### Connecting with Spark using the Polaris Spark client
-The following CLI command can be used to start Spark with a connection to the deployed Polaris service using
-a released Polaris Spark client.
-
-```shell
-bin/spark-shell \
---packages <polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
---conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
---conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
---conf spark.sql.catalog.<spark-catalog-name>.warehouse=<polaris-catalog-name> \
---conf spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
---conf spark.sql.catalog.<spark-catalog-name>=org.apache.polaris.spark.SparkCatalog \
---conf spark.sql.catalog.<spark-catalog-name>.uri=<polaris-service-uri> \
---conf spark.sql.catalog.<spark-catalog-name>.credential='<client-id>:<client-secret>' \
---conf spark.sql.catalog.<spark-catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
---conf spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled=true
-```
-Assuming the released Polaris Spark client you want to use is `org.apache.polaris:polaris-spark-3.5_2.12:1.0.0`,
-replace the `<polaris-spark-client-package>` field with that release.
-
-The `<spark-catalog-name>` is the catalog name you will use with Spark, and `<polaris-catalog-name>` is the catalog name used
-by the Polaris service; for simplicity, you can use the same name.
-
-Replace `<polaris-service-uri>` with the URI of the deployed Polaris service. For example, with a locally deployed
-Polaris service, the URI would be `http://localhost:8181/api/catalog`.
-
-For the `<client-id>` and `<client-secret>` values, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}})
-for more details.
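The placeholder substitution for the catalog configuration above can also be sketched programmatically. The helper below is illustrative only and not part of Polaris; `build_spark_shell_command` is a hypothetical name, and all example values (the `polaris` catalog name, `root`/`secret` credentials, the local URI) are stand-ins for your own deployment's values.

```python
# Illustrative helper: assemble the spark-shell invocation from plain values.
def build_spark_shell_command(catalog, uri, client_id, client_secret, warehouse, client_package):
    confs = {
        f"spark.sql.catalog.{catalog}": "org.apache.polaris.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.uri": uri,
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
        f"spark.sql.catalog.{catalog}.credential": f"{client_id}:{client_secret}",
        f"spark.sql.catalog.{catalog}.scope": "PRINCIPAL_ROLE:ALL",
        f"spark.sql.catalog.{catalog}.token-refresh-enabled": "true",
        f"spark.sql.catalog.{catalog}.header.X-Iceberg-Access-Delegation": "vended-credentials",
    }
    parts = ["bin/spark-shell", f"--packages {client_package}"]
    parts += [f"--conf {k}={v}" for k, v in sorted(confs.items())]
    # Join with backslash-newline continuations, like the hand-written command.
    return " \\\n".join(parts)

cmd = build_spark_shell_command(
    catalog="polaris",
    uri="http://localhost:8181/api/catalog",
    client_id="root",
    client_secret="secret",
    warehouse="polaris",
    client_package="org.apache.polaris:polaris-spark-3.5_2.12:1.0.0",
)
print(cmd)
```

Printing `cmd` yields a command line equivalent to the one above, with every placeholder filled in consistently.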
-
-You can also start the connection by programmatically initializing a SparkSession; the following is an example with PySpark:
-```python
-from pyspark.sql import SparkSession
-
-spark = SparkSession.builder \
-    .config("spark.jars.packages", "<polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1") \
-    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
-    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension") \
-    .config("spark.sql.catalog.<spark-catalog-name>", "org.apache.polaris.spark.SparkCatalog") \
-    .config("spark.sql.catalog.<spark-catalog-name>.uri", <polaris-service-uri>) \
-    .config("spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled", "true") \
-    .config("spark.sql.catalog.<spark-catalog-name>.credential", "<client-id>:<client-secret>") \
-    .config("spark.sql.catalog.<spark-catalog-name>.warehouse", <polaris-catalog-name>) \
-    .config("spark.sql.catalog.<spark-catalog-name>.scope", "PRINCIPAL_ROLE:ALL") \
-    .config("spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation", "vended-credentials") \
-    .getOrCreate()
-```
-As with the CLI command, make sure the corresponding fields are replaced correctly.
-
-### Create tables with Spark
-After Spark is started, you can use it to create and access Iceberg and Delta tables, for example:
-```python
-spark.sql("USE polaris")
-spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS")
-spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC")
-spark.sql("USE NAMESPACE DELTA_NS.PUBLIC")
-spark.sql("""CREATE TABLE IF NOT EXISTS PEOPLE (
-    id int, name string)
-USING delta LOCATION 'file:///tmp/var/delta_tables/people';
-""")
-```
-
-## Connecting with Spark using a local Polaris Spark client jar
-If you would like to use a version of the Spark client that has not yet been released, you can
-build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin
-[README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions.
-
-## Limitations
-The Polaris Spark client has the following functionality limitations:
-1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `DataFrame`
-   is also not supported, since it relies on CTAS support.
-2) Creating a Delta table without an explicit location is not supported.
-3) Renaming a Delta table is not supported.
-4) `ALTER TABLE ... SET LOCATION` is not supported for Delta tables.
-5) Tables in other non-Iceberg formats, such as CSV, are not supported.
-
-## Iceberg Spark Client compatibility with Polaris Spark Client
-The Polaris Spark client today depends on a specific Iceberg client version, as described
-in the following table:
-
-| Spark Client Version | Iceberg Spark Client Version |
-|----------------------|------------------------------|
-| 1.0.0                | 1.9.0                        |
-
-The Iceberg dependency is automatically downloaded when the Polaris package is downloaded, so there is no need to
-add the Iceberg Spark client to the `packages` configuration.
diff --git a/site/content/in-dev/1.0.0/policy.md b/site/content/in-dev/1.0.0/policy.md
deleted file mode 100644
index 3f49353884..0000000000
--- a/site/content/in-dev/1.0.0/policy.md
+++ /dev/null
@@ -1,197 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.
See the License for the -# specific language governing permissions and limitations -# under the License. -# -title: Policy -type: docs -weight: 425 ---- - -The Polaris Policy framework empowers organizations to centrally define, manage, and enforce fine-grained governance, lifecycle, and operational rules across all data resources in the catalog. - -With the policy API, you can: -- Create and manage policies -- Attach policies to specific resources (catalogs, namespaces, tables, or views) -- Check applicable policies for any given resource - -## What is a Policy? - -A policy in Apache Polaris is a structured entity that defines rules governing actions on specified resources under -predefined conditions. Each policy contains: - -- **Name**: A unique identifier within a namespace -- **Type**: Determines the semantics and expected format of the policy content -- **Description**: Explains the purpose of the policy -- **Content**: Contains the actual rules defining the policy behavior -- **Version**: An automatically tracked revision number -- **Inheritable**: Whether the policy can be inherited by child resources, decided by its type - -### Policy Types - -Polaris supports several predefined system policy types (prefixed with `system.`): - -| Policy Type | Purpose | JSON-Schema | Applies To | -|-------------|-------------------------------------------------------|-------------|------------| -| **`system.data-compaction`** | Defines rules for data file compaction operations | [`data-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/data-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.metadata-compaction`** | Defines rules for metadata file compaction operations | [`metadata-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/metadata-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.orphan-file-removal`** | Defines rules for 
removing orphaned files | [`orphan-file-removal/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/orphan-file-removal/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | -| **`system.snapshot-expiry`** | Defines rules for snapshot expiration | [`snapshot-expiry/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/snapshot-expiry/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | - -Support for additional predefined system policy types and custom policy type definitions is in progress. -For more details, please refer to the [roadmap](https://github.com/apache/polaris/discussions/1028). - -### Policy Inheritance - -The entity hierarchy in Polaris is structured as follows: - -``` - Catalog - | - Namespace - | - +-----------+----------+ - | | | -Iceberg Iceberg Generic - Table View Table -``` - -Policies can be attached at any level, and inheritance flows from catalog down to namespace, then to tables and views. - -Policies can be inheritable or non-inheritable: - -- **Inheritable policies**: Apply to the target resource and all its applicable child resources -- **Non-inheritable policies**: Apply only to the specific target resource - -The inheritance follows an override mechanism: -1. Table-level policies override namespace and catalog policies -2. Namespace-level policies override parent namespace and catalog policies - -> **Important:** Because an override completely replaces the same policy type at higher levels, -> **only one instance of a given policy type can be attached to (and therefore affect) a resource**. 
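The override mechanism above can be modeled as "most specific attachment wins, per policy type." The sketch below is an illustrative model only, not Polaris's internal implementation; `Policy`, `LEVEL_RANK`, and `resolve_applicable` are hypothetical names, and the ranking assumes table-level overrides namespace-level, which overrides catalog-level.

```python
from dataclasses import dataclass

# Higher rank means more specific; the most specific attachment wins.
LEVEL_RANK = {"catalog": 0, "namespace": 1, "table": 2}

@dataclass
class Policy:
    name: str
    type: str   # e.g. "system.data-compaction"
    level: str  # where the policy is attached

def resolve_applicable(attached):
    """For each inheritable policy type, keep only the most specific attachment."""
    winners = {}
    for p in attached:
        cur = winners.get(p.type)
        if cur is None or LEVEL_RANK[p.level] > LEVEL_RANK[cur.level]:
            winners[p.type] = p
    return winners

attached = [
    Policy("catalog-compaction", "system.data-compaction", "catalog"),
    Policy("table-compaction", "system.data-compaction", "table"),
    Policy("ns-snapshot-expiry", "system.snapshot-expiry", "namespace"),
]
resolved = resolve_applicable(attached)
# The table-level compaction policy overrides the catalog-level one,
# so exactly one policy per type remains in effect.
```

This mirrors why only one instance of a given inheritable policy type can affect a resource: any more specific attachment completely replaces the higher-level one.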
- -## Working with Policies - -### Creating a Policy - -To create a policy, you need to provide a name, type, and optionally a description and content: - -```json -POST /polaris/v1/{prefix}/namespaces/{namespace}/policies -{ - "name": "compaction-policy", - "type": "system.data-compaction", - "description": "Policy for optimizing table storage", - "content": "{\"version\": \"2025-02-03\", \"enable\": true, \"config\": {\"target_file_size_bytes\": 134217728}}" -} -``` - -The policy content is validated against a schema specific to its type. Here are a few policy content examples: -- Data Compaction Policy -```json -{ - "version": "2025-02-03", - "enable": true, - "config": { - "target_file_size_bytes": 134217728, - "compaction_strategy": "bin-pack", - "max-concurrent-file-group-rewrites": 5 - } -} -``` -- Orphan File Removal Policy -```json -{ - "version": "2025-02-03", - "enable": true, - "max_orphan_file_age_in_days": 30, - "locations": ["s3://my-bucket/my-table-location"], - "config": { - "prefix_mismatch_mode": "ignore" - } -} -``` - -### Attaching Policies to Resources - -Policies can be attached to different resource levels: - -1. **Catalog level**: Applies to the entire catalog -2. **Namespace level**: Applies to a specific namespace -3. **Table-like level**: Applies to individual tables or views - -Example of attaching a policy to a table: - -```json -PUT /polaris/v1/{prefix}/namespaces/{namespace}/policies/{policy-name}/mappings -{ - "target": { - "type": "table-like", - "path": ["NS1", "NS2", "test_table_1"] - } -} -``` - -For inheritable policies, only one policy of a given type can be attached to a resource. For non-inheritable policies, -multiple policies of the same type can be attached. - -### Retrieving Applicable Policies -A user can view applicable policies on a resource (e.g., table, namespace, or catalog) as long as they have -read permission on that resource. 
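When the resource lives in a multi-part namespace, the namespace parts in the request URL are joined with the 0x1F (unit separator) character and then percent-encoded, following the Iceberg REST catalog convention; `finance` plus `quarterly` becomes `finance%1Fquarterly`. A minimal client-side sketch of that encoding, using only Python's standard library:

```python
from urllib.parse import quote

def encode_namespace(parts):
    # Join multipart namespace levels with the 0x1F unit separator,
    # then percent-encode the result for use as a query parameter.
    return quote("\x1f".join(parts), safe="")

ns = encode_namespace(["finance", "quarterly"])
# ns == "finance%1Fquarterly"
```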
-
-Here is an example of how to find all policies that apply to a specific resource (including inherited policies):
-```
-GET /polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions
-```
-
-**Sample response:**
-```json
-{
-  "policies": [
-    {
-      "name": "snapshot-expiry-policy",
-      "type": "system.snapshot-expiry",
-      "appliedAt": "namespace",
-      "content": {
-        "version": "2025-02-03",
-        "enable": true,
-        "config": {
-          "min_snapshot_to_keep": 1,
-          "max_snapshot_age_days": 2,
-          "max_ref_age_days": 3
-        }
-      }
-    },
-    {
-      "name": "compaction-policy",
-      "type": "system.data-compaction",
-      "appliedAt": "catalog",
-      "content": {
-        "version": "2025-02-03",
-        "enable": true,
-        "config": {
-          "target_file_size_bytes": 134217728
-        }
-      }
-    }
-  ]
-}
-```
-
-### API Reference
-
-For the complete and up-to-date API specification, see [policy-apis.yaml](https://github.com/apache/polaris/blob/main/spec/polaris-catalog-apis/policy-apis.yaml).
\ No newline at end of file
diff --git a/site/content/in-dev/1.0.0/realm.md b/site/content/in-dev/1.0.0/realm.md
deleted file mode 100644
index 9da5e7e25b..0000000000
--- a/site/content/in-dev/1.0.0/realm.md
+++ /dev/null
@@ -1,53 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.
See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-Title: Realm
-type: docs
-weight: 350
----
-
-This page explains what a realm is and what it is used for in Polaris.
-
-### What is it?
-
-A realm in Polaris serves as a logical partitioning mechanism within the catalog system. This isolation allows for multitenancy, enabling different teams, environments, or organizations to operate independently within the same Polaris deployment.
-
-### Key Characteristics
-
-**Isolation:** Each realm encapsulates its own set of resources, ensuring that operations and policies in one realm do not affect others.
-
-**Authentication Context:** When configuring Polaris, principal credentials are associated with a specific realm. This allows for the separation of security concerns across different realms.
-
-**Configuration Scope:** Realm identifiers are used in various configurations, such as connection strings and feature configurations.
-
-An example of this is:
-
-`jdbc:postgresql://localhost:5432/{realm}`
-
-This ensures that each realm's data is stored separately.
-
-### How is it used in the system?
-
-**RealmContext:** This is a key concept used to identify and resolve the context in which operations are performed. For example, in `DefaultRealmContextResolver`, the realm is resolved from request headers, and operations are performed based on the resolved realm identifier.
-
-**Authentication and Authorization:** For example, in `BasePolarisAuthenticator`, `RealmContext` is used to provide context about the current security domain, which is used to retrieve the correct `PolarisMetastoreManager` that manages all Polaris entities and associated grant records metadata for
-authorization.
-
-**Isolation:** In methods like `createEntityManagerFactory(@Nonnull RealmContext realmContext)` from the `PolarisEclipseLinkPersistenceUnit` interface, the realm context influences how resources are created or managed based on the security policies of that realm.
-An example of this is the way a realm name can be used to build a database connection URL so that you have one database instance per realm, when applicable. Alternatively, isolation can be more granular and applied at the primary-key level (within the same database instance).
\ No newline at end of file
diff --git a/site/content/in-dev/1.0.0/telemetry.md b/site/content/in-dev/1.0.0/telemetry.md
deleted file mode 100644
index 8df97f505d..0000000000
--- a/site/content/in-dev/1.0.0/telemetry.md
+++ /dev/null
@@ -1,192 +0,0 @@
----
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-title: Telemetry
-type: docs
-weight: 450
----
-
-## Metrics
-
-Metrics are published using [Micrometer]; they are available from Polaris's management interface
-(port 8282 by default) under the path `/q/metrics`. For example, if the server is running on
-localhost, the metrics can be accessed via http://localhost:8282/q/metrics.
-
-[Micrometer]: https://quarkus.io/guides/telemetry-micrometer
-
-Metrics can be scraped by Prometheus or any compatible metrics scraping server. See
-[Prometheus](https://prometheus.io) for more information.
-
-Additional tags can be added to the metrics by setting the `polaris.metrics.tags.*` property. Each
-tag is a key-value pair, where the key is the tag name and the value is the tag value. For example,
-to add a tag `environment=prod` to all metrics, set `polaris.metrics.tags.environment=prod`. Multiple
-tags can be added, for example:
-
-```properties
-polaris.metrics.tags.service=polaris
-polaris.metrics.tags.environment=prod
-polaris.metrics.tags.region=us-west-2
-```
-
-Note that by default Polaris adds one tag: `application=Polaris`. You can override this tag by
-setting the `polaris.metrics.tags.application=<value>` property.
-
-### Realm ID Tag
-
-Polaris can add the realm ID as a tag to all API and HTTP request metrics. This is disabled by
-default to prevent high cardinality issues, but can be enabled by setting the following properties:
-
-```properties
-polaris.metrics.realm-id-tag.enable-in-api-metrics=true
-polaris.metrics.realm-id-tag.enable-in-http-metrics=true
-```
-
-You should be particularly careful when enabling the realm ID tag in HTTP request metrics, as these
-metrics typically have a much higher cardinality than API request metrics.
-
-In order to prevent the number of tags from growing indefinitely and causing performance issues or
-crashing the server, the number of unique realm IDs in HTTP request metrics is limited to 100 by
-default. If the number of unique realm IDs exceeds this value, a warning will be logged and no more
-HTTP request metrics will be recorded. This threshold can be changed by setting the
-`polaris.metrics.realm-id-tag.http-metrics-max-cardinality` property.
-
-## Traces
-
-Traces are published using [OpenTelemetry].
-
-[OpenTelemetry]: https://quarkus.io/guides/opentelemetry-tracing
-
-By default, OpenTelemetry is disabled in Polaris, because there is no reasonable default
-for the collector endpoint for all cases.
-
-To enable OpenTelemetry and publish traces for Polaris, set `quarkus.otel.sdk.disabled=false`
-and configure a valid collector endpoint URL with `http://` or `https://` as the server property
-`quarkus.otel.exporter.otlp.traces.endpoint`.
-
-_If these properties are not set, the server will not publish traces._
-
-The collector must speak the OpenTelemetry protocol (OTLP), and the port must be its gRPC port
-(by default 4317), e.g. "http://otlp-collector:4317".
-
-By default, Polaris adds a few attributes to the [OpenTelemetry Resource] to identify the server,
-notably:
-
-- `service.name`: set to `Apache Polaris Server (incubating)`;
-- `service.version`: set to the Polaris version.
-
-[OpenTelemetry Resource]: https://opentelemetry.io/docs/languages/js/resources/
-
-You can override the default resource attributes or add additional ones by setting the
-`quarkus.otel.resource.attributes` property.
-
-This property expects a comma-separated list of key-value pairs, where the key is the attribute name
-and the value is the attribute value. For example, to change the service name to `Polaris` and add
-an attribute `deployment.environment=dev`, set the following property:
-
-```properties
-quarkus.otel.resource.attributes=service.name=Polaris,deployment.environment=dev
-```
-
-The alternative syntax below can also be used:
-
-```properties
-quarkus.otel.resource.attributes[0]=service.name=Polaris
-quarkus.otel.resource.attributes[1]=deployment.environment=dev
-```
-
-Finally, two additional span attributes are added to all request parent spans:
-
-- `polaris.request.id`: The unique identifier of the request, if set by the caller through the
-  `Polaris-Request-Id` header.
-- `polaris.realm`: The unique identifier of the realm.
Always set (unless the request failed because - of a realm resolution error). - -### Troubleshooting Traces - -If the server is unable to publish traces, check first for a log warning message like the following: - -``` -SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans. -The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317 -``` - -This means that the server is unable to connect to the collector. Check that the collector is -running and that the URL is correct. - -## Logging - -Polaris relies on [Quarkus](https://quarkus.io/guides/logging) for logging. - -By default, logs are written to the console and to a file located in the `./logs` directory. The log -file is rotated daily and compressed. The maximum size of the log file is 10MB, and the maximum -number of backup files is 14. - -JSON logging can be enabled by setting the `quarkus.log.console.json` and `quarkus.log.file.json` -properties to `true`. By default, JSON logging is disabled. - -The log level can be set for the entire application or for specific packages. The default log level -is `INFO`. To set the log level for the entire application, use the `quarkus.log.level` property. - -To set the log level for a specific package, use the `quarkus.log.category."package-name".level`, -where `package-name` is the name of the package. For example, the package `io.smallrye.config` has a -useful logger to help debugging configuration issues; but it needs to be set to the `DEBUG` level. -This can be done by setting the following property: - -```properties -quarkus.log.category."io.smallrye.config".level=DEBUG -``` - -The log message format for both console and file output is highly configurable. 
The default format -is: - -``` -%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n -``` - -Refer to the [Logging format](https://quarkus.io/guides/logging#logging-format) guide for more -information on placeholders and how to customize the log message format. - -### MDC Logging - -Polaris uses Mapped Diagnostic Context (MDC) to enrich log messages with additional context. The -following MDC keys are available: - -- `requestId`: The unique identifier of the request, if set by the caller through the - `Polaris-Request-Id` header. -- `realmId`: The unique identifier of the realm. Always set. -- `traceId`: The unique identifier of the trace. Present if tracing is enabled and the message is - originating from a traced context. -- `parentId`: The unique identifier of the parent span. Present if tracing is enabled and the - message is originating from a traced context. -- `spanId`: The unique identifier of the span. Present if tracing is enabled and the message is - originating from a traced context. -- `sampled`: Whether the trace has been sampled. Present if tracing is enabled and the message is - originating from a traced context. - -Other MDC keys can be added by setting the `polaris.log.mdc.*` property. Each property is a -key-value pair, where the key is the MDC key name and the value is the MDC key value. For example, -to add the MDC keys `environment=prod` and `region=us-west-2` to all log messages, set the following -properties: - -```properties -polaris.log.mdc.environment=prod -polaris.log.mdc.region=us-west-2 -``` - -MDC context is propagated across threads, including in `TaskExecutor` threads. 
\ No newline at end of file diff --git a/site/hugo.yaml b/site/hugo.yaml index 5a33b0406a..7fb3fbac66 100644 --- a/site/hugo.yaml +++ b/site/hugo.yaml @@ -109,7 +109,7 @@ menu: url: "/in-dev/0.9.0/" parent: "releases" - name: "1.0.0" - url: "/in-dev/1.0.0/" + url: "/releases/1.0.0/" parent: "releases" - name: "In Development" url: "/in-dev/unreleased/" From 3e3a1177815ff73c09dd42d5ca924250dc1d68cf Mon Sep 17 00:00:00 2001 From: Yufei Gu Date: Tue, 8 Jul 2025 16:44:39 -0700 Subject: [PATCH 3/3] Resolve comments --- site/.gitignore | 2 - site/content/releases/1.0.0/_index.md | 186 +++ site/content/releases/1.0.0/access-control.md | 212 +++ site/content/releases/1.0.0/admin-tool.md | 142 ++ .../releases/1.0.0/command-line-interface.md | 1224 +++++++++++++++++ site/content/releases/1.0.0/configuration.md | 187 +++ .../configuring-polaris-for-production.md | 222 +++ site/content/releases/1.0.0/entities.md | 95 ++ site/content/releases/1.0.0/evolution.md | 115 ++ site/content/releases/1.0.0/generic-table.md | 169 +++ .../releases/1.0.0/getting-started/_index.md | 25 + .../deploying-polaris/_index.md | 27 + .../quickstart-deploy-aws.md | 57 + .../quickstart-deploy-azure.md | 52 + .../quickstart-deploy-gcp.md | 52 + .../getting-started/install-dependencies.md | 118 ++ .../1.0.0/getting-started/quickstart.md | 116 ++ .../1.0.0/getting-started/using-polaris.md | 315 +++++ site/content/releases/1.0.0/metastores.md | 151 ++ .../releases/1.0.0/polaris-catalog-service.md | 26 + .../1.0.0/polaris-management-service.md | 27 + .../releases/1.0.0/polaris-spark-client.md | 141 ++ site/content/releases/1.0.0/policy.md | 197 +++ site/content/releases/1.0.0/realm.md | 53 + site/content/releases/1.0.0/telemetry.md | 192 +++ 25 files changed, 4101 insertions(+), 2 deletions(-) create mode 100644 site/content/releases/1.0.0/_index.md create mode 100644 site/content/releases/1.0.0/access-control.md create mode 100644 site/content/releases/1.0.0/admin-tool.md create mode 100644 
site/content/releases/1.0.0/command-line-interface.md create mode 100644 site/content/releases/1.0.0/configuration.md create mode 100644 site/content/releases/1.0.0/configuring-polaris-for-production.md create mode 100644 site/content/releases/1.0.0/entities.md create mode 100644 site/content/releases/1.0.0/evolution.md create mode 100644 site/content/releases/1.0.0/generic-table.md create mode 100644 site/content/releases/1.0.0/getting-started/_index.md create mode 100644 site/content/releases/1.0.0/getting-started/deploying-polaris/_index.md create mode 100644 site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md create mode 100644 site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md create mode 100644 site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md create mode 100644 site/content/releases/1.0.0/getting-started/install-dependencies.md create mode 100644 site/content/releases/1.0.0/getting-started/quickstart.md create mode 100644 site/content/releases/1.0.0/getting-started/using-polaris.md create mode 100644 site/content/releases/1.0.0/metastores.md create mode 100644 site/content/releases/1.0.0/polaris-catalog-service.md create mode 100644 site/content/releases/1.0.0/polaris-management-service.md create mode 100644 site/content/releases/1.0.0/polaris-spark-client.md create mode 100644 site/content/releases/1.0.0/policy.md create mode 100644 site/content/releases/1.0.0/realm.md create mode 100644 site/content/releases/1.0.0/telemetry.md diff --git a/site/.gitignore b/site/.gitignore index daee245831..7cb64361dc 100644 --- a/site/.gitignore +++ b/site/.gitignore @@ -17,8 +17,6 @@ # under the License. 
# -content/releases/ - # Created/generated when running Hugo on the local host .hugo_build.lock public/ diff --git a/site/content/releases/1.0.0/_index.md b/site/content/releases/1.0.0/_index.md new file mode 100644 index 0000000000..b82c9366c2 --- /dev/null +++ b/site/content/releases/1.0.0/_index.md @@ -0,0 +1,186 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +linkTitle: '1.0.0' +title: 'Overview' +type: docs +weight: 200 +params: + top_hidden: true + show_page_toc: false +cascade: + type: docs + params: + show_page_toc: true +# This file will NOT be copied into a new release's versioned docs folder. +--- + +{{< alert title="Warning" color="warning" >}} +These pages refer to the current state of the main branch, which is still under active development. + +Functionalities can be changed, removed or added without prior notice. +{{< /alert >}} + +Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol. + +With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines. 
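Because Polaris exposes the Iceberg REST protocol, a REST-capable engine attaches to it with ordinary Iceberg catalog settings. As an illustrative sketch only (the catalog name `polaris`, the localhost URI, and the credential and warehouse values are placeholders, not fixed values), a Spark configuration might look like:

```properties
# Hypothetical Spark catalog settings for a local Polaris server.
# Replace the URI, credential, and warehouse with values for your deployment.
spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.polaris.type=rest
spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog
spark.sql.catalog.polaris.credential=<client-id>:<client-secret>
spark.sql.catalog.polaris.warehouse=my_catalog
```

See the Quickstart pages for a complete, tested configuration.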
+ +![Conceptual diagram of Apache Polaris (Incubating).](/img/overview.svg "Apache Polaris (Incubating) overview") + +## Key concepts + +This section introduces key concepts associated with using Apache Polaris (Incubating). + +In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables +or namespaces have been created yet for Catalog2 or Catalog3. + +![Diagram that shows an example Apache Polaris (Incubating) structure.](/img/sample-catalog-structure.svg "Sample Apache Polaris (Incubating) structure") + +### Catalog + +In Polaris, you can create one or more catalog resources to organize Iceberg tables. + +Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a +query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks: + +- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's + current metadata file. + +- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of + the table. + +To learn more about Iceberg catalogs, see the [Apache Iceberg™ documentation](https://iceberg.apache.org/concepts/catalog/). + +#### Catalog types + +A catalog can be one of the following two types: + +- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris. + +- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from + this catalog are synced to Polaris. These tables are read-only in Polaris. 
+ +A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS. + +### Namespace + +You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create +nested namespaces. Iceberg tables belong to namespaces. + +> **Important** +> +> For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met: +> +> - The directory only contains the data files that belong to a single table. +> - The directory hierarchy matches the namespace hierarchy for the catalog. +> +> For example, if a catalog includes the following items: +> +> - Top-level namespace namespace1 +> - Nested namespace namespace1a +> - A customers table, which is grouped under nested namespace namespace1a +> - An orders table, which is grouped under nested namespace namespace1a +> +> The directory hierarchy for the catalog must follow this structure: +> +> - /namespace1/namespace1a/customers/ +> - /namespace1/namespace1a/orders/ + +### Storage configuration + +A storage configuration stores a generated identity and access management (IAM) entity for your cloud storage and is created +when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the +catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris +Catalog. + +When you create a catalog, you supply the following information about your cloud storage: + +| Cloud storage provider | Information | +| -----------------------| ----------- | +| Amazon S3 |
• Default base location for your Amazon S3 bucket<br>• Locations for your Amazon S3 bucket<br>• S3 role ARN<br>• External ID (optional) |
+| Google Cloud Storage (GCS) | • Default base location for your GCS bucket<br>• Locations for your GCS bucket |
+| Azure | • Default base location for your Microsoft Azure container<br>• Locations for your Microsoft Azure container<br>• Azure tenant ID
| + +## Example workflow + +In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1. + +1. Bob uses Apache Spark™ to create the Table1 table under the + Namespace1 namespace in the Catalog1 catalog and insert values into + Table1. + + Bob can create Table1 and insert data into it because he is using a + service connection with a service principal that has + the privileges to perform these actions. + +2. Alice uses Snowflake to read data from Table1. + + Alice can read data from Table1 because she is using a service + connection with a service principal with a catalog integration that + has the privileges to perform this action. Alice + creates an unmanaged table in Snowflake to read data from Table1. + +![Diagram that shows an example workflow for Apache Polaris (Incubating)](/img/example-workflow.svg "Example workflow for Apache Polaris (Incubating)") + +## Security and access control + +### Credential vending + +To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query +execution. These credentials allow the query engine to run the query without requiring access to your cloud storage for +Iceberg tables. This process is called credential vending. + +As of now, the following limitation is known regarding Apache Iceberg support: + +- **remove_orphan_files:** Apache Spark can't use credential vending + for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details. + +### Identity and access management (IAM) + +Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg +metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your +storage location. 
+ +### Access control + +Polaris enforces the access control that you configure across all tables registered with the service and governs security for all +queries from query engines in a consistent manner. + +Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs, +namespaces, and tables. + +Polaris RBAC uses two different role types to delegate privileges: + +- **Principal roles:** Granted to Polaris service principals and + analogous to roles in other access control systems that you grant to + service principals. + +- **Catalog roles:** Configured with certain privileges on Polaris + catalog resources and granted to principal roles. + +For more information, see [Access control]({{% ref "access-control" %}}). + +## Legal Notices + +Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. + + + diff --git a/site/content/releases/1.0.0/access-control.md b/site/content/releases/1.0.0/access-control.md new file mode 100644 index 0000000000..f8c21ab781 --- /dev/null +++ b/site/content/releases/1.0.0/access-control.md @@ -0,0 +1,212 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Access Control +type: docs +weight: 500 +--- + +This section provides information about how access control works for Apache Polaris (Incubating). + +Polaris uses a role-based access control (RBAC) model in which the Polaris administrator assigns access privileges to catalog roles +and then grants service principals access to resources by assigning catalog roles to principal roles. + +These are the key concepts for understanding access control in Polaris: + +- **Securable object** +- **Principal role** +- **Catalog role** +- **Privilege** + +## Securable object + +A securable object is an object to which access can be granted. Polaris +has the following securable objects: + +- Catalog +- Namespace +- Iceberg table +- View + +## Principal role + +A principal role is a resource in Polaris that you can use to logically group Polaris service principals together and grant privileges on +securable objects. + +Polaris supports a many-to-one relationship between service principals and principal roles. For example, to grant the same privileges to +multiple service principals, you can grant a single principal role to those service principals. A service principal can be granted one +principal role. When registering a service connection, the Polaris administrator specifies the principal role that is granted to the +service principal. + +You don't grant privileges directly to a principal role. Instead, you configure object permissions at the catalog role level, and then grant +catalog roles to a principal role. + +The following table shows examples of principal roles that you might configure in Polaris: + +| Principal role name | Description | +| -----------------------| ----------- | +| Data_engineer | A role that is granted to multiple service principals for running data engineering jobs.
| +| Data_scientist | A role that is granted to multiple service principals for running data science or AI jobs. | + +## Catalog role + +A catalog role belongs to a particular catalog resource in Polaris and specifies a set of permissions for actions on the catalog or objects +in the catalog, such as catalog namespaces or tables. You can create one or more catalog roles for a catalog. + +You grant privileges to a catalog role and then grant the catalog role to a principal role to bestow the privileges on one or more service +principals. + +> **Note** +> +> If you update the privileges bestowed on a service principal, the updates won't take effect for up to one hour. This means that if you +> revoke or grant some privileges for a catalog, the updated privileges won't take effect on any service principal with access to that catalog +> for up to one hour. + +Polaris also supports a many-to-many relationship between catalog roles and principal roles. You can grant the same catalog role to one or more +principal roles. Likewise, a principal role can be granted one or more catalog roles. + +The following table displays examples of catalog roles that you might +configure in Polaris: + +| Example Catalog role | Description | +| -----------------------|-----------| +| Catalog administrators | A role that has been granted multiple privileges to emulate full access to the catalog.
Principal roles that have been granted this role are permitted to create, alter, read, write, and drop tables in the catalog. | +| Catalog readers | A role that has been granted read-only privileges to tables in the catalog.
Principal roles that have been granted this role are allowed to read from tables in the catalog. | +| Catalog contributor | A role that has been granted read and write access privileges to all tables that belong to the catalog.
Principal roles that have been granted this role are allowed to perform read and write operations on tables in the catalog. | + +## RBAC model + +The following diagram illustrates the RBAC model used by Polaris. For each catalog, the Polaris administrator assigns access +privileges to catalog roles and then grants service principals access to resources by assigning catalog roles to principal roles. Polaris +supports a many-to-one relationship between service principals and principal roles. + +![Diagram that shows the RBAC model for Apache Polaris.](/img/rbac-model.svg "Apache Polaris RBAC model") + +## Access control privileges + +This section describes the privileges that are available in the Polaris access control model. Privileges are granted to catalog roles, catalog +roles are granted to principal roles, and principal roles are granted to service principals to specify the operations that service principals can +perform on objects in Polaris. + +> **Important** +> +> You can only grant privileges at the catalog level. Fine-grained access controls are not available. For example, you can grant read +> privileges to all tables in a catalog but not to an individual table in the catalog. + +To grant the full set of privileges (drop, list, read, write, etc.) on an object, you can use the *full privilege* option. + +### Table privileges + +| Privilege | Description | +| --------- | ----------- | +| TABLE_CREATE | Enables registering a table with the catalog. | +| TABLE_DROP | Enables dropping a table from the catalog. | +| TABLE_LIST | Enables listing any table in the catalog. | +| TABLE_READ_PROPERTIES | Enables reading properties of the table. | +| TABLE_WRITE_PROPERTIES | Enables configuring properties for the table. | +| TABLE_READ_DATA | Enables reading data from the table by receiving short-lived read-only storage credentials from the catalog. 
| +| TABLE_WRITE_DATA | Enables writing data to the table by receiving short-lived read+write storage credentials from the catalog. | +| TABLE_FULL_METADATA | Grants all table privileges, except TABLE_READ_DATA and TABLE_WRITE_DATA, which need to be granted individually. | +| TABLE_ATTACH_POLICY | Enables attaching policy to a table. | +| TABLE_DETACH_POLICY | Enables detaching policy from a table. | + +### View privileges + +| Privilege | Description | +| --------- | ----------- | +| VIEW_CREATE | Enables registering a view with the catalog. | +| VIEW_DROP | Enables dropping a view from the catalog. | +| VIEW_LIST | Enables listing any views in the catalog. | +| VIEW_READ_PROPERTIES | Enables reading all the view properties. | +| VIEW_WRITE_PROPERTIES | Enables configuring view properties. | +| VIEW_FULL_METADATA | Grants all view privileges. | + +### Namespace privileges + +| Privilege | Description | +| --------- | ----------- | +| NAMESPACE_CREATE | Enables creating a namespace in a catalog. | +| NAMESPACE_DROP | Enables dropping the namespace from the catalog. | +| NAMESPACE_LIST | Enables listing any object in the namespace, including nested namespaces and tables. | +| NAMESPACE_READ_PROPERTIES | Enables reading all the namespace properties. | +| NAMESPACE_WRITE_PROPERTIES | Enables configuring namespace properties. | +| NAMESPACE_FULL_METADATA | Grants all namespace privileges. | +| NAMESPACE_ATTACH_POLICY | Enables attaching policy to a namespace. | +| NAMESPACE_DETACH_POLICY | Enables detaching policy from a namespace. | + +### Catalog privileges + +| Privilege | Description | +| -----------------------| ----------- | +| CATALOG_MANAGE_ACCESS | Includes the ability to grant or revoke privileges on objects in a catalog to catalog roles, and the ability to grant or revoke catalog roles to or from principal roles. | +| CATALOG_MANAGE_CONTENT | Enables full management of content for the catalog. This privilege encompasses the following privileges:
• CATALOG_MANAGE_METADATA<br>• TABLE_FULL_METADATA<br>• NAMESPACE_FULL_METADATA<br>• VIEW_FULL_METADATA<br>• TABLE_WRITE_DATA<br>• TABLE_READ_DATA<br>• CATALOG_READ_PROPERTIES<br>• CATALOG_WRITE_PROPERTIES
| +| CATALOG_MANAGE_METADATA | Enables full management of the catalog, catalog roles, namespaces, and tables. | +| CATALOG_READ_PROPERTIES | Enables listing catalogs and reading properties of the catalog. | +| CATALOG_WRITE_PROPERTIES | Enables configuring catalog properties. | +| CATALOG_ATTACH_POLICY | Enables attaching policy to a catalog. | +| CATALOG_DETACH_POLICY | Enables detaching policy from a catalog. | + +### Policy privileges + +| Privilege | Description | +| -----------------------| ----------- | +| POLICY_CREATE | Enables creating a policy under specified namespace. | +| POLICY_READ | Enables reading policy content and metadata. | +| POLICY_WRITE | Enables updating the policy details such as its content or description. | +| POLICY_LIST | Enables listing any policy from the catalog. | +| POLICY_DROP | Enables dropping a policy if it is not attached to any resource entity. | +| POLICY_FULL_METADATA | Grants all policy privileges. | +| POLICY_ATTACH | Enables policy to be attached to entities. | +| POLICY_DETACH | Enables policy to be detached from entities. | + +## RBAC example + +The following diagram illustrates how RBAC works in Polaris and +includes the following users: + +- **Alice:** A service admin who signs up for Polaris. Alice can + create service principals. She can also create catalogs and + namespaces and configure access control for Polaris resources. + +- **Bob:** A data engineer who uses Apache Spark™ to + interact with Polaris. + + - Alice has created a service principal for Bob. It has been + granted the Data_engineer principal role, which in turn has been + granted the following catalog roles: Catalog contributor and + Data administrator (for both the Silver and Gold zone catalogs + in the following diagram). + + - The Catalog contributor role grants permission to create + namespaces and tables in the Bronze zone catalog. 
+ + - The Data administrator roles grant full administrative rights to + the Silver zone catalog and Gold zone catalog. + +- **Mark:** A data scientist who trains models with data managed + by Polaris. + + - Alice has created a service principal for Mark. It has been + granted the Data_scientist principal role, which in turn has + been granted the catalog role named Catalog reader. + + - The Catalog reader role grants read-only access to a catalog + named Gold zone catalog. + +![Diagram that shows an example of how RBAC works in Apache Polaris.](/img/rbac-example.svg "Apache Polaris RBAC example") diff --git a/site/content/releases/1.0.0/admin-tool.md b/site/content/releases/1.0.0/admin-tool.md new file mode 100644 index 0000000000..14f37b6f0f --- /dev/null +++ b/site/content/releases/1.0.0/admin-tool.md @@ -0,0 +1,142 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Admin Tool +type: docs +weight: 300 +--- + +Polaris includes a tool for administrators to manage the metastore. + +The tool must be built with the necessary JDBC drivers to access the metastore database.
For +example, to build the tool with support for Postgres, run the following: + +```shell +./gradlew \ + :polaris-admin:assemble \ + :polaris-admin:quarkusAppPartsBuild --rerun \ + -Dquarkus.container-image.build=true +``` + +The above command will generate: + +- One standalone JAR in `runtime/admin/build/polaris-admin-*-runner.jar` +- Two distribution archives in `runtime/admin/build/distributions` +- Two Docker images named `apache/polaris-admin-tool:latest` and `apache/polaris-admin-tool:` + +## Usage + +Make sure the admin tool and the Polaris server are the same version before using the tool. +To run the standalone JAR, use the following command: + +```shell +java -jar runtime/admin/build/polaris-admin-*-runner.jar --help +``` + +To run the Docker image, use the following command: + +```shell +docker run apache/polaris-admin-tool:latest --help +``` + +The basic usage of the Polaris Admin Tool is outlined below: + +``` +Usage: polaris-admin-runner.jar [-hV] [COMMAND] +Polaris Admin Tool + -h, --help Show this help message and exit. + -V, --version Print version information and exit. +Commands: + help Display help information about the specified command. + bootstrap Bootstraps realms and principal credentials. + purge Purge principal credentials. +``` + +## Configuration + +The Polaris Admin Tool must be executed with the same configuration as the Polaris server. The +configuration can be done via environment variables or system properties. + +At a minimum, it is necessary to configure the Polaris Admin Tool to connect to the same database +used by the Polaris server. + +See the [metastore documentation]({{% ref "metastores" %}}) for more information on configuring the +database connection. + +Note: During bootstrap, Polaris always creates the schema 'polaris_schema' in the configured database.
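For example, assuming the server uses a Postgres metastore, the connection settings might look like the following sketch. The property names follow Quarkus datasource conventions, the URL and credentials are placeholders, and the metastore documentation remains the authoritative reference:

```properties
# Hypothetical connection settings; match them to your Polaris server.
polaris.persistence.type=relational-jdbc
quarkus.datasource.db-kind=postgresql
quarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/polaris
quarkus.datasource.username=polaris
quarkus.datasource.password=polaris
```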
+ +## Bootstrapping Realms and Principal Credentials + +The `bootstrap` command is used to bootstrap realms and create the necessary principal credentials +for the Polaris server. This command is idempotent and can be run multiple times without causing any +issues. If a realm is already bootstrapped, running the `bootstrap` command again will not have any +effect on that realm. + +```shell +java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap --help +``` + +The basic usage of the `bootstrap` command is outlined below: + +``` +Usage: polaris-admin-runner.jar bootstrap [-hV] [-c=]... -r= [-r=]... +Bootstraps realms and root principal credentials. + -c, --credential= + Root principal credentials to bootstrap. Must be of the form + 'realm,clientId,clientSecret'. + -h, --help Show this help message and exit. + -r, --realm= The name of a realm to bootstrap. + -V, --version Print version information and exit. +``` + +For example, to bootstrap the `realm1` realm and create its root principal credential with the +client ID `admin` and client secret `admin`, you can run the following command: + +```shell +java -jar runtime/admin/build/polaris-admin-*-runner.jar bootstrap -r realm1 -c realm1,admin,admin +``` + +## Purging Realms and Principal Credentials + +The `purge` command is used to remove realms and principal credentials from the Polaris server. + +> Warning: Running the `purge` command will remove all data associated with the specified realms! + This includes all entities (catalogs, namespaces, tables, views, roles), all principal + credentials, grants, and any other data associated with the realms. + +```shell +java -jar runtime/admin/build/polaris-admin-*-runner.jar purge --help +``` + +The basic usage of the `purge` command is outlined below: + +``` +Usage: polaris-admin-runner.jar purge [-hV] -r= [-r=]... +Purge realms and all associated entities. + -h, --help Show this help message and exit. + -r, --realm= The name of a realm to purge. 
+ -V, --version Print version information and exit. +``` + +For example, to purge the `realm1` realm, you can run the following command: + +```shell +java -jar runtime/admin/build/polaris-admin-*-runner.jar purge -r realm1 +``` \ No newline at end of file diff --git a/site/content/releases/1.0.0/command-line-interface.md b/site/content/releases/1.0.0/command-line-interface.md new file mode 100644 index 0000000000..f20210e2c6 --- /dev/null +++ b/site/content/releases/1.0.0/command-line-interface.md @@ -0,0 +1,1224 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Command Line Interface +type: docs +weight: 300 +--- + +In order to help administrators quickly set up and manage their Polaris server, Polaris provides a simple command-line interface (CLI) for common tasks. + +The basic syntax of the Polaris CLI is outlined below: + +``` +polaris [options] COMMAND ... + +options: +--host +--port +--base-url +--client-id +--client-secret +--access-token +--profile +``` + +`COMMAND` must be one of the following: +1. catalogs +2. principals +3. principal-roles +4. catalog-roles +5. namespaces +6. privileges +7. 
profiles + +Each _command_ supports several _subcommands_, and some _subcommands_ have _actions_ that come after the subcommand in turn. Finally, _arguments_ follow to form a full invocation. Within a set of named arguments at the end of an invocation ordering is generally not important. Many invocations also have a required positional argument of the type that the _command_ refers to. Again, the ordering of this positional argument relative to named arguments is not important. + +Some example full invocations: + +``` +polaris principals list +polaris catalogs delete some_catalog_name +polaris catalogs update --property foo=bar some_other_catalog +polaris catalogs update another_catalog --property k=v +polaris privileges namespace grant --namespace some.schema --catalog fourth_catalog --catalog-role some_catalog_role TABLE_READ_DATA +polaris profiles list +``` + +### Authentication + +As outlined above, the Polaris CLI may take credentials using the `--client-id` and `--client-secret` options. For example: + +``` +polaris --client-id 4b5ed1ca908c3cc2 --client-secret 07ea8e4edefb9a9e57c247e8d1a4f51c principals ... +``` + +If `--client-id` and `--client-secret` are not provided, the Polaris CLI will try to read the client ID and client secret from environment variables called `CLIENT_ID` and `CLIENT_SECRET` respectively. If these flags are not provided and the environment variables are not set, the CLI will fail. + +Alternatively, the `--access-token` option can be used instead of `--client-id` and `--client-secret`, but both authentication methods cannot be used simultaneously. + +Additionally, the `--profile` option can be used to specify a saved profile instead of providing authentication details directly. If `--profile` is not provided, the CLI will check the `CLIENT_PROFILE` environment variable. Profiles store authentication details and connection settings, simplifying repeated CLI usage. 
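For instance, the environment-variable fallback can be used like this (the credential values are the placeholder pair from the example above, not real secrets):

```shell
# The CLI reads these variables when --client-id/--client-secret are omitted.
export CLIENT_ID=4b5ed1ca908c3cc2
export CLIENT_SECRET=07ea8e4edefb9a9e57c247e8d1a4f51c
# polaris principals list   # would now authenticate with the exported credentials
```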
+ +If the `--host` and `--port` options are not provided, the CLI will default to communicating with `localhost:8181`. + +Alternatively, the `--base-url` option can be used instead of `--host` and `--port`, but both options cannot be used simultaneously. This allows specifying arbitrary Polaris URLs, including HTTPS ones, that have additional base prefixes before the `/api/*/v1` subpaths. + +### PATH + +These examples assume the Polaris CLI is on the PATH and so can be invoked just by the command `polaris`. You can add the CLI to your PATH environment variable with a command like the following: + +``` +export PATH="~/polaris:$PATH" +``` + +Alternatively, you can run the CLI by providing a path to it, such as with the following invocation: + +``` +~/polaris principals list +``` + +## Commands + +Each of the commands `catalogs`, `principals`, `principal-roles`, `catalog-roles`, and `privileges` is used to manage a different type of entity within Polaris. + +In addition to these, the `profiles` command is available for managing stored authentication profiles, allowing login credentials to be configured for reuse. This provides an alternative to passing authentication details with every command. + +To find details on the options that can be provided to a particular command or subcommand ad-hoc, you may wish to use the `--help` flag. For example: + +``` +polaris catalogs --help +polaris principals create --help +polaris profiles --help +``` + +### catalogs + +The `catalogs` command is used to create, discover, and otherwise manage catalogs within Polaris. + +`catalogs` supports the following subcommands: + +1. create +2. delete +3. get +4. list +5. update + +#### create + +The `create` subcommand is used to create a catalog. + +``` +input: polaris catalogs create --help +options: + create + Named arguments: + --type The type of catalog to create in [INTERNAL, EXTERNAL]. INTERNAL by default. 
+ --storage-type (Required) The type of storage to use for the catalog + --default-base-location (Required) Default base location of the catalog + --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once. + --role-arn (Required for S3) A role ARN to use when connecting to S3 + --external-id (Only for S3) The external ID to use when connecting to S3 + --tenant-id (Required for Azure) A tenant ID to use when connecting to Azure Storage + --multi-tenant-app-name (Only for Azure) The app name to use when connecting to Azure Storage + --consent-url (Only for Azure) A consent URL granting permissions for the Azure Storage location + --service-account (Only for GCS) The service account to use when connecting to GCS + --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once + Positional arguments: + catalog +``` + +##### Examples + +``` +polaris catalogs create \ + --storage-type s3 \ + --default-base-location s3://example-bucket/my_data \ + --role-arn ${ROLE_ARN} \ + my_catalog + +polaris catalogs create \ + --storage-type s3 \ + --default-base-location s3://example-bucket/my_other_data \ + --allowed-location s3://example-bucket/second_location \ + --allowed-location s3://other-bucket/third_location \ + --role-arn ${ROLE_ARN} \ + my_other_catalog + +polaris catalogs create \ + --storage-type file \ + --default-base-location file:///example/tmp \ + quickstart_catalog +``` + +#### delete + +The `delete` subcommand is used to delete a catalog. + +``` +input: polaris catalogs delete --help +options: + delete + Positional arguments: + catalog +``` + +##### Examples + +``` +polaris catalogs delete some_catalog +``` + +#### get + +The `get` subcommand is used to retrieve details about a catalog. 
+ +``` +input: polaris catalogs get --help +options: + get + Positional arguments: + catalog +``` + +##### Examples + +``` +polaris catalogs get some_catalog + +polaris catalogs get another_catalog +``` + +#### list + +The `list` subcommand is used to show details about all catalogs, or those that a certain principal role has access to. The principal used to perform this operation must have the `CATALOG_LIST` privilege. + +``` +input: polaris catalogs list --help +options: + list + Named arguments: + --principal-role The name of a principal role +``` + +##### Examples + +``` +polaris catalogs list + +polaris catalogs list --principal-role some_user +``` + +#### update + +The `update` subcommand is used to update a catalog. Currently, this command supports changing the properties of a catalog or updating its storage configuration. + +``` +input: polaris catalogs update --help +options: + update + Named arguments: + --default-base-location (Required) Default base location of the catalog + --allowed-location An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once. + --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once + Positional arguments: + catalog +``` + +##### Examples + +``` +polaris catalogs update --property tag=new_value my_catalog + +polaris catalogs update --default-base-location s3://new-bucket/my_data my_catalog +``` + +### Principals + +The `principals` command is used to manage principals within Polaris. + +`principals` supports the following subcommands: + +1. create +2. delete +3. get +4. list +5. rotate-credentials +6. update +7. access + +#### create + +The `create` subcommand is used to create a new principal. + +``` +input: polaris principals create --help +options: + create + Named arguments: + --type The type of principal to create in [SERVICE] + --property A key/value pair such as: tag=value. 
Multiple can be provided by specifying this option more than once + Positional arguments: + principal +``` + +##### Examples + +``` +polaris principals create some_user + +polaris principals create --client-id ${CLIENT_ID} --property admin=true some_admin_user +``` + +#### delete + +The `delete` subcommand is used to delete a principal. + +``` +input: polaris principals delete --help +options: + delete + Positional arguments: + principal +``` + +##### Examples + +``` +polaris principals delete some_user + +polaris principals delete some_admin_user +``` + +#### get + +The `get` subcommand retrieves details about a principal. + +``` +input: polaris principals get --help +options: + get + Positional arguments: + principal +``` + +##### Examples + +``` +polaris principals get some_user + +polaris principals get some_admin_user +``` + +#### list + +The `list` subcommand shows details about all principals. + +##### Examples + +``` +polaris principals list +``` + +#### rotate-credentials + +The `rotate-credentials` subcommand is used to update the credentials used by a principal. After this command runs successfully, the new credentials will be printed to stdout. + +``` +input: polaris principals rotate-credentials --help +options: + rotate-credentials + Positional arguments: + principal +``` + +##### Examples + +``` +polaris principals rotate-credentials some_user + +polaris principals rotate-credentials some_admin_user +``` + +#### update + +The `update` subcommand is used to update a principal. Currently, this supports rewriting the properties associated with a principal. + +``` +input: polaris principals update --help +options: + update + Named arguments: + --property A key/value pair such as: tag=value. 
Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals update --property key=value --property other_key=other_value some_user
+
+polaris principals update --property are_other_keys_removed=yes some_user
+```
+
+#### access
+
+The `access` subcommand retrieves the entity relations of a principal.
+
+```
+input: polaris principals access --help
+options:
+  access
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals access quickstart_user
+```
+
+### Principal Roles
+
+The `principal-roles` command is used to create, discover, and manage principal roles within Polaris. Additionally, this command can identify principals or catalog roles associated with a principal role, and can be used to grant a principal role to a principal.
+
+`principal-roles` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+6. grant
+7. revoke
+
+#### create
+
+The `create` subcommand is used to create a new principal role.
+
+```
+input: polaris principal-roles create --help
+options:
+  create
+    Named arguments:
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles create data_engineer
+
+polaris principal-roles create --property key=value data_analyst
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a principal role.
+
+```
+input: polaris principal-roles delete --help
+options:
+  delete
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles delete data_engineer
+
+polaris principal-roles delete data_analyst
+```
+
+#### get
+
+The `get` subcommand retrieves details about a principal role.
+
+```
+input: polaris principal-roles get --help
+options:
+  get
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles get data_engineer
+
+polaris principal-roles get data_analyst
+```
+
+#### list
+
+The `list` subcommand is used to print out all principal roles or, alternatively, to list all principal roles associated with a given principal or with a given catalog role.
+
+```
+input: polaris principal-roles list --help
+options:
+  list
+    Named arguments:
+      --catalog-role  The name of a catalog role. If provided, show only principal roles assigned to this catalog role.
+      --principal  The name of a principal. If provided, show only principal roles assigned to this principal.
+```
+
+##### Examples
+
+```
+polaris principal-roles list
+
+polaris principal-roles list --principal d.knuth
+
+polaris principal-roles list --catalog-role super_secret_data
+```
+
+#### update
+
+The `update` subcommand is used to update a principal role. Currently, this supports updating the properties tied to a principal role.
+
+```
+input: polaris principal-roles update --help
+options:
+  update
+    Named arguments:
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles update --property key=value2 data_engineer
+
+polaris principal-roles update data_analyst --property key=value3
+```
+
+#### grant
+
+The `grant` subcommand is used to grant a principal role to a principal.
+ +``` +input: polaris principal-roles grant --help +options: + grant + Named arguments: + --principal A principal to grant this principal role to + Positional arguments: + principal_role +``` + +##### Examples + +``` +polaris principal-roles grant --principal d.knuth data_engineer + +polaris principal-roles grant data_scientist --principal a.ng +``` + +#### revoke + +The `revoke` subcommand is used to revoke a principal role from a principal. + +``` +input: polaris principal-roles revoke --help +options: + revoke + Named arguments: + --principal A principal to revoke this principal role from + Positional arguments: + principal_role +``` + +##### Examples + +``` +polaris principal-roles revoke --principal former.employee data_engineer + +polaris principal-roles revoke data_scientist --principal changed.role +``` + +### Catalog Roles + +The catalog-roles command is used to create, discover, and manage catalog roles within Polaris. Additionally, this command can be used to grant a catalog role to a principal role. + +`catalog-roles` supports the following subcommands: + +1. create +2. delete +3. get +4. list +5. update +6. grant +7. revoke + +#### create + +The `create` subcommand is used to create a new catalog role. + +``` +input: polaris catalog-roles create --help +options: + create + Named arguments: + --catalog The name of an existing catalog + --property A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once + Positional arguments: + catalog_role +``` + +##### Examples + +``` +polaris catalog-roles create --property key=value --catalog some_catalog sales_data + +polaris catalog-roles create --catalog other_catalog sales_data +``` + +#### delete + +The `delete` subcommand is used to delete a catalog role. 
+
+```
+input: polaris catalog-roles delete --help
+options:
+  delete
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles delete --catalog some_catalog sales_data
+
+polaris catalog-roles delete --catalog other_catalog sales_data
+```
+
+#### get
+
+The `get` subcommand retrieves details about a catalog role.
+
+```
+input: polaris catalog-roles get --help
+options:
+  get
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles get --catalog some_catalog inventory_data
+
+polaris catalog-roles get --catalog other_catalog inventory_data
+```
+
+#### list
+
+The `list` subcommand is used to print all catalog roles. Alternatively, if a principal role is provided, only the catalog roles associated with that principal role are shown.
+
+```
+input: polaris catalog-roles list --help
+options:
+  list
+    Named arguments:
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalog-roles list
+
+polaris catalog-roles list --principal-role data_engineer
+```
+
+#### update
+
+The `update` subcommand is used to update a catalog role. Currently, only updating properties associated with the catalog role is supported.
+
+```
+input: polaris catalog-roles update --help
+options:
+  update
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles update --property contains_pii=true --catalog some_catalog sales_data
+
+polaris catalog-roles update sales_data --catalog some_catalog --property key=value
+```
+
+#### grant
+
+The `grant` subcommand is used to grant a catalog role to a principal role.
+
+```
+input: polaris catalog-roles grant --help
+options:
+  grant
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles grant sensitive_data --catalog some_catalog --principal-role power_user
+
+polaris catalog-roles grant --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
+```
+
+#### revoke
+
+The `revoke` subcommand is used to revoke a catalog role from a principal role.
+
+```
+input: polaris catalog-roles revoke --help
+options:
+  revoke
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles revoke sensitive_data --catalog some_catalog --principal-role power_user
+
+polaris catalog-roles revoke --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
+```
+
+### Namespaces
+
+The `namespaces` command is used to manage namespaces within Polaris.
+
+`namespaces` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+
+#### create
+
+The `create` subcommand is used to create a new namespace.
+
+When creating a namespace with an explicit location, that location must reside within the parent catalog or namespace.
+
+```
+input: polaris namespaces create --help
+options:
+  create
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --location  If specified, the location at which to store the namespace and entities inside it
+      --property  A key/value pair such as: tag=value. 
Multiple can be provided by specifying this option more than once + Positional arguments: + namespace +``` + +##### Examples + +``` +polaris namespaces create --catalog my_catalog outer + +polaris namespaces create --catalog my_catalog --location 's3://bucket/outer/inner_SUFFIX' outer.inner +``` + +#### delete + +The `delete` subcommand is used to delete a namespace. + +``` +input: polaris namespaces delete --help +options: + delete + Named arguments: + --catalog The name of an existing catalog + Positional arguments: + namespace +``` + +##### Examples + +``` +polaris namespaces delete outer_namespace.inner_namespace --catalog my_catalog + +polaris namespaces delete --catalog my_catalog outer_namespace +``` + +#### get + +The `get` subcommand retrieves details about a namespace. + +``` +input: polaris namespaces get --help +options: + get + Named arguments: + --catalog The name of an existing catalog + Positional arguments: + namespace +``` + +##### Examples + +``` +polaris namespaces get --catalog some_catalog a.b + +polaris namespaces get a.b.c --catalog some_catalog +``` + +#### list + +The `list` subcommand shows details about all namespaces directly within a catalog or, optionally, within some parent prefix in that catalog. + +``` +input: polaris namespaces list --help +options: + list + Named arguments: + --catalog The name of an existing catalog + --parent If specified, list namespaces inside this parent namespace +``` + +##### Examples + +``` +polaris namespaces list --catalog my_catalog + +polaris namespaces list --catalog my_catalog --parent a + +polaris namespaces list --catalog my_catalog --parent a.b +``` + +### Privileges + +The `privileges` command is used to grant various privileges to a catalog role, or to revoke those privileges. Privileges can be on the level of a catalog, a namespace, a table, or a view. For more information on privileges, please refer to the [docs]({{% ref "entities#privilege" %}}). 
+
+Note that when using the `privileges` command, the user specifies the relevant catalog and catalog role before selecting a subcommand.
+
+`privileges` supports the following subcommands:
+
+1. list
+2. catalog
+3. namespace
+4. table
+5. view
+
+Each of these subcommands, except `list`, supports the `grant` and `revoke` actions and requires an action to be specified.
+
+Note that each subcommand's `revoke` action always accepts the same options that the corresponding `grant` action does, but with the addition of the `cascade` option. `cascade` is used to revoke all other privileges that depend on the specified privilege.
+
+#### list
+
+The `list` subcommand shows details about all privileges for a catalog role.
+
+```
+input: polaris privileges list --help
+options:
+  list
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --catalog-role  The name of a catalog role
+```
+
+##### Examples
+
+```
+polaris privileges list --catalog my_catalog --catalog-role my_role
+
+polaris privileges list --catalog-role my_other_role --catalog my_catalog
+```
+
+#### catalog
+
+The `catalog` subcommand manages privileges at the catalog level. `grant` is used to grant catalog privileges to the specified catalog role, and `revoke` is used to revoke them.
+ +``` +input: polaris privileges catalog --help +options: + catalog + grant + Named arguments: + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege + revoke + Named arguments: + --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege +``` + +##### Examples + +``` +polaris privileges \ + catalog \ + grant \ + --catalog my_catalog \ + --catalog-role catalog_role \ + TABLE_CREATE + +polaris privileges \ + catalog \ + revoke \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --cascade \ + TABLE_CREATE +``` + +#### namespace + +The `namespace` subcommand manages privileges at the namespace level. + +``` +input: polaris privileges namespace --help +options: + namespace + grant + Named arguments: + --namespace A period-delimited namespace + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege + revoke + Named arguments: + --namespace A period-delimited namespace + --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege +``` + +##### Examples + +``` +polaris privileges \ + namespace \ + grant \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --namespace a.b \ + TABLE_LIST + +polaris privileges \ + namespace \ + revoke \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --namespace a.b \ + TABLE_LIST +``` + +#### table + +The `table` subcommand manages privileges at the table level. 
+
+```
+input: polaris privileges table --help
+options:
+  table
+    grant
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --table  The name of a table
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+    revoke
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --table  The name of a table
+        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+```
+
+##### Examples
+
+```
+polaris privileges \
+  table \
+  grant \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b \
+  --table t \
+  TABLE_DROP
+
+polaris privileges \
+  table \
+  revoke \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b \
+  --table t \
+  --cascade \
+  TABLE_DROP
+```
+
+#### view
+
+The `view` subcommand manages privileges at the view level.
+
+```
+input: polaris privileges view --help
+options:
+  view
+    grant
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --view  The name of a view
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+    revoke
+      Named arguments:
+        --namespace  A period-delimited namespace
+        --view  The name of a view
+        --cascade  When revoking privileges, additionally revoke privileges that depend on the specified privilege
+        --catalog  The name of an existing catalog
+        --catalog-role  The name of a catalog role
+      Positional arguments:
+        privilege
+```
+
+##### Examples
+
+```
+polaris privileges \
+  view \
+  grant \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b.c \
+  --view v \
+  VIEW_FULL_METADATA
+
+polaris privileges \
+  view \
+  revoke \
+  --catalog my_catalog \
+  --catalog-role catalog_role \
+  --namespace a.b.c \
+  --view v \
+  --cascade \
+  VIEW_FULL_METADATA
+```
+
+### profiles
+
+The `profiles` command is used to manage stored authentication profiles in Polaris. Profiles allow authentication credentials to be saved and reused, eliminating the need to pass credentials with every command.
+
+`profiles` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+
+#### create
+
+The `create` subcommand is used to create a new authentication profile.
+
+```
+input: polaris profiles create --help
+options:
+  create
+    Positional arguments:
+      profile
+```
+
+##### Examples
+
+```
+polaris profiles create dev
+```
+
+#### delete
+
+The `delete` subcommand removes a stored profile.
+
+```
+input: polaris profiles delete --help
+options:
+  delete
+    Positional arguments:
+      profile
+```
+
+##### Examples
+
+```
+polaris profiles delete dev
+```
+
+#### get
+
+The `get` subcommand retrieves details about a stored profile.
+ +``` +input: polaris profiles get --help +options: + get + Positional arguments: + profile +``` + +##### Examples + +``` +polaris profiles get dev +``` + +#### list + +The `list` subcommand displays all stored profiles. + +``` +input: polaris profiles list --help +options: + list +``` + +##### Examples + +``` +polaris profiles list +``` + +#### update + +The `update` subcommand modifies an existing profile. + +``` +input: polaris profiles update --help +options: + update + Positional arguments: + profile +``` + +##### Examples + +``` +polaris profiles update dev +``` + +## Examples + +This section outlines example code for a few common operations as well as for some more complex ones. + +For especially complex operations, you may wish to instead directly use the Python API. + +### Creating a principal and a catalog + +``` +polaris principals create my_user + +polaris catalogs create \ + --type internal \ + --storage-type s3 \ + --default-base-location s3://iceberg-bucket/polaris-base \ + --role-arn arn:aws:iam::111122223333:role/ExampleCorpRole \ + --allowed-location s3://iceberg-bucket/polaris-alt-location-1 \ + --allowed-location s3://iceberg-bucket/polaris-alt-location-2 \ + my_catalog +``` + +### Granting a principal the ability to manage the content of a catalog + +``` +polaris principal-roles create power_user +polaris principal-roles grant --principal my_user power_user + +polaris catalog-roles create --catalog my_catalog my_catalog_role +polaris catalog-roles grant \ + --catalog my_catalog \ + --principal-role power_user \ + my_catalog_role + +polaris privileges \ + catalog \ + --catalog my_catalog \ + --catalog-role my_catalog_role \ + grant \ + CATALOG_MANAGE_CONTENT +``` + +### Identifying the tables a given principal has been granted explicit access to read + +_Note that some other privileges, such as `CATALOG_MANAGE_CONTENT`, subsume `TABLE_READ_DATA` and would not be discovered here._ + +``` +principal_roles=$(polaris principal-roles list 
--principal my_principal)
+# Assumes ${catalog} has already been set to the name of the catalog to inspect.
+for principal_role in ${principal_roles}; do
+  catalog_roles=$(polaris catalog-roles list --principal-role "${principal_role}")
+  for catalog_role in ${catalog_roles}; do
+    grants=$(polaris privileges list --catalog-role "${catalog_role}" --catalog "${catalog}")
+    for grant in $(echo "${grants}" | jq -c '.[] | select(.privilege == "TABLE_READ_DATA")'); do
+      echo "${grant}"
+    done
+  done
+done
+```
+
+
diff --git a/site/content/releases/1.0.0/configuration.md b/site/content/releases/1.0.0/configuration.md
new file mode 100644
index 0000000000..95d77230f9
--- /dev/null
+++ b/site/content/releases/1.0.0/configuration.md
@@ -0,0 +1,187 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Configuring Polaris
+type: docs
+weight: 550
+---
+
+## Overview
+
+This page provides information on how to configure Apache Polaris (Incubating). Unless stated
+otherwise, this information is valid both for Polaris Docker images (and Kubernetes deployments) as
+well as for Polaris binary distributions.
+
+> Note: for production tips and best practices, refer to [Configuring Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}).
+
+First off, the Polaris server runs on Quarkus and uses its configuration mechanisms. Read the Quarkus
+[configuration guide](https://quarkus.io/guides/config) to get familiar with the basics.
+
+Quarkus aggregates configuration properties from multiple sources, applying them in a specific order
+of precedence. When a property is defined in multiple sources, the value from the source with the
+higher priority overrides those from lower-priority sources.
+
+The sources are listed below, from highest to lowest priority:
+
+1. System properties: properties set via the Java command line using `-Dproperty.name=value`.
+2. Environment variables (see below for important details).
+3. Settings in the `$PWD/config/application.properties` file.
+4. The `application.properties` files packaged in Polaris.
+5. Default values: hardcoded defaults within the application.
+
+When using environment variables, there are two naming conventions:
+
+1. If possible, just use the property name as the environment variable name. This works fine in most
+   cases, e.g. in Kubernetes deployments. For example, `polaris.realm-context.realms` can be
+   included as is in a container YAML definition:
+   ```yaml
+   env:
+   - name: "polaris.realm-context.realms"
+     value: "realm1,realm2"
+   ```
+
+2. If running from a script or shell prompt, however, stricter naming rules apply: variable names
+   can consist solely of uppercase letters, digits, and the `_` (underscore) sign. In such
+   situations, the environment variable name must be derived from the property name, by using
+   uppercase letters, and replacing all dots, dashes and quotes by underscores. For example,
+   `polaris.realm-context.realms` becomes `POLARIS_REALM_CONTEXT_REALMS`. See
+   [here](https://smallrye.io/smallrye-config/Main/config/environment-variables/) for more details.
+
+> [!IMPORTANT]
+> While convenient, uppercase-only environment variables can be problematic for complex property
+> names. 
In these situations, it's preferable to use system properties or a configuration file.
+
+As stated above, a configuration file can also be provided at runtime; it should be available
+(mounted) at `$PWD/config/application.properties` for the Polaris server to recognize it. In Polaris
+official Docker images, this location is `/deployment/config/application.properties`.
+
+For Kubernetes deployments, the configuration file is typically defined as a `ConfigMap`, then
+mounted in the container at `/deployment/config/application.properties`. It can be mounted in
+read-only mode, as Polaris only reads the configuration file once, at startup.
+
+## Polaris Configuration Options Reference
+
+| Configuration Property | Default Value | Description |
+|---|---|---|
+| `polaris.persistence.type` | `relational-jdbc` | Define the persistence backend used by Polaris (`in-memory`, `relational-jdbc`, `eclipse-link` (deprecated)). See [Configuring Apache Polaris for Production]({{% ref "configuring-polaris-for-production.md" %}}). |
+| `polaris.persistence.relational.jdbc.max-retries` | `1` | Total number of retries JDBC persistence will attempt on connection resets or serialization failures before giving up. |
+| `polaris.persistence.relational.jdbc.max_duration_in_ms` | `5000 ms` | Max time interval (ms) since the start of a transaction when retries can be attempted. |
+| `polaris.persistence.relational.jdbc.initial_delay_in_ms` | `100 ms` | Initial delay before retrying. The delay is doubled after each retry. |
+| `polaris.persistence.eclipselink.configurationFile` | | Define the location of the `persistence.xml`. By default, it's the built-in `persistence.xml` in use. 
|
+| `polaris.persistence.eclipselink.persistenceUnit` | `polaris` | Define the name of the persistence unit to use, as defined in the `persistence.xml`. |
+| `polaris.realm-context.type` | `default` | Define the type of the Polaris realm to use. |
+| `polaris.realm-context.realms` | `POLARIS` | Define the list of realms to use. |
+| `polaris.realm-context.header-name` | `Polaris-Realm` | Define the header name defining the realm context. |
+| `polaris.features."ENFORCE_PRINCIPAL_CREDENTIAL_ROTATION_REQUIRED_CHECKING"` | `false` | Flag to enforce credential rotation checking. |
+| `polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"` | `FILE` | Define the storage types supported by catalogs. Supported values are `S3`, `GCS`, `AZURE`, `FILE`. |
+| `polaris.features.realm-overrides."my-realm"."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"` | `true` | Override features on a per-realm basis; here, the skip credential subscoping indirection flag. |
+| `polaris.authentication.authenticator.type` | `default` | Define the Polaris authenticator type. |
+| `polaris.authentication.token-service.type` | `default` | Define the Polaris token service type. |
+| `polaris.authentication.token-broker.type` | `rsa-key-pair` | Define the Polaris token broker type. Also configure the location of the key files. For RSA: if the locations of the key files are not configured, an ephemeral key pair will be created on each Polaris server instance startup, which breaks existing tokens after server restarts and is also incompatible with running multiple Polaris server instances. |
+| `polaris.authentication.token-broker.max-token-generation` | `PT1H` | Define the maximum lifetime of tokens generated by the token broker. |
+| `polaris.authentication.token-broker.rsa-key-pair.private-key-file` | | Define the location of the RSA-256 private key file; if present, the `public-key` file must be specified too. 
| +| `polaris.authentication.token-broker.rsa-key-pair.public-key-file` | | Define the location of the RSA-256 public key file, if present the `private-key` file must be specified, too. | +| `polaris.authentication.token-broker.symmetric-key.secret` | `secret` | Define the secret of the symmetric key. | +| `polaris.authentication.token-broker.symmetric-key.file` | `/tmp/symmetric.key` | Define the location of the symmetric key file. | +| `polaris.storage.aws.access-key` | `accessKey` | Define the AWS S3 access key. If unset, the default credential provider chain will be used. | +| `polaris.storage.aws.secret-key` | `secretKey` | Define the AWS S3 secret key. If unset, the default credential provider chain will be used. | +| `polaris.storage.gcp.token` | `token` | Define the Google Cloud Storage token. If unset, the default credential provider chain will be used. | +| `polaris.storage.gcp.lifespan` | `PT1H` | Define the Google Cloud Storage lifespan type. If unset, the default credential provider chain will be used. | +| `polaris.log.request-id-header-name` | `Polaris-Request-Id` | Define the header name to match request ID in the log. | +| `polaris.log.mdc.aid` | `polaris` | Define the log context (e.g. MDC) AID. | +| `polaris.log.mdc.sid` | `polaris-service` | Define the log context (e.g. MDC) SID. | +| `polaris.rate-limiter.filter.type` | `no-op` | Define the Polaris rate limiter. Supported values are `no-op`, `token-bucket`. | +| `polaris.rate-limiter.token-bucket.type` | `default` | Define the token bucket rate limiter. | +| `polaris.rate-limiter.token-bucket.requests-per-second` | `9999` | Define the number of requests per second for the token bucket rate limiter. | +| `polaris.rate-limiter.token-bucket.window` | `PT10S` | Define the window type for the token bucket rate limiter. | +| `polaris.metrics.tags.=` | `application=Polaris` | Define arbitrary metric tags to include in every request. 
| +| `polaris.metrics.realm-id-tag.api-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in API metrics. | +| `polaris.metrics.realm-id-tag.http-metrics-enabled` | `false` | Whether to enable the `realm_id` metric tag in HTTP request metrics. | +| `polaris.metrics.realm-id-tag.http-metrics-max-cardinality` | `100` | The maximum cardinality for the `realm_id` tag in HTTP request metrics. | +| `polaris.tasks.max-concurrent-tasks` | `100` | Define the max number of concurrent tasks. | +| `polaris.tasks.max-queued-tasks` | `1000` | Define the max number of tasks in queue. | + +There are non Polaris configuration properties that can be useful: + +| Configuration Property | Default Value | Description | +|------------------------------------------------------|---------------------------------|-----------------------------------------------------------------------------| +| `quarkus.log.level` | `INFO` | Define the root log level. | +| `quarkus.log.category."org.apache.polaris".level` | | Define the log level for a specific category. | +| `quarkus.default-locale` | System locale | Force the use of a specific locale, for instance `en_US`. | +| `quarkus.http.port` | `8181` | Define the HTTP port number. | +| `quarkus.http.auth.basic` | `false` | Enable the HTTP basic authentication. | +| `quarkus.http.limits.max-body-size` | `10240K` | Define the HTTP max body size limit. | +| `quarkus.http.cors.origins` | | Define the HTTP CORS origins. | +| `quarkus.http.cors.methods` | `PATCH, POST, DELETE, GET, PUT` | Define the HTTP CORS covered methods. | +| `quarkus.http.cors.headers` | `*` | Define the HTTP CORS covered headers. | +| `quarkus.http.cors.exposed-headers` | `*` | Define the HTTP CORS covered exposed headers. | +| `quarkus.http.cors.access-control-max-age` | `PT10M` | Define the HTTP CORS access control max age. | +| `quarkus.http.cors.access-control-allow-credentials` | `true` | Define the HTTP CORS access control allow credentials flag. 
| +| `quarkus.management.enabled` | `true` | Enable the management server. | +| `quarkus.management.port` | `8182` | Define the port number of the Polaris management server. | +| `quarkus.management.root-path` | | Define the root path under which the `/metrics` and `/health` endpoints are exposed. | +| `quarkus.otel.sdk.disabled` | `true` | Disable the OpenTelemetry layer; set to `false` to enable telemetry. | + +> Note: This section is only relevant for Polaris Docker images and Kubernetes deployments. + +There are many other useful environment variables available in the official Polaris Docker image; they come from the base image used by Polaris, [ubi9/openjdk-21-runtime]. They should be used to fine-tune the Java runtime directly, e.g. to enable debugging or to set the heap size. These variables are not specific to Polaris, but are inherited from the base image. If in doubt, leave everything at its default! + +[ubi9/openjdk-21-runtime]: https://catalog.redhat.com/software/containers/ubi9/openjdk-21-runtime/6501ce769a0d86945c422d5f + +| Environment variable | Description | |----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `JAVA_OPTS` or `JAVA_OPTIONS` | **NOT RECOMMENDED**. JVM options passed to the `java` command (example: "-verbose:class"). Setting this variable will override all options set by any of the other variables in this table. To pass extra settings, use `JAVA_OPTS_APPEND` instead.
| +| `JAVA_OPTS_APPEND` | User specified Java options to be appended to generated options in `JAVA_OPTS` (example: "-Dsome.property=foo"). | +| `JAVA_TOOL_OPTIONS` | This variable is defined and honored by all OpenJDK distros, see [here](https://bugs.openjdk.org/browse/JDK-4971166). Options defined here take precedence over all else; using this variable is generally not necessary, but can be useful e.g. to enforce JVM startup parameters, to set up remote debug, or to define JVM agents. | +| `JAVA_MAX_MEM_RATIO` | Is used to calculate a default maximal heap memory based on a containers restriction. If used in a container without any memory constraints for the container then this option has no effect. If there is a memory constraint then `-XX:MaxRAMPercentage` is set to a ratio of the container available memory as set here. The default is `80` which means 80% of the available memory is used as an upper boundary. You can skip this mechanism by setting this value to `0` in which case no `-XX:MaxRAMPercentage` option is added. | +| `JAVA_DEBUG` | If set remote debugging will be switched on. Disabled by default (example: true"). | +| `JAVA_DEBUG_PORT` | Port used for remote debugging. Defaults to "5005" (tip: use "*:5005" to enable debugging on all network interfaces). | +| `GC_MIN_HEAP_FREE_RATIO` | Minimum percentage of heap free after GC to avoid expansion. Default is 10. | +| `GC_MAX_HEAP_FREE_RATIO` | Maximum percentage of heap free after GC to avoid shrinking. Default is 20. | +| `GC_TIME_RATIO` | Specifies the ratio of the time spent outside the garbage collection. Default is 4. | +| `GC_ADAPTIVE_SIZE_POLICY_WEIGHT` | The weighting given to the current GC time versus previous GC times. Default is 90. | +| `GC_METASPACE_SIZE` | The initial metaspace size. There is no default (example: "20"). | +| `GC_MAX_METASPACE_SIZE` | The maximum metaspace size. There is no default (example: "100"). | +| `GC_CONTAINER_OPTIONS` | Specify Java GC to use. 
The value of this variable should contain the necessary JRE command-line options to specify the required GC, which will override the default of `-XX:+UseParallelGC` (example: `-XX:+UseG1GC`). | +Here are some examples: + +| Example | `docker run` option | +|--------------------------------------------|---------------------------------------------------------------------------------------------------------------------| +| Using another GC | `-e GC_CONTAINER_OPTIONS="-XX:+UseShenandoahGC"` lets Polaris use Shenandoah GC instead of the default parallel GC. | +| Set the Java heap size to a _fixed_ amount | `-e JAVA_OPTS_APPEND="-Xms8g -Xmx8g"` lets Polaris use a Java heap of 8g. | +| Set the maximum heap percentage | `-e JAVA_MAX_MEM_RATIO="70"` lets Polaris use 70% percent of the available memory. | + + +## Troubleshooting Configuration Issues + +If you encounter issues with the configuration, you can ask Polaris to print out the configuration it +is using. To do this, set the log level for the `io.smallrye.config` category to `DEBUG`, and also +set the console appender level to `DEBUG`: + +```properties +quarkus.log.console.level=DEBUG +quarkus.log.category."io.smallrye.config".level=DEBUG +``` + +> [!IMPORTANT] This will print out all configuration values, including sensitive ones like +> passwords. Don't do this in production, and don't share this output with anyone you don't trust! diff --git a/site/content/releases/1.0.0/configuring-polaris-for-production.md b/site/content/releases/1.0.0/configuring-polaris-for-production.md new file mode 100644 index 0000000000..fac51b40f9 --- /dev/null +++ b/site/content/releases/1.0.0/configuring-polaris-for-production.md @@ -0,0 +1,222 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Configuring Polaris for Production +linkTitle: Production Configuration +type: docs +weight: 600 +--- + +The default server configuration is intended for development and testing. When you deploy Polaris in production, +review and apply the following checklist: +- [ ] Configure OAuth2 keys +- [ ] Enforce realm header validation (`require-header=true`) +- [ ] Use a durable metastore (JDBC + PostgreSQL) +- [ ] Bootstrap valid realms in the metastore +- [ ] Disable local FILE storage + +### Configure OAuth2 + +Polaris authentication requires specifying a token broker factory type. Two implementations are +supported out of the box: + +- [rsa-key-pair] uses a pair of public and private keys; +- [symmetric-key] uses a shared secret. + +[rsa-key-pair]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java +[symmetric-key]: https://github.com/apache/polaris/blob/390f1fa57bb1af24a21aa95fdbff49a46e31add7/service/common/src/main/java/org/apache/polaris/service/auth/JWTSymmetricKeyFactory.java + +By default, Polaris uses `rsa-key-pair`, with randomly generated keys. + +> [!IMPORTANT] +> The default `rsa-key-pair` configuration is not suitable when deploying many replicas of Polaris, +> as each replica will have its own set of keys. 
This will cause token validation to fail when a +> request is routed to a different replica than the one that issued the token. + +It is highly recommended to configure Polaris with previously-generated RSA keys. This can be done +by setting the following properties: + +```properties +polaris.authentication.token-broker.type=rsa-key-pair +polaris.authentication.token-broker.rsa-key-pair.public-key-file=/tmp/public.key +polaris.authentication.token-broker.rsa-key-pair.private-key-file=/tmp/private.key +``` + +To generate an RSA key pair, you can use the following commands: + +```shell +openssl genrsa -out private.key 2048 +openssl rsa -in private.key -pubout -out public.key +``` + +Alternatively, you can use a symmetric key by setting the following properties: + +```properties +polaris.authentication.token-broker.type=symmetric-key +polaris.authentication.token-broker.symmetric-key.file=/tmp/symmetric.key +``` + +Note: it is also possible to set the symmetric key secret directly in the configuration file. If +possible, pass the secret as an environment variable to avoid storing sensitive information in the +configuration file: + +```properties +polaris.authentication.token-broker.symmetric-key.secret=${POLARIS_SYMMETRIC_KEY_SECRET} +``` + +Finally, you can also configure the token broker to use a maximum lifespan by setting the following +property: + +```properties +polaris.authentication.token-broker.max-token-generation=PT1H +``` + +Typically, in Kubernetes, you would define the keys as a `Secret` and mount them as files in the +container. + +### Realm Context Resolver + +By default, Polaris resolves realms based on incoming request headers. You can configure the realm +context resolver by setting the following properties in `application.properties`: + +```properties +polaris.realm-context.realms=POLARIS,MY-REALM +polaris.realm-context.header-name=Polaris-Realm +``` + +Where: + +- `realms` is a comma-separated list of allowed realms. 
This setting _must_ be correctly configured. + At least one realm must be specified. +- `header-name` is the name of the header used to resolve the realm; by default, it is + `Polaris-Realm`. + +If a request contains the specified header, Polaris will use the realm specified in the header. If +the realm is not in the list of allowed realms, Polaris will return a `404 Not Found` response. + +If a request _does not_ contain the specified header, however, by default Polaris will use the first +realm in the list as the default realm. In the above example, `POLARIS` is the default realm and +would be used if the `Polaris-Realm` header is not present in the request. + +This is not recommended for production use, as it may lead to security vulnerabilities. To avoid +this, set the following property to `true`: + +```properties +polaris.realm-context.require-header=true +``` + +This will cause Polaris to also return a `404 Not Found` response if the realm header is not present +in the request. + +### Metastore Configuration + +A metastore should be configured with an implementation that durably persists Polaris entities. By +default, Polaris uses an in-memory metastore. + +> [!IMPORTANT] +> The default in-memory metastore is not suitable for production use, as it will lose all data +> when the server is restarted; it is also unusable when multiple Polaris replicas are used. + +To enable a durable metastore, configure your system to use the Relational JDBC-backed metastore. +This implementation leverages Quarkus for datasource management and supports configuration through +environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file). 
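As the paragraph above notes, the same datasource settings can also be passed as JVM `-D` flags instead of environment variables. A minimal sketch, with placeholder credentials, URL, and jar name (adjust all of them to your environment and distribution):

```shell
# Equivalent -D flags for the Quarkus datasource configuration (placeholder values).
java \
  -Dpolaris.persistence.type=relational-jdbc \
  -Dquarkus.datasource.db-kind=postgresql \
  -Dquarkus.datasource.username=polaris \
  -Dquarkus.datasource.password=polaris \
  -Dquarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/polaris_db \
  -jar polaris-server.jar
```

Environment variables and `-D` flags map to the same Quarkus configuration keys, so either style works; pick one and use it consistently.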
+ +Configure the metastore by setting the following ENV variables: + +``` +POLARIS_PERSISTENCE_TYPE=relational-jdbc + +QUARKUS_DATASOURCE_DB_KIND=postgresql +QUARKUS_DATASOURCE_USERNAME= +QUARKUS_DATASOURCE_PASSWORD= +QUARKUS_DATASOURCE_JDBC_URL= +``` + + +The relational JDBC metastore is a Quarkus-managed datasource and only supports Postgres and H2 as of now. +Please refer to the documentation here: +[Configure data sources in Quarkus](https://quarkus.io/guides/datasource) + +> [!IMPORTANT] +> Be sure to secure your metastore backend since it will be storing sensitive data and catalog +> metadata. + +Note: Polaris will always create schema 'polaris_schema' during bootstrap under the configured database. + +### Bootstrapping + +Before using Polaris, you must **bootstrap** the metastore. This is a manual operation that must be +performed **only once** for each realm in order to prepare the metastore to integrate with Polaris. + +By default, when bootstrapping a new realm, Polaris will create randomised `CLIENT_ID` and +`CLIENT_SECRET` for the `root` principal and store their hashes in the metastore backend. + +Depending on your database, this may not be convenient as the generated credentials are not stored +in clear text in the database. 
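To avoid hashed random credentials entirely, each realm can be bootstrapped with credentials you choose using the Polaris admin tool. A sketch only, with a placeholder jar name, realm, and credentials; see the admin tool's own documentation for the authoritative flags:

```shell
# Bootstrap realm "my-realm" with chosen root credentials instead of random ones
# (placeholder values). The admin tool reads the same persistence configuration
# as the server, so point it at the same database.
java -jar polaris-admin-runner.jar bootstrap \
  --realm=my-realm \
  --credential=my-realm,my-client-id,my-client-secret
```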
+ +In order to provide your own credentials for `root` principal (so you can request tokens via +`api/catalog/v1/oauth/tokens`), use the [Polaris Admin Tool]({{% ref "admin-tool" %}}) + +You can verify the setup by attempting a token issue for the `root` principal: + +```bash +curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \ + -d "grant_type=client_credentials" \ + -d "client_id=my-client-id" \ + -d "client_secret=my-client-secret" \ + -d "scope=PRINCIPAL_ROLE:ALL" +``` + +Which should return an access token: + +```json +{ + "access_token": "...", + "token_type": "bearer", + "issued_token_type": "urn:ietf:params:oauth:token-type:access_token", + "expires_in": 3600 +} +``` + +If you used a non-default realm name, add the appropriate request header to the `curl` command, +otherwise Polaris will resolve the realm to the first one in the configuration +`polaris.realm-context.realms`. Here is an example to set realm header: + +```bash +curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \ + -H "Polaris-Realm: my-realm" \ + -d "grant_type=client_credentials" \ + -d "client_id=my-client-id" \ + -d "client_secret=my-client-secret" \ + -d "scope=PRINCIPAL_ROLE:ALL" +``` + +### Disable FILE Storage Type +By default, Polaris allows using the local file system (`FILE`) for catalog storage. This is fine for testing, +but **not recommended for production**. To disable it, set the supported storage types like this: +```hocon +polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES" = [ "S3", "Azure" ] +``` +Leave out `FILE` to prevent its use. Only include the storage types your setup needs. + +### Upgrade Considerations + +The [Polaris Evolution](../evolution) page discusses backward compatibility and +upgrade concerns. 
+ diff --git a/site/content/releases/1.0.0/entities.md b/site/content/releases/1.0.0/entities.md new file mode 100644 index 0000000000..04d625bb94 --- /dev/null +++ b/site/content/releases/1.0.0/entities.md @@ -0,0 +1,95 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Entities +type: docs +weight: 400 +--- + +This page documents various entities that can be managed in Apache Polaris (Incubating). + +## Catalog + +A catalog is a top-level entity in Polaris that may contain other entities like [namespaces](#namespace) and [tables](#table). These map directly to [Apache Iceberg catalogs](https://iceberg.apache.org/concepts/catalog/). + +For information on managing catalogs with the REST API or for more information on what data can be associated with a catalog, see [the API docs]({{% github-polaris "client/python/docs/CreateCatalogRequest.md" %}}). + +### Storage Type + +All catalogs in Polaris are associated with a _storage type_. Valid Storage Types are `S3`, `Azure`, and `GCS`. The `FILE` type is also additionally available for testing. Each of these types relates to a different storage provider where data within the catalog may reside. 
Depending on the storage type, various other configurations may be set for a catalog including credentials to be used when accessing data inside the catalog. + +For details on how to use Storage Types in the REST API, see [the API docs]({{% github-polaris "client/python/docs/StorageConfigInfo.md" %}}). + +For usage examples of storage types, see [docs]({{% ref "command-line-interface" %}}). + +## Namespace + +A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_. + +In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on. + +For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the API docs]({{% github-polaris "client/python/docs/CreateNamespaceRequest.md" %}}). + +## Table + +Polaris tables are entities that map to [Apache Iceberg tables](https://iceberg.apache.org/docs/nightly/configuration/), [Delta tables](https://docs.databricks.com/aws/en/delta/table-properties), or [Hudi tables](https://hudi.apache.org/docs/next/configurations#TABLE_CONFIG). + +For information on managing tables with the REST API or for more information on what data can be associated with a table, see [the API docs]({{% github-polaris "client/python/docs/CreateTableRequest.md" %}}). + +## View + +Polaris views are entities that map to [Apache Iceberg views](https://iceberg.apache.org/view-spec/). + +For information on managing views with the REST API or for more information on what data can be associated with a view, see [the API docs]({{% github-polaris "client/python/docs/CreateViewRequest.md" %}}). + +## Principal + +Polaris principals are unique identities that can be used to represent users or services. 
Each principal may have one or more [principal roles](#principal-role) assigned to it for the purpose of accessing catalogs and the entities within them. + +For information on managing principals with the REST API or for more information on what data can be associated with a principal, see [the API docs]({{% github-polaris "client/python/docs/CreatePrincipalRequest.md" %}}). + +## Principal Role + +Polaris principal roles are labels that may be granted to [principals](#principal). Each principal may have one or more principal roles, and the same principal role may be granted to multiple principals. Principal roles may be assigned based on the persona or responsibilities of a given principal, or on how that principal will need to access different entities within Polaris. + +For information on managing principal roles with the REST API or for more information on what data can be associated with a principal role, see [the API docs]({{% github-polaris "client/python/docs/CreatePrincipalRoleRequest.md" %}}). + +## Catalog Role + +Polaris catalog roles are labels that may be granted to [catalogs](#catalog). Each catalog may have one or more catalog roles, and the same catalog role may be granted to multiple catalogs. Catalog roles may be assigned based on the nature of data that will reside in a catalog, or by the groups of users and services that might need to access that data. + +Each catalog role may have multiple [privileges](#privilege) granted to it, and each catalog role can be granted to one or more [principal roles](#principal-role). This is the mechanism by which principals are granted access to entities inside a catalog such as namespaces and tables. + +## Policy + +A Polaris policy is a set of rules governing actions on specified resources under predefined conditions. Polaris supports policies for Iceberg table compaction, snapshot expiry, row-level access control, and custom policy definitions.
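As an illustration, a policy is a named document whose `content` is interpreted according to its `type`. A data-compaction policy might look like the following sketch; the field names follow the policy API, but treat the [Policy]({{% ref "policy" %}}) page as the authoritative schema:

```json
{
  "name": "compaction_policy",
  "type": "system.data-compaction",
  "description": "Enable compaction for tables under this namespace",
  "content": "{\"enable\": true}"
}
```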
+ +Policies can be applied at the catalog, namespace, or table level. Policy inheritance can be achieved by attaching a policy to a higher-level scope, such as a namespace or catalog. As a result, tables registered under those entities do not need to have the same policy declared individually. If a table or a namespace requires a different policy, a user can attach a policy directly to it, overriding any policy of the same type declared at a higher level. + +## Privilege + +Polaris privileges are granted to [catalog roles](#catalog-role) in order to grant principals with a given principal role some degree of access to catalogs with a given catalog role. When a privilege is granted to a catalog role, any principal roles granted that catalog role receive the privilege. In turn, any principals who are granted that principal role receive it. + +A privilege can be scoped to any entity inside a catalog, including the catalog itself. + +For a list of supported privileges for each privilege class, see the API docs: +* [Table Privileges]({{% github-polaris "client/python/docs/TablePrivilege.md" %}}) +* [View Privileges]({{% github-polaris "client/python/docs/ViewPrivilege.md" %}}) +* [Namespace Privileges]({{% github-polaris "client/python/docs/NamespacePrivilege.md" %}}) +* [Catalog Privileges]({{% github-polaris "client/python/docs/CatalogPrivilege.md" %}}) diff --git a/site/content/releases/1.0.0/evolution.md b/site/content/releases/1.0.0/evolution.md new file mode 100644 index 0000000000..ea29badc84 --- /dev/null +++ b/site/content/releases/1.0.0/evolution.md @@ -0,0 +1,115 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License.
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Polaris Evolution +type: docs +weight: 1000 +--- + +This page discusses what can be expected from Apache Polaris as the project evolves. + +## Using Polaris as a Catalog + +Polaris is primarily intended to be used as a Catalog of Tables and Views. As such, +it implements the Iceberg REST Catalog API and its own REST APIs. + +Revisions of the Iceberg REST Catalog API are controlled by the [Apache Iceberg](https://iceberg.apache.org/) +community. Polaris attempts to accurately implement this specification. Nonetheless, +optional REST Catalog features may or may not be supported immediately. In general, +there is no guarantee that Polaris releases always implement the latest version of +the Iceberg REST Catalog API. + +Any API under Polaris control that is not in an "experimental" or "beta" state +(e.g. the Management API) is maintained as a versioned REST API. New releases of Polaris +may include changes to the current version of the API. When that happens those changes +are intended to be compatible with prior versions of Polaris clients. Certain endpoints +and parameters may be deprecated. + +In case a major change is required to an API that cannot be implemented in a +backward-compatible way, new endpoints (URI paths) may be introduced. New URI "roots" may +be introduced too (e.g. `api/catalog/v2`). + +Note that those "v1", "v2", etc. URI path segments are not meant to be 1:1 with Polaris +releases or Polaris project version numbers (e.g. a "v2" path segment does not mean that +it is added in Polaris 2.0). 
+ +Polaris servers will support deprecated API endpoints / parameters / versions / etc. +for some transition period to allow clients to migrate. + +### Managing Polaris Database + +Polaris stores its data in a database, which is sometimes referred to as "Metastore" or +"Persistence" in other docs. + +Each Polaris release may support multiple Persistence [implementations](../metastores), +for example, "EclipseLink" (deprecated) and "JDBC" (current). + +Each type of Persistence evolves individually. Within each Persistence type, Polaris +attempts to support rolling upgrades (both version X and X + 1 servers running at the +same time). + +However, migrating between different Persistence types is not supported in a rolling +upgrade manner (for example, migrating from "EclipseLink" to "JDBC"). Polaris provides +[tools](https://github.com/apache/polaris-tools/) for migrating between different +catalogs and those tools may be used to migrate between different Persistence types +as well. Service interruption (downtime) should be expected in those cases. + +## Using Polaris as a Build-Time Dependency + +Polaris produces several jars. These jars or custom builds of Polaris code may be used in +downstream projects according to the terms of the license included into Polaris distributions. + +The minimal version of the JRE required by Polaris code (compilation target) may be updated in +any release. Different Polaris jars may have different minimal JRE version requirements. + +Changes in Java class should be expected at any time regardless of the module name or +whether the class / method is `public` or not. + +This approach is not meant to discourage the use of Polaris code in downstream projects, but +to allow more flexibility in evolving the codebase to support new catalog-level features +and improve code efficiency. 
Maintainers of downstream projects are encouraged to join Polaris +mailing lists to monitor project changes, suggest improvements, and engage with the Polaris +community in case of specific compatibility concerns. + +## Semantic Versioning + +Polaris strives to follow [Semantic Versioning](https://semver.org/) conventions both with +respect to REST APIs (beta and experimental APIs excepted), [Polaris Policies](../policy/) +and user-facing [configuration](../configuration/). + +The following are some examples of Polaris approach to SemVer in REST APIs / configuration. +These examples are for illustration purposes and should not be considered to be +exhaustive. + +* Polaris implementing an optional Iceberg REST Catalog feature that was unimplemented +in the previous release is not considered a major change. + +* Supporting a new revision of the Iceberg REST Catalog spec in a backward-compatible way +is not considered a major change. Specifically, supporting new REST API prefixes (e.g. `v2`) +is not a major change because it does not affect older clients. + +* Changing the implementation of an Iceberg REST Catalog feature / endpoint in a non-backward +compatible way (e.g. removing or renaming a request parameter) is a major change. + +* Dropping support for a configuration property with the `polaris.` name prefix is a major change. + +* Dropping support for any previously defined [Policy](../policy/) type or property is a major change. + +* Upgrading Quarkus Runtime to its next major version is a major change (because +Quarkus-managed configuration may change). diff --git a/site/content/releases/1.0.0/generic-table.md b/site/content/releases/1.0.0/generic-table.md new file mode 100644 index 0000000000..2e0e3fe8e6 --- /dev/null +++ b/site/content/releases/1.0.0/generic-table.md @@ -0,0 +1,169 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Generic Table (Beta) +type: docs +weight: 435 +--- + +Generic Table support in Apache Polaris is designed to provide support for non-Iceberg tables across different table formats, such as Delta and CSV. It currently provides the following capabilities: +- Create a generic table under a namespace +- Load a generic table +- Drop a generic table +- List all generic tables under a namespace + +**NOTE** Generic table support is currently in beta. Use it with caution and report any issues you encounter. + +## What is a Generic Table? + +A generic table in Polaris is an entity that defines the following fields: + +- **name** (required): A unique identifier for the table within a namespace +- **format** (required): The format for the generic table, e.g. "delta", "csv" +- **base-location** (optional): Table base location in URI format. For example: s3:///path/to/table + - The table base location is a location that includes all files for the table + - A table with multiple disjoint locations (i.e. containing files that are outside the configured base location) is not compliant with the current generic table support in Polaris. + - If no location is provided, clients or users are responsible for managing the location.
+- **properties** (optional): Properties for the generic table passed on creation. + - Currently, there are no reserved property keys defined. + - Property definition and interpretation are delegated to client or engine implementations. +- **doc** (optional): Comment or description for the table + +## Generic Table API vs. Iceberg Table API + +The Generic Table API provides a separate set of endpoints that operate on generic table entities, while the Iceberg APIs operate on +Iceberg table entities. + +| Operations | **Iceberg Table API** | **Generic Table API** | +|--------------|-----------------------|-----------------------| +| Create Table | Creates an Iceberg table | Creates a generic table | +| Load Table | Loads an Iceberg table. If the table to load is a generic table, a TableNotFoundException is thrown; use the Generic Table load API instead | Loads a generic table. Similarly, trying to load an Iceberg table through the Generic Table API throws a TableNotFoundException | +| Drop Table | Drops an Iceberg table. As with load, if the table to drop is a generic table, a TableNotFoundException is thrown | Drops a generic table. Dropping an Iceberg table through the Generic Table endpoint throws a TableNotFoundException | +| List Table | Lists all Iceberg tables | Lists all generic tables | + +Note that generic tables share the same namespace as Iceberg tables, so a table name must be unique within its namespace. Furthermore, since +there is currently no support for updating a generic table, any update to an existing table requires a drop and re-create.
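Because the catalog offers no update endpoint, a client-side "replace" amounts to a drop followed by a create. The following Python sketch (not part of Polaris; `send` is a placeholder for whatever HTTP client performs the REST call) illustrates the endpoint paths and the drop-and-recreate flow:

```python
# Illustrative sketch only: build Generic Table endpoint paths and express
# "replace" as drop-then-create. `send` stands in for a real HTTP client call.

def generic_table_path(prefix, namespace, table=None):
    """Build the Generic Table endpoint path described above."""
    base = f"/polaris/v1/{prefix}/namespaces/{namespace}/generic-tables"
    return f"{base}/{table}" if table else base

def replace_generic_table(send, prefix, namespace, definition):
    # Drop the old table (a 404 here would just mean it did not exist yet),
    # then create it again with the new definition.
    send("DELETE", generic_table_path(prefix, namespace, definition["name"]))
    send("POST", generic_table_path(prefix, namespace), definition)

print(generic_table_path("delta_catalog", "delta_ns", "delta_table"))
# → /polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
```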
+ +## Working with Generic Table + +There are two ways to work with Polaris Generic Tables today: +1) Directly communicate with Polaris through REST API calls using tools such as `curl`. Details are described in the sections below. +2) If you are working with Spark, use the provided Spark client. Please refer to [Polaris Spark Client]({{% ref "polaris-spark-client" %}}) for detailed instructions. + +### Create a Generic Table + +To create a generic table, you need to provide the corresponding fields as described in [What is a Generic Table](#what-is-a-generic-table). + +The REST API for creating a generic table is `POST /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables`, and the +request body looks like the following: + +```json +{ + "name": "", + "format": "", + "base-location": "", + "doc": "", + "properties": { + "": "" + } +} +``` + +Here is an example that creates a generic table named `delta_table` with format `delta` under the namespace `delta_ns` +in catalog `delta_catalog` using curl: + +```shell +curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \ + -H "Content-Type: application/json" \ + -d '{ + "name": "delta_table", + "format": "delta", + "base-location": "s3:///path/to/table", + "doc": "delta table example", + "properties": { + "key1": "value1" + } + }' +``` + +### Load a Generic Table +The REST endpoint for loading a generic table is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.
+ +Here is an example that loads the table `delta_table` using curl: +```shell +curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table +``` +And the response looks like the following: +```json +{ + "table": { + "name": "delta_table", + "format": "delta", + "base-location": "s3:///path/to/table", + "doc": "delta table example", + "properties": { + "key1": "value1" + } + } +} +``` + +### List Generic Tables +The REST endpoint for listing the generic tables under a given +namespace is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/`. + +The following curl command lists all tables under the namespace `delta_ns`: +```shell +curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/ +``` +Example Response: +```json +{ + "identifiers": [ + { + "namespace": ["delta_ns"], + "name": "delta_table" + } + ], + "next-page-token": null +} +``` + +### Drop a Generic Table +The drop generic table REST endpoint is `DELETE /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`. + +The following curl call drops the table `delta_table`: +```shell +curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table +``` + +### API Reference + +For the complete and up-to-date API specification, see the [Catalog API Spec](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/polaris/refs/heads/main/spec/generated/bundled-polaris-catalog-service.yaml). + +## Limitations + +Current limitations of Generic Table support: +1) Limited spec information. Currently, there is no spec for information such as schemas or partitions. +2) No commit coordination or update capability provided at the catalog service level. + +Therefore, the catalog itself is unaware of anything about the underlying table except some of the loosely defined metadata.
+It is the responsibility of the engine (and the plugins it uses) to determine exactly how loading or committing data +should work based on that metadata. For example, with Delta support, Delta log serialization, deserialization, +and updates all happen on the client side. diff --git a/site/content/releases/1.0.0/getting-started/_index.md b/site/content/releases/1.0.0/getting-started/_index.md new file mode 100644 index 0000000000..d4f13e6f63 --- /dev/null +++ b/site/content/releases/1.0.0/getting-started/_index.md @@ -0,0 +1,25 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: 'Getting Started' +type: docs +weight: 101 +build: + render: never +--- \ No newline at end of file diff --git a/site/content/releases/1.0.0/getting-started/deploying-polaris/_index.md b/site/content/releases/1.0.0/getting-started/deploying-polaris/_index.md new file mode 100644 index 0000000000..32fd5dafd6 --- /dev/null +++ b/site/content/releases/1.0.0/getting-started/deploying-polaris/_index.md @@ -0,0 +1,27 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Deploying Polaris on Cloud Providers +type: docs +weight: 300 +--- + +We will now demonstrate how to deploy Polaris locally, as well as with all supported Cloud Providers: Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP). + +Locally, Polaris can be deployed using either Docker or a local build. On the cloud, this tutorial deploys Polaris using Docker only - but local builds can also be used. \ No newline at end of file diff --git a/site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md b/site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md new file mode 100644 index 0000000000..fd95b72b0c --- /dev/null +++ b/site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-aws.md @@ -0,0 +1,57 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License.
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Deploying Polaris on Amazon Web Services (AWS) +type: docs +weight: 310 +--- + +Build and launch Polaris using the AWS startup script at the location provided in the command below. This script will start an [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) instance, which will be used as the backend Postgres instance holding all Polaris data. +Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. + +The requirements to run the script below are: +* There must be at least two subnets created in the VPC and region in which your EC2 instance resides. The span of subnets MUST include at least 2 availability zones (AZs) within the same region. +* Your EC2 instance must be enabled with [IMDSv1 or IMDSv2 with 2+ hop limit](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-IMDS-new-instances.html#configure-IMDS-new-instances-instance-settings). +* The AWS identity that you will use to run this script must have the following AWS permissions: + * "ec2:DescribeInstances" + * "rds:CreateDBInstance" + * "rds:DescribeDBInstances" + * "rds:CreateDBSubnetGroup" + * "sts:AssumeRole" on the same role as the Instance Profile role of the EC2 instance on which you are running this script. Additionally, you should ensure that the Instance Profile contains a trust policy that allows the role to trust itself to be assumed.
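For the last requirement, a self-trust policy on the instance-profile role might look like the following sketch (the account ID and role name here are placeholders — substitute your own):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/YourInstanceProfileRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```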
+ +```shell +chmod +x getting-started/assets/cloud_providers/deploy-aws.sh +export ASSETS_PATH=$(pwd)/getting-started/assets/ +export CLIENT_ID=root +export CLIENT_SECRET=s3cr3t +./getting-started/assets/cloud_providers/deploy-aws.sh +``` + +## Next Steps +Congrats, you now have a running instance of Polaris! For details on how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris.md" %}}) page. + +## Cleanup Instructions +To shut down the Polaris server, run the following commands: + +```shell +export ASSETS_PATH=$(pwd)/getting-started/assets/ +docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down +``` + +To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page. \ No newline at end of file diff --git a/site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md b/site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md new file mode 100644 index 0000000000..74df725db0 --- /dev/null +++ b/site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-azure.md @@ -0,0 +1,52 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied.
See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Deploying Polaris on Azure +type: docs +weight: 320 +--- + +Build and launch Polaris using the Azure startup script at the location provided in the command below. This script will start an [Azure Database for PostgreSQL - Flexible Server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview) instance, which will be used as the backend Postgres instance holding all Polaris data. +Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. + +The requirements to run the script below are: +* Install the AZ CLI, if it is not already installed on the Azure VM. Instructions to download the AZ CLI can be found [here](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli). +* You must be logged into the AZ CLI. Please run `az account show` to ensure that you are logged in prior to running this script. +* Assign a System-Assigned Managed Identity to the Azure VM. + +```shell +chmod +x getting-started/assets/cloud_providers/deploy-azure.sh +export ASSETS_PATH=$(pwd)/getting-started/assets/ +export CLIENT_ID=root +export CLIENT_SECRET=s3cr3t +./getting-started/assets/cloud_providers/deploy-azure.sh +``` + +## Next Steps +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page. + +## Cleanup Instructions +To shut down the Polaris server, run the following commands: + +```shell +export ASSETS_PATH=$(pwd)/getting-started/assets/ +docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down +``` + +To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page.
\ No newline at end of file diff --git a/site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md b/site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md new file mode 100644 index 0000000000..9641ad7282 --- /dev/null +++ b/site/content/releases/1.0.0/getting-started/deploying-polaris/quickstart-deploy-gcp.md @@ -0,0 +1,52 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Deploying Polaris on Google Cloud Platform (GCP) +type: docs +weight: 330 +--- + +Build and launch Polaris using the GCP startup script at the location provided in the command below. This script will start a [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) instance, which will be used as the backend Postgres instance holding all Polaris data. +Additionally, Polaris will be bootstrapped to use this database and Docker containers will be spun up for Spark SQL and Trino. + +The requirements to run the script below are: +* Install the `gcloud` CLI, if it is not already installed on the GCP VM. Instructions to download the `gcloud` CLI can be found [here](https://cloud.google.com/sdk/docs/install).
+* Ensure the `Cloud SQL Admin API` has been enabled in your project and that your VM's Principal has access to the correct role: `roles/cloudsql.admin`. +* Ensure the VM's Principal has access to at least Read-only scope on Compute Engine: `compute.readonly`. + +```shell +chmod +x getting-started/assets/cloud_providers/deploy-gcp.sh +export ASSETS_PATH=$(pwd)/getting-started/assets/ +export CLIENT_ID=root +export CLIENT_SECRET=s3cr3t +./getting-started/assets/cloud_providers/deploy-gcp.sh +``` + +## Next Steps +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../using-polaris" %}}) page. + +## Cleanup Instructions +To shut down the Polaris server, run the following commands: + +```shell +export ASSETS_PATH=$(pwd)/getting-started/assets/ +docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml down +``` + +To deploy Polaris in a production setting, please review further recommendations at the [Configuring Polaris for Production]({{% relref "../../configuring-polaris-for-production" %}}) page. \ No newline at end of file diff --git a/site/content/releases/1.0.0/getting-started/install-dependencies.md b/site/content/releases/1.0.0/getting-started/install-dependencies.md new file mode 100644 index 0000000000..7341118868 --- /dev/null +++ b/site/content/releases/1.0.0/getting-started/install-dependencies.md @@ -0,0 +1,118 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Installing Dependencies +type: docs +weight: 100 +--- + +This guide serves as an introduction to several key entities that can be managed with Apache Polaris (Incubating), describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark™. + +# Prerequisites + +This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here. + +## Git + +To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/) on macOS: + +```shell +brew install git +``` + +See the [Git Documentation](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) for instructions on installing Git on other platforms. + +Then, use git to clone the Polaris repo: + +```shell +cd ~ +git clone https://github.com/apache/polaris.git +``` + +## Docker + +It is recommended to deploy Polaris inside [Docker](https://www.docker.com/) for the Quickstart workflow. Instructions for deploying the Quickstart workflow on the supported Cloud Providers (AWS, Azure, GCP) will be provided only with Docker. However, non-Docker deployment instructions for local deployments can also be followed on Cloud Providers.
+ +Instructions to install Docker can be found on the [Docker website](https://docs.docker.com/engine/install/). Ensure that Docker and the Docker Compose plugin are both installed. + +### Docker on macOS +Docker can be installed using [homebrew](https://brew.sh/): + +```shell +brew install --cask docker +``` + +There can be [Docker permission issues](https://github.com/apache/polaris/pull/971) related to the seccomp configuration. To resolve these issues, set the `seccomp` profile to "unconfined" when running a container. For example: + +```shell +docker run --security-opt seccomp=unconfined apache/polaris:latest +``` + +Note: Setting the seccomp profile to "unconfined" disables the default system call filtering, which may pose security risks. Use this configuration with caution, especially in production environments. + +### Docker on Amazon Linux +Docker can be installed using a modification to the CentOS instructions. For example: + +```shell +sudo dnf update -y +# Remove old version +sudo dnf remove -y docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-engine +# Install dnf plugin +sudo dnf -y install dnf-plugins-core +# Add CentOS repository +sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo +# Adjust release server version in the path as it will not match with Amazon Linux 2023 +sudo sed -i 's/$releasever/9/g' /etc/yum.repos.d/docker-ce.repo +# Install as usual +sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin +``` + +### Confirm Docker Installation + +Once installed, make sure that both Docker and the Docker Compose plugin are installed: + +```shell +docker version +docker compose version +``` + +Also make sure Docker is running and is able to run a sample Docker container: + +```shell +docker run hello-world +``` + +## Java + +If you plan to build Polaris from source yourself or using this tutorial's
instructions on a Cloud Provider, you will need to satisfy a few prerequisites first. + +Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv: + +```shell +cd ~/polaris +brew install openjdk@21 jenv +jenv add $(brew --prefix openjdk@21) +jenv local 21 +``` + +Ensure that `java --version` and `javac -version` both run successfully. + +## jq + +Most Polaris Quickstart scripts require `jq`. Follow the instructions from the [jq](https://jqlang.org/download/) website to download this tool. \ No newline at end of file diff --git a/site/content/releases/1.0.0/getting-started/quickstart.md b/site/content/releases/1.0.0/getting-started/quickstart.md new file mode 100644 index 0000000000..a9fd43f906 --- /dev/null +++ b/site/content/releases/1.0.0/getting-started/quickstart.md @@ -0,0 +1,116 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Quickstart +type: docs +weight: 200 +--- + +Polaris can be deployed via a Docker image or as a standalone process.
Before starting, be sure that you've satisfied the relevant prerequisites detailed in the previous page. + +## Common Setup +Before running Polaris, ensure you have completed the following setup steps: + +1. **Build Polaris** +```shell +cd ~/polaris +./gradlew \ + :polaris-server:assemble \ + :polaris-server:quarkusAppPartsBuild \ + :polaris-admin:assemble --rerun \ + -Dquarkus.container-image.tag=postgres-latest \ + -Dquarkus.container-image.build=true +``` +- **For standalone**: Omit the `-Dquarkus.container-image.tag` and `-Dquarkus.container-image.build` options if you do not need to build a Docker image. + +## Running Polaris with Docker + +To start using Polaris in Docker, launch Polaris together with the bundled Postgres instance, Apache Spark, and Trino: + +```shell +export ASSETS_PATH=$(pwd)/getting-started/assets/ +export QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://postgres:5432/POLARIS +export QUARKUS_DATASOURCE_USERNAME=postgres +export QUARKUS_DATASOURCE_PASSWORD=postgres +export CLIENT_ID=root +export CLIENT_SECRET=s3cr3t +docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml \ + -f getting-started/jdbc/docker-compose-bootstrap-db.yml \ + -f getting-started/jdbc/docker-compose.yml up -d +``` + +You should see output for some time as Polaris, Spark, and Trino build and start up. Eventually, the output will quiet down to a few logs relating to Spark, resembling the following: + +``` +spark-sql-1 | Spark Web UI available at http://8bc4de8ed854:4040 +spark-sql-1 | Spark master: local[*], Application Id: local-1743745174604 +spark-sql-1 | 25/04/04 05:39:38 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session. +spark-sql-1 | 25/04/04 05:39:39 WARN RESTSessionCatalog: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://polaris:8181/api/catalogv1/oauth/tokens.
This automatic fallback will be removed in a future Iceberg release.It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537 +``` + +The Docker image pre-configures a sample catalog called `quickstart_catalog` that uses a local file system. + +## Running Polaris as a Standalone Process + +You can also start Polaris through Gradle (packaged within the Polaris repository): + +1. **Start the Server** + +Run the following command to start Polaris: + +```shell +./gradlew run +``` + +You should see output for some time as Polaris builds and starts up. Eventually, you won’t see any more logs and should see messages that resemble the following: + +``` +INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) polaris-runtime-service on JVM (powered by Quarkus ) started in 2.656s. Listening on: http://localhost:8181. Management interface listening on http://0.0.0.0:8182. +INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) Profile prod activated. Live Coding activated. +INFO [io.quarkus] [,] [,,,] (Quarkus Main Thread) Installed features: [...] +``` + +At this point, Polaris is running. + +When using a Gradle-launched Polaris instance in this tutorial, we'll launch an instance of Polaris that stores entities only in-memory. This means that any entities that you define will be destroyed when Polaris is shut down. +For more information on how to configure Polaris for production usage, see the [docs]({{% relref "../configuring-polaris-for-production" %}}). + +When Polaris is run using the `./gradlew run` command, the root principal credentials are `root` and `secret` for the `CLIENT_ID` and `CLIENT_SECRET`, respectively. 
+ +### Installing Apache Spark and Trino Locally for Testing + +#### Apache Spark + +If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. As in the [prerequisites]({{% ref "install-dependencies#git" %}}), make sure [git](https://git-scm.com/) is installed first. + +Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html). + +```shell +git clone --branch branch-3.5 https://github.com/apache/spark.git ~/spark +``` + +#### Trino +If you want to connect to Polaris with [Trino](https://trino.io/), it is recommended to set up a test instance of Trino using Docker. As in the [prerequisites]({{% ref "install-dependencies#docker" %}}), make sure [Docker](https://www.docker.com/) is installed first. + +```shell +docker run --name trino -d -p 8080:8080 trinodb/trino +``` + +## Next Steps +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% ref "using-polaris" %}}) page. \ No newline at end of file diff --git a/site/content/releases/1.0.0/getting-started/using-polaris.md b/site/content/releases/1.0.0/getting-started/using-polaris.md new file mode 100644 index 0000000000..35f0bae336 --- /dev/null +++ b/site/content/releases/1.0.0/getting-started/using-polaris.md @@ -0,0 +1,315 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License.
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Using Polaris +type: docs +weight: 400 +--- + +## Setup + +Ensure your `CLIENT_ID` & `CLIENT_SECRET` variables are already defined, as they were required for starting the Polaris server earlier. + +```shell +export CLIENT_ID=YOUR_CLIENT_ID +export CLIENT_SECRET=YOUR_CLIENT_SECRET +``` + +## Defining a Catalog + +In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under. With a Polaris service running, you can create a catalog like so: + +```shell +cd ~/polaris + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalogs \ + create \ + --storage-type s3 \ + --default-base-location ${DEFAULT_BASE_LOCATION} \ + --role-arn ${ROLE_ARN} \ + quickstart_catalog +``` + +This will create a new catalog called **quickstart_catalog**. If you are using one of the Getting Started locally-built Docker images, we have already created a catalog named `quickstart_catalog` for you. + +The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources. 
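The two variables referenced in the `catalogs create` command above must be set beforehand; for example (illustrative values only — substitute your own bucket path and an IAM role with access to it):

```shell
# Illustrative values only; use your own S3 path and an IAM role ARN that
# can read and write that location.
export DEFAULT_BASE_LOCATION=s3://my-polaris-bucket/quickstart
export ROLE_ARN=arn:aws:iam::123456789012:role/polaris-storage-role
```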
+ +If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs]({{% relref "../entities#storage-type" %}}). + +Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% relref "../command-line-interface" %}}). + + +### Creating a Principal and Assigning it Privileges + +With a catalog created, we can create a [principal]({{% relref "../entities#principal" %}}) that has access to manage that catalog. For details on how to configure the Polaris CLI, see [the section above](#defining-a-catalog) or refer to the [docs]({{% relref "../command-line-interface" %}}). + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principals \ + create \ + quickstart_user + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principal-roles \ + create \ + quickstart_user_role + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalog-roles \ + create \ + --catalog quickstart_catalog \ + quickstart_catalog_role +``` + +Be sure to provide the necessary credentials, hostname, and port as before. + +When the `principals create` command completes successfully, it will return the credentials for this new principal. Export them for future use. For example: + +```shell +./polaris ... principals create example +{"clientId": "XXXX", "clientSecret": "YYYY"} +export USER_CLIENT_ID=XXXX +export USER_CLIENT_SECRET=YYYY +``` + +Now, we grant the principal the [principal role]({{% relref "../entities#principal-role" %}}) we created, and grant the [catalog role]({{% relref "../entities#catalog-role" %}}) the principal role we created. 
For more information on these entities, please refer to the linked documentation. + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principal-roles \ + grant \ + --principal quickstart_user \ + quickstart_user_role + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalog-roles \ + grant \ + --catalog quickstart_catalog \ + --principal-role quickstart_user_role \ + quickstart_catalog_role +``` + +Now, we’ve linked our principal to the catalog via roles like so: + +![Principal to Catalog](/img/quickstart/privilege-illustration-1.png "Principal to Catalog") + +In order to give this principal the ability to interact with the catalog, we must assign some [privileges]({{% relref "../entities#privilege" %}}). For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so: + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + privileges \ + catalog \ + grant \ + --catalog quickstart_catalog \ + --catalog-role quickstart_catalog_role \ + CATALOG_MANAGE_CONTENT +``` + +This grants the [catalog privileges]({{% relref "../entities#privilege" %}}) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so: + +![Principal to Catalog with Catalog Role](/img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role") + +`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace. + +## Using Iceberg & Polaris + +At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). 
Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the respective examples below.

### Connecting with Spark

#### Using a Local Build of Spark

To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API.

This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). From a local Spark clone on the `branch-3.5` branch we can run the following:

_Note: the credentials provided here are those for our principal, not the root credentials._

```shell
bin/spark-sql \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
--conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
--conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.quickstart_catalog.credential="${USER_CLIENT_ID}:${USER_CLIENT_SECRET}" \
--conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true \
--conf spark.sql.catalog.quickstart_catalog.client.region=us-west-2
```

Similar to the CLI commands above, this configures Spark to use the Polaris running at `localhost:8181`.
If your Polaris server is running elsewhere, be sure to update the configuration appropriately.

Finally, note that we include the `iceberg-aws-bundle` package here. If your table is using a different filesystem, be sure to include the appropriate dependency.

#### Using Spark SQL from a Docker container

Refresh the Docker container with the user's credentials:
```shell
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop spark-sql
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f spark-sql
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps spark-sql
```

Attach to the running spark-sql container:

```shell
docker attach $(docker ps -q --filter name=spark-sql)
```

#### Sample Commands

Once the Spark session starts, we can create a namespace and table within the catalog:

```sql
USE quickstart_catalog;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema;
USE NAMESPACE quickstart_namespace.schema;
CREATE TABLE IF NOT EXISTS quickstart_table (id BIGINT, data STRING) USING ICEBERG;
```

We can now use this table like any other:

```
INSERT INTO quickstart_table VALUES (1, 'some data');
SELECT * FROM quickstart_table;
. . .
+---+---------+
|id |data     |
+---+---------+
|1  |some data|
+---+---------+
```

If at any time access is revoked...
```shell
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  privileges \
  catalog \
  revoke \
  --catalog quickstart_catalog \
  --catalog-role quickstart_catalog_role \
  CATALOG_MANAGE_CONTENT
```

Spark will lose access to the table:

```
INSERT INTO quickstart_table VALUES (1, 'some data');

org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
```

### Connecting with Trino

Refresh the Docker container with the user's credentials:

```shell
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop trino
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f trino
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps trino
```

Attach to the running Trino container:

```shell
docker exec -it $(docker ps -q --filter name=trino) trino
```

You may not see Trino's prompt immediately; press ENTER to show it. A few commands that you can try:

```sql
SHOW CATALOGS;
SHOW SCHEMAS FROM iceberg;
CREATE SCHEMA iceberg.quickstart_schema;
CREATE TABLE iceberg.quickstart_schema.quickstart_table AS SELECT 1 x;
SELECT * FROM iceberg.quickstart_schema.quickstart_table;
```

If at any time access is revoked...
+ +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + privileges \ + catalog \ + revoke \ + --catalog quickstart_catalog \ + --catalog-role quickstart_catalog_role \ + CATALOG_MANAGE_CONTENT +``` + +Trino will lose access to the table: + +```sql +SELECT * FROM iceberg.quickstart_schema.quickstart_table; + +org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated grants via '[quickstart_catalog_role, quickstart_user_role]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION +``` + +### Connecting Using REST APIs + +To access Polaris from the host machine, first request an access token: + +```shell +export POLARIS_TOKEN=$(curl -s http://polaris:8181/api/catalog/v1/oauth/tokens \ + --resolve polaris:8181:127.0.0.1 \ + --user ${CLIENT_ID}:${CLIENT_SECRET} \ + -d 'grant_type=client_credentials' \ + -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token) +``` + +Then, use the access token in the Authorization header when accessing Polaris: + +```shell +curl -v http://127.0.0.1:8181/api/management/v1/principal-roles -H "Authorization: Bearer $POLARIS_TOKEN" +curl -v http://127.0.0.1:8181/api/management/v1/catalogs/quickstart_catalog -H "Authorization: Bearer $POLARIS_TOKEN" +``` + +## Next Steps +* Visit [Configuring Polaris for Production]({{% relref "../configuring-polaris-for-production" %}}). +* A Getting Started experience for using Spark with Jupyter Notebooks is documented [here](https://github.com/apache/polaris/blob/main/getting-started/spark/README.md). +* To shut down a locally-deployed Polaris server and clean up all related Docker containers, run the command listed below. Cloud Deployments have their respective termination commands on their Deployment page, while Polaris running on Gradle will terminate when the Gradle process terminates. 
+```shell +docker compose -p polaris -f getting-started/assets/postgres/docker-compose-postgres.yml -f getting-started/jdbc/docker-compose-bootstrap-db.yml -f getting-started/jdbc/docker-compose.yml down +``` + + diff --git a/site/content/releases/1.0.0/metastores.md b/site/content/releases/1.0.0/metastores.md new file mode 100644 index 0000000000..4810b124a0 --- /dev/null +++ b/site/content/releases/1.0.0/metastores.md @@ -0,0 +1,151 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Metastores +type: docs +weight: 700 +--- + +This page explains how to configure and use Polaris metastores with either the recommended Relational JDBC or the +deprecated EclipseLink persistence backends. + +## Relational JDBC +This implementation leverages Quarkus for datasource management and supports configuration through +environment variables or JVM -D flags at startup. For more information, refer to the [Quarkus configuration reference](https://quarkus.io/guides/config-reference#env-file). 
```
POLARIS_PERSISTENCE_TYPE=relational-jdbc

QUARKUS_DATASOURCE_DB_KIND=postgresql
QUARKUS_DATASOURCE_USERNAME=<your-username>
QUARKUS_DATASOURCE_PASSWORD=<your-password>
QUARKUS_DATASOURCE_JDBC_URL=<jdbc-connection-url>
```

The Relational JDBC metastore currently relies on a Quarkus-managed datasource and supports only PostgreSQL and H2 databases. This limitation is similar to that of EclipseLink, primarily due to underlying schema differences. At this time, official documentation is provided exclusively for usage with PostgreSQL.
Please refer to the documentation here:
[Configure data sources in Quarkus](https://quarkus.io/guides/datasource)

Additionally, retries can be configured via the `polaris.persistence.relational.jdbc.*` properties; please refer to the [configuration](./configuration.md) reference.

## EclipseLink (Deprecated)
> [!IMPORTANT] EclipseLink is deprecated; it is recommended to use Relational JDBC persistence instead.

Polaris includes the EclipseLink plugin by default, along with the PostgreSQL driver.

Configure the `polaris.persistence` section in your Polaris configuration file
(`application.properties`) as follows:

```
polaris.persistence.type=eclipse-link
polaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml
polaris.persistence.eclipselink.persistence-unit=polaris
```

Alternatively, configuration can also be done with environment variables or system properties. Refer
to the [Quarkus Configuration Reference] for more information.

The `configuration-file` option must point to an [EclipseLink configuration file]. This file, named
`persistence.xml`, is used to set up the database connection properties, which can differ depending
on the type of database and its configuration.

> Note: You must place `persistence.xml` at least two directories below the root directory, e.g. `/deployments/config/persistence.xml` is OK, whereas `/deployments/persistence.xml` will cause an infinite loop.
[Quarkus Configuration Reference]: https://quarkus.io/guides/config-reference
[EclipseLink configuration file]: https://eclipse.dev/eclipselink/documentation/4.0/solutions/solutions.html#TESTINGJPA002

Polaris creates and connects to a separate database for each realm. Specifically, the `{realm}` placeholder in `jakarta.persistence.jdbc.url` is substituted with the actual realm name, allowing the Polaris server to connect to different databases based on the realm.

> Note: some database systems such as Postgres don't create databases automatically. Database admins need to create them manually before running the Polaris server.

A single `persistence.xml` can describe multiple [persistence units](https://eclipse.dev/eclipselink/documentation/4.0/concepts/concepts.html#APPDEV001). For example, with both a `polaris-dev` and `polaris` persistence unit defined, you could use a single `persistence.xml` to easily switch between development and production databases. Use the `persistence-unit` option in the Polaris server configuration to select which persistence unit to use.

### Using H2

> [!IMPORTANT] H2 is an in-memory database and is not suitable for production!
The default [persistence.xml] in Polaris is already configured for H2, but you can easily customize
your H2 configuration using the persistence unit template below:

[persistence.xml]: https://github.com/apache/polaris/blob/main/extension/persistence/eclipselink/src/main/resources/META-INF/persistence.xml

```xml
<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
  <shared-cache-mode>NONE</shared-cache-mode>
  <properties>
    <property name="jakarta.persistence.jdbc.url"
              value="jdbc:h2:file:tmp/polaris_test/filedb_{realm}"/>
    <property name="jakarta.persistence.jdbc.user" value="sa"/>
    <property name="jakarta.persistence.jdbc.password" value=""/>
    <property name="jakarta.persistence.schema-generation.database.action"
              value="create"/>
    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
  </properties>
</persistence-unit>
```

To build Polaris with the necessary H2 dependency and start the Polaris service, run the following:

```shell
./gradlew \
  :polaris-server:assemble \
  :polaris-server:quarkusAppPartsBuild --rerun \
  -PeclipseLinkDeps=com.h2database:h2:2.3.232
java -Dpolaris.persistence.type=eclipse-link \
  -Dpolaris.persistence.eclipselink.configuration-file=/path/to/persistence.xml \
  -Dpolaris.persistence.eclipselink.persistence-unit=polaris \
  -jar runtime/server/build/quarkus-app/quarkus-run.jar
```

### Using Postgres

PostgreSQL is included by default in the Polaris server distribution.

The following shows a sample configuration for integrating Polaris with Postgres.
```xml
<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntity</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityActive</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityChangeTracking</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelEntityDropped</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelGrantRecord</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelPrincipalSecrets</class>
  <class>org.apache.polaris.extension.persistence.impl.eclipselink.models.ModelSequenceId</class>
  <shared-cache-mode>NONE</shared-cache-mode>
  <properties>
    <property name="jakarta.persistence.jdbc.url"
              value="jdbc:postgresql://localhost:5432/{realm}"/>
    <property name="jakarta.persistence.jdbc.user" value="postgres"/>
    <property name="jakarta.persistence.jdbc.password" value="postgres"/>
    <property name="jakarta.persistence.schema-generation.database.action"
              value="create"/>
    <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
  </properties>
</persistence-unit>
```

diff --git a/site/content/releases/1.0.0/polaris-catalog-service.md b/site/content/releases/1.0.0/polaris-catalog-service.md
new file mode 100644
index 0000000000..02fed63f46
--- /dev/null
+++ b/site/content/releases/1.0.0/polaris-catalog-service.md
@@ -0,0 +1,26 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
+# +linkTitle: 'Catalog API Spec' +weight: 900 +params: + show_page_toc: false +--- + +{{< redoc-polaris "generated/bundled-polaris-catalog-service.yaml" >}} diff --git a/site/content/releases/1.0.0/polaris-management-service.md b/site/content/releases/1.0.0/polaris-management-service.md new file mode 100644 index 0000000000..0b66b9daa4 --- /dev/null +++ b/site/content/releases/1.0.0/polaris-management-service.md @@ -0,0 +1,27 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: 'Apache Polaris Management Service OpenAPI' +linkTitle: 'Management OpenAPI' +weight: 800 +params: + show_page_toc: false +--- + +{{< redoc-polaris "polaris-management-service.yml" >}} diff --git a/site/content/releases/1.0.0/polaris-spark-client.md b/site/content/releases/1.0.0/polaris-spark-client.md new file mode 100644 index 0000000000..a34bceeced --- /dev/null +++ b/site/content/releases/1.0.0/polaris-spark-client.md @@ -0,0 +1,141 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Title: Polaris Spark Client
type: docs
weight: 650
---

Apache Polaris now provides catalog support for Generic Tables (non-Iceberg tables); please check out
the [Catalog API Spec]({{% ref "polaris-catalog-service" %}}) for the Generic Table API specs.

Along with the Generic Table catalog support, Polaris is also releasing a Spark client, which provides
an end-to-end solution for Apache Spark to manage Delta tables using Polaris.

Note that the Polaris Spark client can handle both Iceberg and Delta tables, not just Delta.

This page documents how to connect Spark to a Polaris service using the Polaris Spark client.

## Quick Start with Local Polaris service
If you want to quickly try out the functionality with a local Polaris service, simply check out the Polaris repo
and follow the instructions in the Spark plugin getting-started
[README](https://github.com/apache/polaris/blob/main/plugins/spark/v3.5/getting-started/README.md).

Check out the Polaris repo:
```shell
cd ~
git clone https://github.com/apache/polaris.git
```

## Start Spark against a deployed Polaris service
Before starting, ensure that the deployed Polaris service supports Generic Tables, and that Spark 3.5 (version 3.5.3 or later) is installed.
Spark 3.5.5 is recommended, and you can follow the instructions below to get a Spark 3.5.5 distribution.
```shell
cd ~
wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz
mkdir spark-3.5
tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1
cd spark-3.5
```

### Connecting with Spark using the Polaris Spark client
The following CLI command starts Spark with a connection to the deployed Polaris service using
a released Polaris Spark client.

```shell
bin/spark-shell \
--packages <polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.<spark-catalog-name>.warehouse=<polaris-catalog-name> \
--conf spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.<spark-catalog-name>=org.apache.polaris.spark.SparkCatalog \
--conf spark.sql.catalog.<spark-catalog-name>.uri=<polaris-service-uri> \
--conf spark.sql.catalog.<spark-catalog-name>.credential='<client-id>:<client-secret>' \
--conf spark.sql.catalog.<spark-catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled=true
```
Assuming the released Polaris Spark client you want to use is `org.apache.polaris:polaris-spark-3.5_2.12:1.0.0`,
replace the `<polaris-spark-client-package>` field with that release.

`<spark-catalog-name>` is the catalog name you will use with Spark, and `<polaris-catalog-name>` is the catalog name used
by the Polaris service; for simplicity, you can use the same name.

Replace `<polaris-service-uri>` with the URI of the deployed Polaris service. For example, with a locally deployed
Polaris service, the URI would be `http://localhost:8181/api/catalog`.

For the `<client-id>` and `<client-secret>` values, you can refer to [Using Polaris]({{% ref "getting-started/using-polaris" %}})
for more details.
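For reference, a fully substituted command against a hypothetical local Polaris service might look like the following. The catalog name `polaris_demo` and the `XXXX:YYYY` credentials are illustrative placeholders, not values from this guide:

```shell
# Illustrative only: assumes a local Polaris service at localhost:8181,
# a Polaris catalog named polaris_demo, and placeholder credentials.
bin/spark-shell \
--packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.polaris_demo.warehouse=polaris_demo \
--conf spark.sql.catalog.polaris_demo.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.polaris_demo=org.apache.polaris.spark.SparkCatalog \
--conf spark.sql.catalog.polaris_demo.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.polaris_demo.credential='XXXX:YYYY' \
--conf spark.sql.catalog.polaris_demo.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.polaris_demo.token-refresh-enabled=true
```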
You can also start the connection by programmatically initializing a SparkSession; the following is an example with PySpark:
```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "<polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.<spark-catalog-name>", "org.apache.polaris.spark.SparkCatalog")
    .config("spark.sql.catalog.<spark-catalog-name>.uri", "<polaris-service-uri>")
    .config("spark.sql.catalog.<spark-catalog-name>.token-refresh-enabled", "true")
    .config("spark.sql.catalog.<spark-catalog-name>.credential", "<client-id>:<client-secret>")
    .config("spark.sql.catalog.<spark-catalog-name>.warehouse", "<polaris-catalog-name>")
    .config("spark.sql.catalog.<spark-catalog-name>.scope", "PRINCIPAL_ROLE:ALL")
    .config("spark.sql.catalog.<spark-catalog-name>.header.X-Iceberg-Access-Delegation", "vended-credentials")
    .getOrCreate()
)
```
Similar to the CLI command, make sure the corresponding fields are replaced correctly.

### Create tables with Spark
After Spark is started, you can use it to create and access Iceberg and Delta tables, for example:
```python
spark.sql("USE polaris")
spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS")
spark.sql("CREATE NAMESPACE IF NOT EXISTS DELTA_NS.PUBLIC")
spark.sql("USE NAMESPACE DELTA_NS.PUBLIC")
spark.sql("""CREATE TABLE IF NOT EXISTS PEOPLE (
    id int, name string)
USING delta LOCATION 'file:///tmp/var/delta_tables/people'
""")
```

## Connecting with Spark using local Polaris Spark client jar
If you would like to use a version of the Spark client that is currently not yet released, you can
build a Spark client jar locally from source. Please check out the Polaris repo and refer to the Spark plugin
[README](https://github.com/apache/polaris/blob/main/plugins/spark/README.md) for detailed instructions.
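Once a local jar is built, one way to use it is to pass it to Spark via `--jars` instead of resolving a released package with `--packages`. The jar path below is a placeholder for your actual build output:

```shell
# Illustrative only: point Spark at a locally built Polaris Spark client jar.
# Replace the jar path with the actual artifact produced by the build.
bin/spark-shell \
--jars /path/to/polaris-spark-client.jar \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
```

Note that `--jars` does not resolve transitive dependencies, so you may also need to supply the Iceberg and Delta dependencies that `--packages` would otherwise pull in; the README linked above describes the exact requirements.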
## Limitations
The Polaris Spark client has the following functionality limitations:
1) Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method of `DataFrame`
   is also not supported, since it relies on CTAS support.
2) Creating a Delta table without an explicit location is not supported.
3) Renaming a Delta table is not supported.
4) ALTER TABLE ... SET LOCATION is not supported for Delta tables.
5) Tables in other non-Iceberg formats, such as CSV, are not supported.

## Iceberg Spark Client compatibility with Polaris Spark Client
The Polaris Spark client today depends on a specific Iceberg client version, and the version dependency is described
in the following table:

| Spark Client Version | Iceberg Spark Client Version |
|----------------------|------------------------------|
| 1.0.0                | 1.9.0                        |

The Iceberg dependency is automatically downloaded when the Polaris package is downloaded, so there is no need to
add the Iceberg Spark client to the `packages` configuration.
diff --git a/site/content/releases/1.0.0/policy.md b/site/content/releases/1.0.0/policy.md
new file mode 100644
index 0000000000..3f49353884
--- /dev/null
+++ b/site/content/releases/1.0.0/policy.md
@@ -0,0 +1,197 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.
See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Policy +type: docs +weight: 425 +--- + +The Polaris Policy framework empowers organizations to centrally define, manage, and enforce fine-grained governance, lifecycle, and operational rules across all data resources in the catalog. + +With the policy API, you can: +- Create and manage policies +- Attach policies to specific resources (catalogs, namespaces, tables, or views) +- Check applicable policies for any given resource + +## What is a Policy? + +A policy in Apache Polaris is a structured entity that defines rules governing actions on specified resources under +predefined conditions. Each policy contains: + +- **Name**: A unique identifier within a namespace +- **Type**: Determines the semantics and expected format of the policy content +- **Description**: Explains the purpose of the policy +- **Content**: Contains the actual rules defining the policy behavior +- **Version**: An automatically tracked revision number +- **Inheritable**: Whether the policy can be inherited by child resources, decided by its type + +### Policy Types + +Polaris supports several predefined system policy types (prefixed with `system.`): + +| Policy Type | Purpose | JSON-Schema | Applies To | +|-------------|-------------------------------------------------------|-------------|------------| +| **`system.data-compaction`** | Defines rules for data file compaction operations | [`data-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/data-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | +| **`system.metadata-compaction`** | Defines rules for metadata file compaction operations | [`metadata-compaction/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/metadata-compaction/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | +| **`system.orphan-file-removal`** | Defines rules for 
removing orphaned files | [`orphan-file-removal/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/orphan-file-removal/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | +| **`system.snapshot-expiry`** | Defines rules for snapshot expiration | [`snapshot-expiry/2025-02-03.json`](https://polaris.apache.org/schemas/policies/system/snapshot-expiry/2025-02-03.json) | Iceberg **table**, **namespace**, **catalog** | + +Support for additional predefined system policy types and custom policy type definitions is in progress. +For more details, please refer to the [roadmap](https://github.com/apache/polaris/discussions/1028). + +### Policy Inheritance + +The entity hierarchy in Polaris is structured as follows: + +``` + Catalog + | + Namespace + | + +-----------+----------+ + | | | +Iceberg Iceberg Generic + Table View Table +``` + +Policies can be attached at any level, and inheritance flows from catalog down to namespace, then to tables and views. + +Policies can be inheritable or non-inheritable: + +- **Inheritable policies**: Apply to the target resource and all its applicable child resources +- **Non-inheritable policies**: Apply only to the specific target resource + +The inheritance follows an override mechanism: +1. Table-level policies override namespace and catalog policies +2. Namespace-level policies override parent namespace and catalog policies + +> **Important:** Because an override completely replaces the same policy type at higher levels, +> **only one instance of a given policy type can be attached to (and therefore affect) a resource**. 
+ +## Working with Policies + +### Creating a Policy + +To create a policy, you need to provide a name, type, and optionally a description and content: + +```json +POST /polaris/v1/{prefix}/namespaces/{namespace}/policies +{ + "name": "compaction-policy", + "type": "system.data-compaction", + "description": "Policy for optimizing table storage", + "content": "{\"version\": \"2025-02-03\", \"enable\": true, \"config\": {\"target_file_size_bytes\": 134217728}}" +} +``` + +The policy content is validated against a schema specific to its type. Here are a few policy content examples: +- Data Compaction Policy +```json +{ + "version": "2025-02-03", + "enable": true, + "config": { + "target_file_size_bytes": 134217728, + "compaction_strategy": "bin-pack", + "max-concurrent-file-group-rewrites": 5 + } +} +``` +- Orphan File Removal Policy +```json +{ + "version": "2025-02-03", + "enable": true, + "max_orphan_file_age_in_days": 30, + "locations": ["s3://my-bucket/my-table-location"], + "config": { + "prefix_mismatch_mode": "ignore" + } +} +``` + +### Attaching Policies to Resources + +Policies can be attached to different resource levels: + +1. **Catalog level**: Applies to the entire catalog +2. **Namespace level**: Applies to a specific namespace +3. **Table-like level**: Applies to individual tables or views + +Example of attaching a policy to a table: + +```json +PUT /polaris/v1/{prefix}/namespaces/{namespace}/policies/{policy-name}/mappings +{ + "target": { + "type": "table-like", + "path": ["NS1", "NS2", "test_table_1"] + } +} +``` + +For inheritable policies, only one policy of a given type can be attached to a resource. For non-inheritable policies, +multiple policies of the same type can be attached. + +### Retrieving Applicable Policies +A user can view applicable policies on a resource (e.g., table, namespace, or catalog) as long as they have +read permission on that resource. 
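When a request targets a resource inside a multi-level namespace, the namespace levels are joined with the unit-separator character (`0x1F`) and percent-encoded in the query string, following the Iceberg REST convention. A small Python sketch of building such a request path (the namespace and table names are illustrative):

```python
from urllib.parse import quote

# Join namespace levels with the 0x1F unit separator, then percent-encode,
# per the Iceberg REST convention for multi-level namespaces.
levels = ["finance", "quarterly"]
namespace = quote("\x1f".join(levels), safe="")

url = f"/polaris/v1/catalog/applicable-policies?namespace={namespace}&target-name=transactions"
# namespace == "finance%1Fquarterly"
```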
+
+Here is an example that finds all policies applying to a specific resource (including inherited policies):
+```
+GET /polaris/v1/catalog/applicable-policies?namespace=finance%1Fquarterly&target-name=transactions
+```
+
+**Sample response:**
+```json
+{
+  "policies": [
+    {
+      "name": "snapshot-expiry-policy",
+      "type": "system.snapshot-expiry",
+      "appliedAt": "namespace",
+      "content": {
+        "version": "2025-02-03",
+        "enable": true,
+        "config": {
+          "min_snapshot_to_keep": 1,
+          "max_snapshot_age_days": 2,
+          "max_ref_age_days": 3
+        }
+      }
+    },
+    {
+      "name": "compaction-policy",
+      "type": "system.data-compaction",
+      "appliedAt": "catalog",
+      "content": {
+        "version": "2025-02-03",
+        "enable": true,
+        "config": {
+          "target_file_size_bytes": 134217728
+        }
+      }
+    }
+  ]
+}
+```
+
+### API Reference
+
+For the complete and up-to-date API specification, see the [policy-apis.yaml](https://github.com/apache/polaris/blob/main/spec/polaris-catalog-apis/policy-apis.yaml).
\ No newline at end of file
diff --git a/site/content/releases/1.0.0/realm.md b/site/content/releases/1.0.0/realm.md
new file mode 100644
index 0000000000..9da5e7e25b
--- /dev/null
+++ b/site/content/releases/1.0.0/realm.md
@@ -0,0 +1,53 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.
See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Realm
+type: docs
+weight: 350
+---
+
+This page explains what a realm is and what it is used for in Polaris.
+
+### What is it?
+
+A realm in Polaris serves as a logical partitioning mechanism within the catalog system. This isolation allows for multitenancy, enabling different teams, environments, or organizations to operate independently within the same Polaris deployment.
+
+### Key Characteristics
+
+**Isolation:** Each realm encapsulates its own set of resources, ensuring that operations and policies in one realm do not affect others.
+
+**Authentication Context:** When configuring Polaris, principal credentials are associated with a specific realm. This allows for the separation of security concerns across different realms.
+
+**Configuration Scope:** Realm identifiers are used in various configurations, such as connection strings and feature configurations.
+
+An example of this is:
+
+`jdbc:postgresql://localhost:5432/{realm}`
+
+This ensures that each realm's data is stored separately.
+
+### How is it used in the system?
+
+**RealmContext:** It is a key concept used to identify and resolve the context in which operations are performed. For example, in `DefaultRealmContextResolver`, a realm is resolved from request headers, and operations are performed based on the resolved realm identifier.
+
+**Authentication and Authorization:** For example, in `BasePolarisAuthenticator`, `RealmContext` is used to provide context about the current security domain, which is used to retrieve the correct `PolarisMetastoreManager` that manages all Polaris entities and associated grant records metadata for
+authorization.
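As an illustration of header-based realm resolution and per-realm configuration, here is a minimal sketch. The header name and fallback value are assumptions for the example (see `DefaultRealmContextResolver` for the real implementation):

```python
# Illustrative sketch of resolving a realm from request headers and deriving
# a per-realm JDBC URL; names are hypothetical, not Polaris defaults.

def resolve_realm(headers, header_name="Polaris-Realm", default="default-realm"):
    """Pick the realm identifier from request headers, falling back to a default."""
    return headers.get(header_name, default)

def realm_jdbc_url(template, realm):
    """Expand a {realm} placeholder so each realm gets its own database."""
    return template.replace("{realm}", realm)

realm = resolve_realm({"Polaris-Realm": "team-a"})
url = realm_jdbc_url("jdbc:postgresql://localhost:5432/{realm}", realm)
# url == "jdbc:postgresql://localhost:5432/team-a"
```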
+
+**Isolation:** In methods like `createEntityManagerFactory(@Nonnull RealmContext realmContext)` from the `PolarisEclipseLinkPersistenceUnit` interface, the realm context influences how resources are created or managed based on the security policies of that realm.
+An example of this is the way a realm name can be used to create a database connection URL, so that you have one database instance per realm where applicable; alternatively, isolation can be more granular and applied at the primary-key level (within the same database instance).
\ No newline at end of file
diff --git a/site/content/releases/1.0.0/telemetry.md b/site/content/releases/1.0.0/telemetry.md
new file mode 100644
index 0000000000..8df97f505d
--- /dev/null
+++ b/site/content/releases/1.0.0/telemetry.md
@@ -0,0 +1,192 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Telemetry
+type: docs
+weight: 450
+---
+
+## Metrics
+
+Metrics are published using [Micrometer]; they are available from Polaris's management interface
+(port 8282 by default) under the path `/q/metrics`. For example, if the server is running on
+localhost, the metrics can be accessed via http://localhost:8282/q/metrics.
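The endpoint serves metrics in the Prometheus exposition format. As a rough sketch, those lines can be parsed like this (the sample lines below are illustrative, not actual Polaris output):

```python
# Parse Prometheus-format exposition text into a series -> value map.
# The SAMPLE text is illustrative; real output comes from GET /q/metrics.
SAMPLE = """\
# HELP jvm_threads_live_threads The current number of live threads
jvm_threads_live_threads{application="Polaris"} 17.0
http_server_requests_seconds_count{application="Polaris",method="GET"} 42.0
"""

def parse_metrics(text):
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip comments and blanks
            continue
        series, value = line.rsplit(" ", 1)   # split off the trailing value
        metrics[series] = float(value)
    return metrics

metrics = parse_metrics(SAMPLE)
```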
+
+[Micrometer]: https://quarkus.io/guides/telemetry-micrometer
+
+Metrics can be scraped by Prometheus or any compatible metrics scraping server. See
+[Prometheus](https://prometheus.io) for more information.
+
+Additional tags can be added to the metrics by setting the `polaris.metrics.tags.*` property. Each
+tag is a key-value pair, where the key is the tag name and the value is the tag value. For example,
+to add a tag `environment=prod` to all metrics, set `polaris.metrics.tags.environment=prod`. Multiple
+tags can be added, as shown below:
+
+```properties
+polaris.metrics.tags.service=polaris
+polaris.metrics.tags.environment=prod
+polaris.metrics.tags.region=us-west-2
+```
+
+Note that by default Polaris adds one tag: `application=Polaris`. You can override this tag by
+setting the `polaris.metrics.tags.application=` property.
+
+### Realm ID Tag
+
+Polaris can add the realm ID as a tag to all API and HTTP request metrics. This is disabled by
+default to prevent high-cardinality issues, but can be enabled by setting the following properties:
+
+```properties
+polaris.metrics.realm-id-tag.enable-in-api-metrics=true
+polaris.metrics.realm-id-tag.enable-in-http-metrics=true
+```
+
+You should be particularly careful when enabling the realm ID tag in HTTP request metrics, as these
+metrics typically have a much higher cardinality than API request metrics.
+
+In order to prevent the number of tags from growing indefinitely and causing performance issues or
+crashing the server, the number of unique realm IDs in HTTP request metrics is limited to 100 by
+default. If the number of unique realm IDs exceeds this value, a warning will be logged and no more
+HTTP request metrics will be recorded. This threshold can be changed by setting the
+`polaris.metrics.realm-id-tag.http-metrics-max-cardinality` property.
+
+## Traces
+
+Traces are published using [OpenTelemetry].
+
+[OpenTelemetry]: https://quarkus.io/guides/opentelemetry-tracing
+
+By default, OpenTelemetry is disabled in Polaris, because there is no collector endpoint
+that would be a reasonable default for all cases.
+
+To enable OpenTelemetry and publish traces for Polaris, set `quarkus.otel.sdk.disabled=false`
+and configure a valid collector endpoint URL, starting with `http://` or `https://`, in the server property
+`quarkus.otel.exporter.otlp.traces.endpoint`.
+
+_If these properties are not set, the server will not publish traces._
+
+The collector must speak the OpenTelemetry protocol (OTLP), and the port must be its gRPC port
+(4317 by default), e.g. "http://otlp-collector:4317".
+
+By default, Polaris adds a few attributes to the [OpenTelemetry Resource] to identify the server,
+notably:
+
+- `service.name`: set to `Apache Polaris Server (incubating)`;
+- `service.version`: set to the Polaris version.
+
+[OpenTelemetry Resource]: https://opentelemetry.io/docs/languages/js/resources/
+
+You can override the default resource attributes or add additional ones by setting the
+`quarkus.otel.resource.attributes` property.
+
+This property expects a comma-separated list of key-value pairs, where the key is the attribute name
+and the value is the attribute value. For example, to change the service name to `Polaris` and add
+an attribute `deployment.environment=dev`, set the following property:
+
+```properties
+quarkus.otel.resource.attributes=service.name=Polaris,deployment.environment=dev
+```
+
+The alternative syntax below can also be used:
+
+```properties
+quarkus.otel.resource.attributes[0]=service.name=Polaris
+quarkus.otel.resource.attributes[1]=deployment.environment=dev
+```
+
+Finally, two additional span attributes are added to all request parent spans:
+
+- `polaris.request.id`: The unique identifier of the request, if set by the caller through the
+  `Polaris-Request-Id` header.
+- `polaris.realm`: The unique identifier of the realm.
Always set (unless the request failed because
+of a realm resolution error).
+
+### Troubleshooting Traces
+
+If the server is unable to publish traces, first check the logs for an error message like the following:
+
+```
+SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans.
+The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317
+```
+
+This means that the server is unable to connect to the collector. Check that the collector is
+running and that the URL is correct.
+
+## Logging
+
+Polaris relies on [Quarkus](https://quarkus.io/guides/logging) for logging.
+
+By default, logs are written to the console and to a file located in the `./logs` directory. The log
+file is rotated daily and compressed. The maximum size of the log file is 10MB, and the maximum
+number of backup files is 14.
+
+JSON logging can be enabled by setting the `quarkus.log.console.json` and `quarkus.log.file.json`
+properties to `true`. By default, JSON logging is disabled.
+
+The log level can be set for the entire application or for specific packages. The default log level
+is `INFO`. To set the log level for the entire application, use the `quarkus.log.level` property.
+
+To set the log level for a specific package, use the `quarkus.log.category."package-name".level` property,
+where `package-name` is the name of the package. For example, the package `io.smallrye.config` has a
+useful logger to help debug configuration issues, but it needs to be set to the `DEBUG` level.
+This can be done by setting the following property:
+
+```properties
+quarkus.log.category."io.smallrye.config".level=DEBUG
+```
+
+The log message format for both console and file output is highly configurable.
The default format +is: + +``` +%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] [%X{requestId},%X{realmId}] [%X{traceId},%X{parentId},%X{spanId},%X{sampled}] (%t) %s%e%n +``` + +Refer to the [Logging format](https://quarkus.io/guides/logging#logging-format) guide for more +information on placeholders and how to customize the log message format. + +### MDC Logging + +Polaris uses Mapped Diagnostic Context (MDC) to enrich log messages with additional context. The +following MDC keys are available: + +- `requestId`: The unique identifier of the request, if set by the caller through the + `Polaris-Request-Id` header. +- `realmId`: The unique identifier of the realm. Always set. +- `traceId`: The unique identifier of the trace. Present if tracing is enabled and the message is + originating from a traced context. +- `parentId`: The unique identifier of the parent span. Present if tracing is enabled and the + message is originating from a traced context. +- `spanId`: The unique identifier of the span. Present if tracing is enabled and the message is + originating from a traced context. +- `sampled`: Whether the trace has been sampled. Present if tracing is enabled and the message is + originating from a traced context. + +Other MDC keys can be added by setting the `polaris.log.mdc.*` property. Each property is a +key-value pair, where the key is the MDC key name and the value is the MDC key value. For example, +to add the MDC keys `environment=prod` and `region=us-west-2` to all log messages, set the following +properties: + +```properties +polaris.log.mdc.environment=prod +polaris.log.mdc.region=us-west-2 +``` + +MDC context is propagated across threads, including in `TaskExecutor` threads. \ No newline at end of file
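To illustrate how the `%X{...}` placeholders in the log format above map to MDC entries, here is a tiny Python sketch (not Quarkus code). Keys missing from the MDC simply render as empty strings:

```python
import re

# Substitute %X{key} placeholders with MDC values, as the log formatter
# conceptually does; this sketch is illustrative only.
def render(fmt, mdc):
    return re.sub(r"%X\{(\w+)\}", lambda m: mdc.get(m.group(1), ""), fmt)

line = render("[%X{requestId},%X{realmId}]", {"realmId": "acme"})
# line == "[,acme]" -- requestId is absent, realmId is filled in
```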