From 3d30c4be01ca264466a8c75e31f4218deabce896 Mon Sep 17 00:00:00 2001 From: Colebow Date: Wed, 31 Jul 2024 12:03:20 -0700 Subject: [PATCH 1/2] Fix wrapping in quickstart.md --- docs/quickstart.md | 124 +++++++++++++++++++++++++++++++++------------ 1 file changed, 93 insertions(+), 31 deletions(-) diff --git a/docs/quickstart.md b/docs/quickstart.md index 797f6fe643..3ccf9caca1 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -16,15 +16,21 @@ # Quick Start -This guide serves as a introduction to several key entities that can be managed with Polaris, describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark. +This guide serves as a introduction to several key entities that can be managed +with Polaris, describes how to build and deploy Polaris locally, and finally +includes examples of how to use Polaris with Apache Spark and Trino. ## Prerequisites -This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here. +This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), +and interacting with it using the command-line interface and +[Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be +sure to satisfy the relevant prerequisites listed here. ### Building and Deploying Polaris -To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/): +To get the latest Polaris code, you'll need to clone the repository using +[git](https://git-scm.com/). 
You can install git using [homebrew](https://brew.sh/): ``` brew install git @@ -39,7 +45,9 @@ git clone https://github.com/polaris-catalog/polaris.git #### With Docker -If you plan to deploy Polaris inside [Docker](https://www.docker.com/), you'll need to install docker itself. For example, this can be done using [homebrew](https://brew.sh/): +If you plan to deploy Polaris inside [Docker](https://www.docker.com/), you'll +need to install docker itself. For example, this can be done using +[homebrew](https://brew.sh/): ``` brew install --cask docker @@ -49,9 +57,13 @@ Once installed, make sure Docker is running. #### From Source -If you plan to build Polaris from source yourself, you will need to satisfy a few prerequisites first. +If you plan to build Polaris from source yourself, you will need to satisfy a +few prerequisites first. -Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv: +Polaris is built using [gradle](https://gradle.org/) and is compatible with +Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple +Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) +and configure it with jenv: ``` cd ~/polaris @@ -63,11 +75,15 @@ jenv local 21 ### Connecting to Polaris -Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the prerequisites below. +Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) +client that supports the REST API. Depending on the client you plan to use, +refer to the prerequisites below. #### With Spark -If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. 
As [above](#building-and-deploying-polaris), make sure [git](https://git-scm.com/) is installed first. You can install it with [homebrew](https://brew.sh/): +If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), +you'll need to start by cloning Spark. As [above](#building-and-deploying-polaris), +make sure [git](https://git-scm.com/) is installed first. You can install it with [homebrew](https://brew.sh/): ``` brew install git @@ -84,7 +100,9 @@ git checkout branch-3.5 ## Deploying Polaris -Polaris can be deployed via a lightweight docker image or as a standalone process. Before starting, be sure that you've satisfied the relevant [prerequisites](#building-and-deploying-polaris) detailed above. +Polaris can be deployed via a lightweight docker image or as a standalone +process. Before starting, be sure that you've satisfied the relevant +[prerequisites](#building-and-deploying-polaris) detailed above. ### Docker Image @@ -95,7 +113,8 @@ cd ~/polaris docker compose -f docker-compose.yml up --build ``` -Once the `polaris-polaris` container is up, you can continue to [Defining a Catalog](#defining-a-catalog). +Once the `polaris-polaris` container is up, you can continue to +[Defining a Catalog](#defining-a-catalog). ### Building Polaris @@ -106,7 +125,8 @@ cd ~/polaris ./gradlew runApp ``` -You should see output for some time as Polaris builds and starts up. Eventually, you won’t see any more logs and should see messages that resemble the following: +You should see output for some time as Polaris builds and starts up. Eventually, +you won’t see any more logs and should see messages that resemble the following: ``` INFO [...] [main] [] o.e.j.s.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@... @@ -119,9 +139,14 @@ At this point, Polaris is running. ## Bootstrapping Polaris -For this tutorial, we'll launch an instance of Polaris that stores entities only in-memory. 
This means that any entities that you define will be destroyed when Polaris is shut down. It also means that Polaris will automatically bootstrap itself with root credentials. For more information on how to configure Polaris for production usage, see the [docs](./configuring-polaris-for-production.md). +For this tutorial, we'll launch an instance of Polaris that stores entities only +in-memory. This means that any entities that you define will be destroyed when +Polaris is shut down. It also means that Polaris will automatically bootstrap +itself with root credentials. For more information on how to configure Polaris +for production usage, see the [docs](./configuring-polaris-for-production.md). -When Polaris is launched using in-memory mode the root `CLIENT_ID` and `CLIENT_SECRET` can be found in stdout on initial startup. For example: +When Polaris is launched using in-memory mode the root `CLIENT_ID` and +`CLIENT_SECRET` can be found in stdout on initial startup. For example: ``` realm: default-realm root principal credentials: XXXX:YYYY @@ -131,7 +156,10 @@ Be sure to note of these credentials as we'll be using them below. ## Defining a Catalog -In Polaris, the [catalog](./entities/catalog.md) is the top-level entity that objects like [tables](./entities.md#table) and [views](./entities.md#view) are organized under. With a Polaris service running, you can create a catalog like so: +In Polaris, the [catalog](./entities/catalog.md) is the top-level entity that +objects like [tables](./entities.md#table) and [views](./entities.md#view) are +organized under. With a Polaris service running, you can create a catalog like +so: ``` cd ~/polaris @@ -149,16 +177,29 @@ cd ~/polaris This will create a new catalog called **quickstart_catalog**. 
-The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources. +The `DEFAULT_BASE_LOCATION` you provide will be the default location that +objects in this catalog should be stored in, and the `ROLE_ARN` you provide +should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) +with access to read and write data in that location. These credentials will be +provided to engines reading data from the catalog once they have authenticated +with Polaris using credentials that have access to those resources. -If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs](./entities.md#storage-type). +If you’re using a storage type other than S3, such as Azure, you’ll provide a +different type of credential than a Role ARN. For more details on supported +storage types, see the [docs](./entities.md#storage-type). -Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs](./command-line-interface.md). +Additionally, if Polaris is running somewhere other than `localhost:8181`, you +can specify the correct hostname and port by providing `--host` and `--port` +flags. For the full set of options supported by the CLI, please refer to the +[docs](./command-line-interface.md). 
### Creating a Principal and Assigning it Privileges -With a catalog created, we can create a [principal](./entities.md#principal) that has access to manage that catalog. For details on how to configure the Polaris CLI, see [the section above](#defining-a-catalog) or refer to the [docs](./command-line-interface.md). +With a catalog created, we can create a [principal](./entities.md#principal) +that has access to manage that catalog. For details on how to configure the +Polaris CLI, see [the section above](#defining-a-catalog) or refer to the +[docs](./command-line-interface.md). ``` ./polaris \ @@ -187,14 +228,18 @@ With a catalog created, we can create a [principal](./entities.md#principal) tha Be sure to provide the necessary credentials, hostname, and port as before. -When the `principals create` command completes successfully, it will return the credentials for this new principal. Be sure to note these down for later. For example: +When the `principals create` command completes successfully, it will return the +credentials for this new principal. Be sure to note these down for later. For +example: ``` ./polaris ... principals create example {"clientId": "XXXX", "clientSecret": "YYYY"} ``` -Now, we grant the principal the [principal role](./entities.md#principal-role) we created, and grant the [catalog role](./entities.md#catalog-role) the principal role we created. For more information on these entities, please refer to the linked documentation. +Now, we grant the principal the [principal role](./entities.md#principal-role) +we created, and grant the [catalog role](./entities.md#catalog-role) the principal role we created. For +more information on these entities, please refer to the linked documentation. 
``` ./polaris \ @@ -219,7 +264,10 @@ Now, we’ve linked our principal to the catalog via roles like so: ![Principal to Catalog](./img/quickstart/privilege-illustration-1.png "Principal to Catalog") -In order to give this principal the ability to interact with the catalog, we must assign some [privileges](./entities.md#privileges). For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so: +In order to give this principal the ability to interact with the catalog, we +must assign some [privileges](./entities.md#privileges). For the time being, we +will give this principal the ability to fully manage content in our new catalog. +We can do this with the CLI like so: ``` ./polaris \ @@ -233,23 +281,32 @@ In order to give this principal the ability to interact with the catalog, we mus CATALOG_MANAGE_CONTENT ``` -This grants the [catalog privileges](./entities.md#privilege) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so: +This grants the [catalog privileges](./entities.md#privilege) `CATALOG_MANAGE_CONTENT` to our +catalog role, linking everything together like so: ![Principal to Catalog with Catalog Role](./img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role") -`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace. +`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities +within the catalog. The same privilege could be granted to a namespace, in which +case the principal could create/list/read/write any entity under that namespace. ## Using Iceberg & Polaris -At this point, we’ve created a principal and granted it the ability to manage a catalog. 
We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). +At this point, we’ve created a principal and granted it the ability to manage a +catalog. We can now use an external engine to assume that principal, access our +catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). ### Connecting with Spark -To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API. +To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), +we can configure Spark to use the Iceberg catalog REST API. -This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). From a local Spark clone on the `branch-3.5` branch we can run the following: +This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), +but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). +From a local Spark clone on the `branch-3.5` branch we can run the following: -_Note: the credentials provided here are those for our principal, not the root credentials._ +_Note: the credentials provided here are those for our principal, not the root +credentials._ ``` bin/spark-shell \ @@ -266,13 +323,18 @@ bin/spark-shell \ ``` -Replace `XXXX` and `YYYY` with the client ID and client secret generated when you created the `quickstart_user` principal. +Replace `XXXX` and `YYYY` with the client ID and client secret generated when +you created the `quickstart_user` principal. 
-Similar to the CLI commands above, this configures Spark to use the Polaris running at `localhost:8181` as a catalog. If your Polaris server is running elsewhere, but sure to update the configuration appropriately. +Similar to the CLI commands above, this configures Spark to use the Polaris +running at `localhost:8181` as a catalog. If your Polaris server is running +elsewhere, be sure to update the configuration appropriately. -Finally, note that we include the `hadoop-aws` package here. If your table is using a different filesystem, be sure to include the appropriate dependency. +Finally, note that we include the `hadoop-aws` package here. If your table is +using a different filesystem, be sure to include the appropriate dependency. -Once the Spark session starts, we can create a namespace and table within the catalog: +Once the Spark session starts, we can create a namespace and table within the +catalog: ``` spark.sql("USE quickstart_catalog") From 79aac65a40a5cde7c18ffcaa3bb34f201db6a2ac Mon Sep 17 00:00:00 2001 From: Colebow Date: Wed, 31 Jul 2024 13:21:39 -0700 Subject: [PATCH 2/2] Add instructions for using Trino with Polaris --- README.md | 5 +- docs/index.html | 348 ++++++++++++++++++++++++++++++++------------- docs/quickstart.md | 29 ++++ spec/index.yaml | 15 +- 4 files changed, 293 insertions(+), 104 deletions(-) diff --git a/README.md b/README.md index 56bca34f6e..2a55ad1f1c 100644 --- a/README.md +++ b/README.md @@ -178,9 +178,10 @@ $ curl -i -X PUT -H "Authorization: Bearer $PRINCIPAL_TOKEN" -H 'Accept: applica -d '{"name": "polaris", "id": 100, "type": "INTERNAL", "readOnly": false}' ``` -This creates a catalog called `polaris`. From here, you can use Spark to create namespaces, tables, etc. +This creates a catalog called `polaris`. From here, you can use any Iceberg REST +compatible clients (e.g. Spark or Trino) to create namespaces, tables, etc. 
-You must run the following as the first query in your spark-sql shell to actually use Polaris: +You must run the following as the first query in your SQL shell to actually use Polaris: ``` use polaris; diff --git a/docs/index.html b/docs/index.html index ac8987d906..4c356e121f 100644 --- a/docs/index.html +++ b/docs/index.html @@ -449,7 +449,6 @@ -231.5279,231.248 -231.873,231.248 -0.3451,0 -104.688, -104.0616 -231.873,-231.248 z " fill="currentColor">

Polaris Catalog Documentation

-

Quick Start

This guide serves as a introduction to several key entities that can be managed with Polaris, describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Spark and Trino.

-

Prerequisites

Quick Start

This guide serves as an introduction to several key entities that can be managed +with Polaris, describes how to build and deploy Polaris locally, and finally +includes examples of how to use Polaris with Apache Spark and Trino.

+

Prerequisites

This guide covers building Polaris, deploying it locally or via Docker, and interacting with it using the command-line interface and Apache Spark. Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here.

+">

This guide covers building Polaris, deploying it locally or via Docker, +and interacting with it using the command-line interface and +Apache Spark. Before proceeding with Polaris, be +sure to satisfy the relevant prerequisites listed here.

Building and Deploying Polaris

-

To get the latest Polaris code, you'll need to clone the repository using git. You can install git using homebrew:

+

To get the latest Polaris code, you'll need to clone the repository using +git. You can install git using homebrew:

brew install git
 

Then, use git to clone the Polaris repo:

@@ -531,15 +547,19 @@

Building and Deploying Polaris

git clone https://github.com/polaris-catalog/polaris.git

With Docker

-

If you plan to deploy Polaris inside Docker], you'll need to install docker itself. For can be done using homebrew:

-
brew install docker
-
-

Once installed, make sure Docker is running. This can be done on macOS with:

-
open -a Docker
+

If you plan to deploy Polaris inside Docker, you'll +need to install docker itself. For example, this can be done using +homebrew:

+
brew install --cask docker
 
+

Once installed, make sure Docker is running.

From Source

-

If you plan to build Polaris from source yourself, you will need to satisfy a few prerequisites first.

-

Polaris is built using gradle and is compatible with Java 21. We recommend the use of jenv to manage multiple Java versions. For example, to install Java 21 via [homebre]w(https://brew.sh/) and configure it with jenv:

+

If you plan to build Polaris from source yourself, you will need to satisfy a +few prerequisites first.

+

Polaris is built using gradle and is compatible with +Java 21. We recommend the use of jenv to manage multiple +Java versions. For example, to install Java 21 via homebrew +and configure it with jenv:

cd ~/polaris
 jenv local 21
 brew install openjdk@21 gradle@8 jenv
@@ -547,66 +567,91 @@ 

From Source

jenv local 21

Connecting to Polaris

-

Polaris is compatible with any Apache Iceberg client that supports the REST API. Depending on the client you plan to use, refer to the prerequisites below.

+

Polaris is compatible with any Apache Iceberg +client that supports the REST API. Depending on the client you plan to use, +refer to the prerequisites below.

With Spark

-

If you want to connect to Polaris with Apache Spark, you'll need to start by cloning Spark. As above, make sure git is installed first. You can install it with homebrew:

+

If you want to connect to Polaris with Apache Spark, +you'll need to start by cloning Spark. As above, +make sure git is installed first. You can install it with homebrew:

brew install git
 
-

Then, clone Spark and check out a versioned branch. This guide uses Spark 3.5.0.

+

Then, clone Spark and check out a versioned branch. This guide uses Spark 3.5.

cd ~
 git clone https://github.com/apache/spark.git
 cd ~/spark
-git checkout branch-3.5.0
+git checkout branch-3.5
 
-

Deploying Polaris

Deploying Polaris

Polaris can be deployed via a lightweight docker image or as a standalone process. Before starting, be sure that you've satisfied the relevant prerequisites detailed above.

+">

Polaris can be deployed via a lightweight docker image or as a standalone +process. Before starting, be sure that you've satisfied the relevant +prerequisites detailed above.

Docker Image

To start using Polaris in Docker, launch Polaris while Docker is running:

cd ~/polaris
 docker compose -f docker-compose.yml up --build
 
-

Once the polaris-polaris container is up, you can continue to Defining a Catalog.

+

Once the polaris-polaris container is up, you can continue to +Defining a Catalog.

Building Polaris

Run Polaris locally with:

cd ~/polaris
 ./gradlew runApp
 
-

You should see output for some time as Polaris builds and starts up. Eventually, you won’t see any more logs and should see messages that resemble the following:

+

You should see output for some time as Polaris builds and starts up. Eventually, +the log output stops, and the final messages should resemble the following:

INFO  [...] [main] [] o.e.j.s.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@...
 INFO  [...] [main] [] o.e.j.server.AbstractConnector: Started application@...
 INFO  [...] [main] [] o.e.j.server.AbstractConnector: Started admin@...
 INFO  [...] [main] [] o.eclipse.jetty.server.Server: Started Server@...
 

At this point, Polaris is running.

-

Bootstrapping Polaris

Bootstrapping Polaris

For this tutorial, we'll launch an instance of Polaris that stores entities only in-memory. This means that any entities that you define will be destroyed when Polaris is shut down. It also means that Polaris will automatically bootstrap itself with root credentials. For more information on how to configure Polaris for production usage, see the docs.

-

When Polaris is launched using in-memory mode the root CLIENT_ID and CLIENT_SECRET can be found in stdout on initial startup. For example:

-
Bootstrapped with credentials: {"client-id": "XXXX", "client-secret": "YYYY"}
+">

For this tutorial, we'll launch an instance of Polaris that stores entities only +in-memory. This means that any entities that you define will be destroyed when +Polaris is shut down. It also means that Polaris will automatically bootstrap +itself with root credentials. For more information on how to configure Polaris +for production usage, see the docs.

+

When Polaris is launched using in-memory mode the root CLIENT_ID and +CLIENT_SECRET can be found in stdout on initial startup. For example:

+
realm: default-realm root principal credentials: XXXX:YYYY
 

Be sure to take note of these credentials as we'll be using them below.
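If you would rather not copy the root credentials by hand, the line printed at startup can be split into the `CLIENT_ID` and `CLIENT_SECRET` variables used by the CLI commands in this guide. The following is only a sketch: it assumes the credentials line has exactly the format shown above, that the client ID itself contains no colon, and it uses the `XXXX:YYYY` placeholders rather than real values.

```shell
# Hypothetical input: paste the line that Polaris printed on startup.
# XXXX and YYYY are placeholders, not real credentials.
line='realm: default-realm root principal credentials: XXXX:YYYY'

# Keep only the text after "credentials: ".
creds="${line##*credentials: }"

# Split on the first colon: the ID is before it, the secret after it.
CLIENT_ID="${creds%%:*}"
CLIENT_SECRET="${creds#*:}"

# Export both so later ./polaris invocations in this shell can read them.
export CLIENT_ID CLIENT_SECRET
echo "client id: ${CLIENT_ID}"
```

Only POSIX parameter expansion is used here, so the snippet should behave the same in `sh`, `bash`, and `zsh`.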

-

Defining a Catalog

Defining a Catalog

Building Polaris quickstart_catalog </code></pre> <p>This will create a new catalog called <strong>quickstart_catalog</strong>. </p> -<p>The <code>DEFAULT_BASE_LOCATION</code> you provide will be the default location that objects in this catalog should be stored in, and the <code>ROLE_ARN</code> you provide should be a <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html">Role ARN</a> with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources.</p> -<p>If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the <a href="./entities.md#storage-type">docs</a>. </p> -<p>Additionally, if Polaris is running somewhere other than <code>localhost:8181</code>, you can specify the correct hostname and port by providing <code>--host</code> and <code>--port</code> flags. For the full set of options supported by the CLI, please refer to the <a href="./command-line-interface.md">docs</a>.</p> +<p>The <code>DEFAULT_BASE_LOCATION</code> you provide will be the default location that +objects in this catalog should be stored in, and the <code>ROLE_ARN</code> you provide +should be a <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html">Role ARN</a> +with access to read and write data in that location. These credentials will be +provided to engines reading data from the catalog once they have authenticated +with Polaris using credentials that have access to those resources.</p> +<p>If you’re using a storage type other than S3, such as Azure, you’ll provide a +different type of credential than a Role ARN. For more details on supported +storage types, see the <a href="./entities.md#storage-type">docs</a>. 
</p> +<p>Additionally, if Polaris is running somewhere other than <code>localhost:8181</code>, you +can specify the correct hostname and port by providing <code>--host</code> and <code>--port</code> +flags. For the full set of options supported by the CLI, please refer to the +<a href="./command-line-interface.md">docs</a>.</p> <h3 id="creating-a-principal-and-assigning-it-privileges">Creating a Principal and Assigning it Privileges</h3> -<p>With a catalog created, we can create a <a href="./entities.md#principal">principal</a> that has access to manage that catalog. For details on how to configure the Polaris CLI, see <a href="#defining-a-catalog">the section above</a> or refer to the <a href="./command-line-interface.md">docs</a>.</p> +<p>With a catalog created, we can create a <a href="./entities.md#principal">principal</a> +that has access to manage that catalog. For details on how to configure the +Polaris CLI, see <a href="#defining-a-catalog">the section above</a> or refer to the +<a href="./command-line-interface.md">docs</a>.</p> <pre><code><span class="token punctuation">.</span><span class="token operator">/</span>polaris \ <span class="token operator">--</span>client<span class="token operator">-</span>id $<span class="token punctuation">{</span>CLIENT_ID<span class="token punctuation">}</span> \ <span class="token operator">--</span>client<span class="token operator">-</span>secret $<span class="token punctuation">{</span>CLIENT_SECRET<span class="token punctuation">}</span> \ @@ -648,11 +706,15 @@

Building Polaris

quickstart_catalog_role </code></pre> <p>Be sure to provide the necessary credentials, hostname, and port as before.</p> -<p>When the <code>principals create</code> command completes successfully, it will return the credentials for this new principal. Be sure to note these down for later. For example:</p> +<p>When the <code>principals create</code> command completes successfully, it will return the +credentials for this new principal. Be sure to note these down for later. For +example:</p> <pre><code><span class="token punctuation">.</span><span class="token operator">/</span>polaris <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span> principals create example <span class="token punctuation">{</span><span class="token string">"clientId"</span><span class="token punctuation">:</span> <span class="token string">"XXXX"</span><span class="token punctuation">,</span> <span class="token string">"clientSecret"</span><span class="token punctuation">:</span> <span class="token string">"YYYY"</span><span class="token punctuation">}</span> </code></pre> -<p>Now, we grant the principal the <a href="./entities.md#principal-role">principal role</a> we created, and grant the <a href="./entities.md#catalog-role">catalog role</a> the principal role we created. For more information on these entities, please refer to the linked documentation.</p> +<p>Now, we grant the principal the <a href="./entities.md#principal-role">principal role</a> +we created, and grant the <a href="./entities.md#catalog-role">catalog role</a> the principal role we created. 
For +more information on these entities, please refer to the linked documentation.</p> <pre><code><span class="token punctuation">.</span><span class="token operator">/</span>polaris \ <span class="token operator">--</span>client<span class="token operator">-</span>id $<span class="token punctuation">{</span>CLIENT_ID<span class="token punctuation">}</span> \ <span class="token operator">--</span>client<span class="token operator">-</span>secret $<span class="token punctuation">{</span>CLIENT_SECRET<span class="token punctuation">}</span> \ @@ -672,21 +734,30 @@

Building Polaris

</code></pre> <p>Now, we’ve linked our principal to the catalog via roles like so:</p> <p><img src="./img/quickstart/privilege-illustration-1.png" alt="Principal to Catalog" title="Principal to Catalog"></p> -<p>In order to give this principal the ability to interact with the catalog, we must assign some <a href="./entities.md#privileges">privileges</a>. For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so:</p> +<p>In order to give this principal the ability to interact with the catalog, we +must assign some <a href="./entities.md#privileges">privileges</a>. For the time being, we +will give this principal the ability to fully manage content in our new catalog. +We can do this with the CLI like so:</p> <pre><code><span class="token punctuation">.</span><span class="token operator">/</span>polaris \ <span class="token operator">--</span>client<span class="token operator">-</span>id $<span class="token punctuation">{</span>CLIENT_ID<span class="token punctuation">}</span> \ <span class="token operator">--</span>client<span class="token operator">-</span>secret $<span class="token punctuation">{</span>CLIENT_SECRET<span class="token punctuation">}</span> \ privileges \ - <span class="token operator">--</span>catalog quickstart_catalog \ - <span class="token operator">--</span>catalog<span class="token operator">-</span>role quickstart_catalog_role \ catalog \ grant \ + <span class="token operator">--</span>catalog quickstart_catalog \ + <span class="token operator">--</span>catalog<span class="token operator">-</span>role quickstart_catalog_role \ CATALOG_MANAGE_CONTENT </code></pre> -<p>This grants the <a href="./entities.md#privilege">catalog privileges</a> <code>CATALOG_MANAGE_CONTENT</code> to our catalog role, linking everything together like so:</p> +<p>This grants the <a href="./entities.md#privilege">catalog privileges</a> <code>CATALOG_MANAGE_CONTENT</code> to our +catalog 
role, linking everything together like so:</p> <p><img src="./img/quickstart/privilege-illustration-2.png" alt="Principal to Catalog with Catalog Role" title="Principal to Catalog with Catalog Role"></p> -<p><code>CATALOG_MANAGE_CONTENT</code> has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace.</p> -">

In Polaris, the catalog is the top-level entity that objects like tables and views are organized under. With a Polaris service running, you can create a catalog like so:

+<p><code>CATALOG_MANAGE_CONTENT</code> has create/list/read/write privileges on all entities +within the catalog. The same privilege could be granted to a namespace, in which +case the principal could create/list/read/write any entity under that namespace.</p> +">

In Polaris, the catalog is the top-level entity that +objects like tables and views are +organized under. With a Polaris service running, you can create a catalog like +so:

cd ~/polaris
 
 ./polaris \
@@ -700,11 +771,24 @@ 

Building Polaris

quickstart_catalog

This will create a new catalog called quickstart_catalog.

-

The DEFAULT_BASE_LOCATION you provide will be the default location that objects in this catalog should be stored in, and the ROLE_ARN you provide should be a Role ARN with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources.

-

If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the docs.

-

Additionally, if Polaris is running somewhere other than localhost:8181, you can specify the correct hostname and port by providing --host and --port flags. For the full set of options supported by the CLI, please refer to the docs.

+

The DEFAULT_BASE_LOCATION you provide will be the default location that +objects in this catalog should be stored in, and the ROLE_ARN you provide +should be a Role ARN +with access to read and write data in that location. These credentials will be +provided to engines reading data from the catalog once they have authenticated +with Polaris using credentials that have access to those resources.

+

If you’re using a storage type other than S3, such as Azure, you’ll provide a +different type of credential than a Role ARN. For more details on supported +storage types, see the docs.

+

Additionally, if Polaris is running somewhere other than localhost:8181, you +can specify the correct hostname and port by providing --host and --port +flags. For the full set of options supported by the CLI, please refer to the +docs.

Creating a Principal and Assigning it Privileges

-

With a catalog created, we can create a principal that has access to manage that catalog. For details on how to configure the Polaris CLI, see the section above or refer to the docs.

+

With a catalog created, we can create a principal +that has access to manage that catalog. For details on how to configure the +Polaris CLI, see the section above or refer to the +docs.

./polaris \
   --client-id ${CLIENT_ID} \
   --client-secret ${CLIENT_SECRET} \
@@ -728,11 +812,15 @@ 

Creating a Principal a quickstart_catalog_role

Be sure to provide the necessary credentials, hostname, and port as before.

-

When the principals create command completes successfully, it will return the credentials for this new principal. Be sure to note these down for later. For example:

+

When the principals create command completes successfully, it will return the +credentials for this new principal. Be sure to note these down for later. For +example:

./polaris ... principals create example
 {"clientId": "XXXX", "clientSecret": "YYYY"}
 
-

Now, we grant the principal the principal role we created, and grant the catalog role the principal role we created. For more information on these entities, please refer to the linked documentation.

+

Now, we grant the principal role we created to the principal, +and grant the catalog role we created to that principal role. For +more information on these entities, please refer to the linked documentation.

./polaris \
   --client-id ${CLIENT_ID} \
   --client-secret ${CLIENT_SECRET} \
@@ -752,25 +840,37 @@ 

Creating a Principal a

Now, we’ve linked our principal to the catalog via roles like so:

Principal to Catalog

-

In order to give this principal the ability to interact with the catalog, we must assign some privileges. For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so:

+

In order to give this principal the ability to interact with the catalog, we +must assign some privileges. For the time being, we +will give this principal the ability to fully manage content in our new catalog. +We can do this with the CLI like so:

./polaris \
   --client-id ${CLIENT_ID} \
   --client-secret ${CLIENT_SECRET} \
   privileges \
-  --catalog quickstart_catalog \
-  --catalog-role quickstart_catalog_role \
   catalog \
   grant \
+  --catalog quickstart_catalog \
+  --catalog-role quickstart_catalog_role \
   CATALOG_MANAGE_CONTENT
 
-

This grants the catalog privileges CATALOG_MANAGE_CONTENT to our catalog role, linking everything together like so:

+

This grants the catalog privilege CATALOG_MANAGE_CONTENT to our +catalog role, linking everything together like so:

Principal to Catalog with Catalog Role

-

CATALOG_MANAGE_CONTENT has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace.

-

Using Iceberg & Polaris

Using Iceberg & Polaris

Creating a Principal a <span class="token operator">--</span>conf spark<span class="token punctuation">.</span>sql<span class="token punctuation">.</span>catalog<span class="token punctuation">.</span>quickstart_catalog<span class="token punctuation">.</span>scope<span class="token operator">=</span><span class="token string">'PRINCIPAL_ROLE:ALL'</span> \ <span class="token operator">--</span>conf spark<span class="token punctuation">.</span>sql<span class="token punctuation">.</span>catalog<span class="token punctuation">.</span>quickstart_catalog<span class="token punctuation">.</span>token<span class="token operator">-</span>refresh<span class="token operator">-</span>enabled<span class="token operator">=</span><span class="token boolean">true</span> </code></pre> -<p>Replace <code>XXXX</code> and <code>YYYY</code> with the client ID and client secret generated when you created the <code>quickstart_user</code> principal.</p> -<p>Similar to the CLI commands above, this configures Spark to use the Polaris running at <code>localhost:8181</code> as a catalog. If your Polaris server is running elsewhere, but sure to update the configuration appropriately.</p> -<p>Finally, note that we include the <code>hadoop-aws</code> package here. If your table is using a different filesystem, be sure to include the appropriate dependency.</p> -<p>Once the Spark session starts, we can create a namespace and table within the catalog:</p> +<p>Replace <code>XXXX</code> and <code>YYYY</code> with the client ID and client secret generated when +you created the <code>quickstart_user</code> principal.</p> +<p>Similar to the CLI commands above, this configures Spark to use the Polaris +running at <code>localhost:8181</code> as a catalog. If your Polaris server is running +elsewhere, but sure to update the configuration appropriately.</p> +<p>Finally, note that we include the <code>hadoop-aws</code> package here. 
If your table is +using a different filesystem, be sure to include the appropriate dependency.</p> +<p>Once the Spark session starts, we can create a namespace and table within the +catalog:</p> <pre><code>spark<span class="token punctuation">.</span><span class="token function">sql</span><span class="token punctuation">(</span><span class="token string">"USE quickstart_catalog"</span><span class="token punctuation">)</span> spark<span class="token punctuation">.</span><span class="token function">sql</span><span class="token punctuation">(</span><span class="token string">"CREATE NAMESPACE IF NOT EXISTS quickstart_namespace"</span><span class="token punctuation">)</span> spark<span class="token punctuation">.</span><span class="token function">sql</span><span class="token punctuation">(</span><span class="token string">"CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema"</span><span class="token punctuation">)</span> @@ -813,10 +918,10 @@

Creating a Principal a <span class="token operator">--</span>client<span class="token operator">-</span>id $<span class="token punctuation">{</span>CLIENT_ID<span class="token punctuation">}</span> \ <span class="token operator">--</span>client<span class="token operator">-</span>secret $<span class="token punctuation">{</span>CLIENT_SECRET<span class="token punctuation">}</span> \ privileges \ - <span class="token operator">--</span>catalog quickstart_catalog \ - <span class="token operator">--</span>catalog<span class="token operator">-</span>role quickstart_catalog_role \ catalog \ revoke \ + <span class="token operator">--</span>catalog quickstart_catalog \ + <span class="token operator">--</span>catalog<span class="token operator">-</span>role quickstart_catalog_role \ CATALOG_MANAGE_CONTENT </code></pre> <p>Spark will lose access to the table:</p> @@ -824,11 +929,39 @@

Creating a Principal a org<span class="token punctuation">.</span>apache<span class="token punctuation">.</span>iceberg<span class="token punctuation">.</span>exceptions<span class="token punctuation">.</span>ForbiddenException<span class="token punctuation">:</span> Forbidden<span class="token punctuation">:</span> Principal <span class="token string">'quickstart_user'</span> with activated PrincipalRoles <span class="token string">'[]'</span> and activated ids <span class="token string">'[6, 7]'</span> is not authorized <span class="token keyword">for</span> op LOAD_TABLE_WITH_READ_DELEGATION </code></pre> -">

At this point, we’ve created a principal and granted it the ability to manage a catalog. We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using Apache Iceberg.

+<h3 id="connecting-with-trino">Connecting with Trino</h3> +<p>To use a Polaris-managed catalog in <a href="https://trino.io/">Trino</a>, you can +configure Trino to use the Iceberg REST API.</p> +<p>You&#39;ll need to have Trino installed, so download the <a href="https://trino.io/download">latest version of Trino</a>, +and you can follow <a href="https://trino.io/docs/current/installation.html">the Trino docs</a> +to install it. You&#39;ll also need to create a catalog per the instructions above +and generate and export a <code>PRINCIPAL_TOKEN</code> per the +<a href="/README.md#creating-a-catalog-manually">README</a>.</p> +<p>Once Trino is installed and you have your <code>PRINCIPAL_TOKEN</code>, create a catalog +properties file, <code>polaris.properties</code>, in the <code>etc/catalog/</code> directory of your +Trino installation. This is the file where you can configure Trino&#39;s Iceberg +connector. Edit it to:</p> +<pre><code>connector<span class="token punctuation">.</span>name<span class="token operator">=</span>iceberg +iceberg<span class="token punctuation">.</span>catalog<span class="token punctuation">.</span>type<span class="token operator">=</span>rest +iceberg<span class="token punctuation">.</span>rest<span class="token operator">-</span>catalog<span class="token punctuation">.</span>security<span class="token operator">=</span>OAUTH2 +iceberg<span class="token punctuation">.</span>rest<span class="token operator">-</span>catalog<span class="token punctuation">.</span>oauth2<span class="token punctuation">.</span>token<span class="token operator">=</span><span class="token punctuation">{</span>the value of your PRINCIPAL_TOKEN<span class="token punctuation">}</span> +iceberg<span class="token punctuation">.</span>rest<span class="token operator">-</span>catalog<span class="token punctuation">.</span>warehouse<span class="token operator">=</span><span class="token punctuation">{</span>your catalog name<span class="token punctuation">}</span> 
+iceberg<span class="token punctuation">.</span>rest<span class="token operator">-</span>catalog<span class="token punctuation">.</span>uri<span class="token operator">=</span>http<span class="token punctuation">:</span><span class="token operator">/</span><span class="token operator">/</span>localhost<span class="token punctuation">:</span><span class="token number">8181</span><span class="token operator">/</span>api<span class="token operator">/</span>catalog +</code></pre> +<p>Start (or restart) Trino, and <code>SHOW CATALOGS</code> should show the Polaris catalog. +You can then run <code>USE catalogname.schemaname</code> to access, query, or write to +Polaris.</p> +">

At this point, we’ve created a principal and granted it the ability to manage a +catalog. We can now use an external engine to assume that principal, access our +catalog, and store data in that catalog using Apache Iceberg.

Connecting with Spark

-

To use a Polaris-managed catalog in Apache Spark, we can configure Spark to use the Iceberg catalog REST API.

-

This guide uses Apache Spark 3.5, but be sure to find the appropriate iceberg-spark package for your Spark version. With a local Spark clone, we on the branch-3.5 branch we can run the following:

-

Note: the credentials provided here are those for our principal, not the root credentials.

+

To use a Polaris-managed catalog in Apache Spark, +we can configure Spark to use the Iceberg catalog REST API.

+

This guide uses Apache Spark 3.5, +but be sure to find the appropriate iceberg-spark package for your Spark version. +From a local Spark clone on the branch-3.5 branch we can run the following:

+

Note: the credentials provided here are those for our principal, not the root +credentials.

bin/spark-shell \
 --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,org.apache.hadoop:hadoop-aws:3.4.0 \
 --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
@@ -841,10 +974,15 @@ 

Connecting with Spark

--conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \ --conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true
-

Replace XXXX and YYYY with the client ID and client secret generated when you created the quickstart_user principal.

-

Similar to the CLI commands above, this configures Spark to use the Polaris running at localhost:8181 as a catalog. If your Polaris server is running elsewhere, but sure to update the configuration appropriately.

-

Finally, note that we include the hadoop-aws package here. If your table is using a different filesystem, be sure to include the appropriate dependency.

-

Once the Spark session starts, we can create a namespace and table within the catalog:

+

Replace XXXX and YYYY with the client ID and client secret generated when +you created the quickstart_user principal.

+

Similar to the CLI commands above, this configures Spark to use the Polaris +running at localhost:8181 as a catalog. If your Polaris server is running +elsewhere, be sure to update the configuration appropriately.

+

Finally, note that we include the hadoop-aws package here. If your table is +using a different filesystem, be sure to include the appropriate dependency.

+

Once the Spark session starts, we can create a namespace and table within the +catalog:

spark.sql("USE quickstart_catalog")
 spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_namespace")
 spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema")
@@ -871,10 +1009,10 @@ 

Connecting with Spark

--client-id ${CLIENT_ID} \ --client-secret ${CLIENT_SECRET} \ privileges \ - --catalog quickstart_catalog \ - --catalog-role quickstart_catalog_role \ catalog \ revoke \ + --catalog quickstart_catalog \ + --catalog-role quickstart_catalog_role \ CATALOG_MANAGE_CONTENT

Spark will lose access to the table:

@@ -882,6 +1020,28 @@

Connecting with Spark

org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated ids '[6, 7]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
+

Connecting with Trino

+

To use a Polaris-managed catalog in Trino, you can +configure Trino to use the Iceberg REST API.

+

You'll need to have Trino installed: download the latest version of Trino +and follow the Trino docs +to install it. You'll also need to create a catalog per the instructions above, +then generate and export a PRINCIPAL_TOKEN per the +README.

+

Once Trino is installed and you have your PRINCIPAL_TOKEN, create a catalog +properties file, polaris.properties, in the etc/catalog/ directory of your +Trino installation. This is the file where you can configure Trino's Iceberg +connector. Edit it to:

+
connector.name=iceberg
+iceberg.catalog.type=rest
+iceberg.rest-catalog.security=OAUTH2
+iceberg.rest-catalog.oauth2.token={the value of your PRINCIPAL_TOKEN}
+iceberg.rest-catalog.warehouse={your catalog name}
+iceberg.rest-catalog.uri=http://localhost:8181/api/catalog
+
+

Start (or restart) Trino, and SHOW CATALOGS should show the Polaris catalog. +You can then run USE catalogname.schemaname to access, query, or write to +Polaris.
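
As a minimal sketch, the properties file described above could be generated from the exported PRINCIPAL_TOKEN rather than edited by hand. The function name `render_polaris_properties` and the use of `quickstart_catalog` as the warehouse are illustrative assumptions; the property keys and the default URI are the ones listed in this section.

```python
import os


def render_polaris_properties(token: str, catalog_name: str,
                              uri: str = "http://localhost:8181/api/catalog") -> str:
    """Return the text of etc/catalog/polaris.properties (hypothetical helper)."""
    lines = [
        "connector.name=iceberg",
        "iceberg.catalog.type=rest",
        "iceberg.rest-catalog.security=OAUTH2",
        f"iceberg.rest-catalog.oauth2.token={token}",
        f"iceberg.rest-catalog.warehouse={catalog_name}",
        f"iceberg.rest-catalog.uri={uri}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # PRINCIPAL_TOKEN is assumed to have been exported beforehand.
    token = os.environ.get("PRINCIPAL_TOKEN", "")
    print(render_polaris_properties(token, "quickstart_catalog"), end="")
```

Redirecting the script's output into `etc/catalog/polaris.properties` in the Trino installation would produce the same file as editing it manually. Keep in mind that the token is written to disk in plain text.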

Polaris Catalog Overview

Access control

For more information, see Access control.

Polaris Catalog Entities

Access control WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. - --> <p>This page documents various entities that can be managed in Polaris.</p> ">

This page documents various entities that can be managed in Polaris.

@@ -1343,10 +1499,10 @@

Storage Type

All catalogs in Polaris are associated with a storage type. Valid Storage Types are S3, Azure, and GCS. The FILE type is additionally available for testing. Each of these types relates to a different storage provider where data within the catalog may reside. Depending on the storage type, various other configurations may be set for a catalog, including credentials to be used when accessing data inside the catalog.

For details on how to use Storage Types in the REST API, see the API docs.

Namespace

A namespace is a logical entity that resides within a catalog and can contain other entities such as tables or views. Some other systems may refer to namespaces as schemas or databases.

-

In Polaris, namespaces can be nested up to 16 levels. For example, a.b.c.d.e.f.g is a valid namespace. b is said to reside within a, and so on.

+

In Polaris, namespaces can be nested. For example, a.b.c.d.e.f.g is a valid namespace. b is said to reside within a, and so on.
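
To make the nesting concrete, here is a small sketch that splits a dotted namespace name into its levels and finds its parent. The helper names are hypothetical, and the dotted form is treated purely as a display convention; the Iceberg REST API itself represents a namespace as a list of level names.

```python
from typing import List, Optional


def namespace_levels(namespace: str) -> List[str]:
    """Split a dotted namespace such as 'a.b.c' into its individual levels."""
    return namespace.split(".")


def parent_namespace(namespace: str) -> Optional[str]:
    """Return the enclosing namespace, or None for a top-level namespace."""
    levels = namespace_levels(namespace)
    if len(levels) == 1:
        return None
    return ".".join(levels[:-1])


if __name__ == "__main__":
    print(namespace_levels("a.b.c.d.e.f.g"))
    print(parent_namespace("a.b"))
```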

For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see the API docs.

Table

Catalog privileges

For example, a catalog client may be configured with client credentials from the OAuth2 Authorization flow. This client would exchange its client ID and secret for an access token using the client credentials request with this endpoint (1). Subsequent requests would then use that access token.

Some clients may also handle sessions that have additional user context. These clients would use the token exchange flow to exchange a user token (the "subject" token) from the session for a more specific access token for that user, using the catalog's access token as the "actor" token (2). The user ID token is the "subject" token and can be any token type allowed by the OAuth2 token exchange flow, including an unsecured JWT token with a sub claim. This request should use the catalog's bearer token in the "Authorization" header.

Clients may also use the token exchange flow to refresh a token that is about to expire by sending a token exchange request (3). The request's "subject" token should be the expiring token. This request should use the subject token in the "Authorization" header.

-
Authorizations:
Apache_Iceberg_REST_Catalog_API_BearerAuth
Request Body schema: application/x-www-form-urlencoded
required
Any of
grant_type
required
string
Value: "client_credentials"
scope
string
client_id
required
string
Authorizations:
Apache_Iceberg_REST_Catalog_API_BearerAuth
header Parameters
Authorization
string
Request Body schema: application/x-www-form-urlencoded
required
Any of
grant_type
required
string
Value: "client_credentials"
scope
string
client_id
required
string

Client ID

This can be sent in the request body, but OAuth2 recommends sending it in a Basic Authorization header.
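
The client credentials request described above can be sketched as follows. This only builds the headers and the form-encoded body; it does not send anything, and the function name is a hypothetical helper. The `PRINCIPAL_ROLE:ALL` scope is taken from the Spark configuration in the quickstart, and the sketch assumes the client ID and secret contain no characters that need extra encoding before being placed in the Basic header.

```python
import base64
from urllib.parse import urlencode


def build_client_credentials_request(client_id: str, client_secret: str,
                                     scope: str = "PRINCIPAL_ROLE:ALL"):
    """Build headers and body for an OAuth2 client credentials token request."""
    # Credentials go in a Basic Authorization header instead of the body.
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(raw).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # The body carries the grant type and the requested scope.
    body = urlencode({"grant_type": "client_credentials", "scope": scope})
    return headers, body


if __name__ == "__main__":
    headers, body = build_client_credentials_request("XXXX", "YYYY")
    print(headers["Content-Type"])
    print(body)
```

The resulting headers and body could then be passed to any HTTP client that posts to the token endpoint of your Polaris server.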

@@ -3470,7 +3626,7 @@

Catalog privileges

" class="sc-euGpHm sc-exayXG fwfkcU jYGAQp">

Generic base server URL, with all parts configurable

{scheme}://{host}:{port}/{basePath}/v1/{prefix}/views/rename

Request samples

Content type
application/json
{
  • "source": {
    },
  • "destination": {
    }
}

Response samples

Content type
application/json
{
  • "error": {
    }
}