
Conversation


@colebow colebow commented Jul 31, 2024

Description

This adds documentation to the quickstart.md guide on using Polaris with the Trino query engine. It also references Trino as an option in README.md.

Type of change

  • Documentation

How Has This Been Tested?

These instructions have been tested locally to get Trino running with Polaris.


  • I have performed a self-review of my code
  • I have signed and submitted the ICLA and if needed, the CCLA. See Contributing for details.

@colebow colebow requested a review from a team as a code owner August 1, 2024 17:36
Contributor

@flyrain flyrain left a comment


LGTM overall. Left minor comments. Thanks a lot for adding the Trino part.

README.md Outdated
Contributor


Maybe we can leave out these style-only changes for now?

Author

@colebow colebow Aug 5, 2024


They're in a different commit, so I can remove them if necessary. I figured it was generally an improvement worth making while we're here, though.

Contributor


+1 to separate style-only changes to make review easier.

Contributor


When I tried this, I had trouble creating a table. Let's make sure the quickstart contains a working end-to-end example.

```
Start (or restart) Trino, and `SHOW CATALOGS` should show the Polaris catalog.
You can then run `USE catalogname.schemaname` to access, query, or write to
```
Contributor


Yeah, I'd love it if this was end-to-end and involved creating a table and interacting with it. I noticed in my testing that Trino requires some additional permissions that are not granted above.
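An end-to-end check along the lines the reviewers describe could look like the following Trino session. This is only a sketch: the `polaris` catalog name and the schema/table names are hypothetical, and it assumes the catalog role granted above already permits table creation.

```sql
-- Hypothetical smoke test from the Trino CLI; assumes the catalog
-- properties file from this PR is named polaris.properties.
SHOW CATALOGS;                            -- the Polaris catalog should be listed
CREATE SCHEMA polaris.quickstart_schema;  -- a namespace in the Polaris catalog
USE polaris.quickstart_schema;
CREATE TABLE quickstart_table (id BIGINT, data VARCHAR);
INSERT INTO quickstart_table VALUES (1, 'some data');
SELECT * FROM quickstart_table;
```

Running something like this against a live Polaris instance would surface the missing-permission errors mentioned above.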

Contributor

@flyrain flyrain left a comment


Thanks @colebow for working on it. Can you rebase it?

```
## Prerequisites

This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here.
This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/),
```
Member


Changing the formatting makes the PR harder to review (it shifts focus away from the core change).

quickstart.md

<!--
 Copyright (c) 2024 Snowflake Computing Inc.

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# Quick Start

This guide serves as an introduction to several key entities that can be managed
with Polaris, describes how to build and deploy Polaris locally, and finally
includes examples of how to use Polaris with Apache Spark and Trino.

## Prerequisites

This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/),
and interacting with it using the command-line interface and
[Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be
sure to satisfy the relevant prerequisites listed here.

### Building and Deploying Polaris

To get the latest Polaris code, you'll need to clone the repository using
[git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/):

```
brew install git
```

Then, use git to clone the Polaris repo:

```
cd ~
git clone https://github.com/polaris-catalog/polaris.git
```

#### With Docker

If you plan to deploy Polaris inside [Docker](https://www.docker.com/), you'll
need to install Docker itself. For example, this can be done using
[homebrew](https://brew.sh/):

```
brew install --cask docker
```

Once installed, make sure Docker is running.

#### From Source

If you plan to build Polaris from source yourself, you will need to satisfy a
few prerequisites first.

Polaris is built using [gradle](https://gradle.org/) and is compatible with
Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple
Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/)
and configure it with jenv:

```
cd ~/polaris
brew install openjdk@21 gradle@8 jenv
jenv add $(brew --prefix openjdk@21)
jenv local 21
```

### Connecting to Polaris

Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/)
client that supports the REST API. Depending on the client you plan to use,
refer to the prerequisites below.

#### With Spark

If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/),
you'll need to start by cloning Spark. As [above](#building-and-deploying-polaris),
make sure [git](https://git-scm.com/) is installed first. You can install it with
[homebrew](https://brew.sh/):

```
brew install git
```

Then, clone Spark and check out a versioned branch. This guide uses
[Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html).

```
cd ~
git clone https://github.com/apache/spark.git
cd ~/spark
git checkout branch-3.5
```

## Deploying Polaris

Polaris can be deployed via a lightweight docker image or as a standalone
process. Before starting, be sure that you've satisfied the relevant
[prerequisites](#building-and-deploying-polaris) detailed above.

### Docker Image

To start using Polaris in Docker, launch Polaris while Docker is running:

```
cd ~/polaris
docker compose -f docker-compose.yml up --build
```

Once the `polaris-polaris` container is up, you can continue to
[Defining a Catalog](#defining-a-catalog).

### Building Polaris

Run Polaris locally with:

```
cd ~/polaris
./gradlew runApp
```

You should see output for some time as Polaris builds and starts up. Eventually,
you won't see any more logs and should see messages that resemble the following:

```
INFO [...] [main] [] o.e.j.s.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@...
INFO [...] [main] [] o.e.j.server.AbstractConnector: Started application@...
INFO [...] [main] [] o.e.j.server.AbstractConnector: Started admin@...
INFO [...] [main] [] o.eclipse.jetty.server.Server: Started Server@...
```

At this point, Polaris is running.

## Bootstrapping Polaris

For this tutorial, we'll launch an instance of Polaris that stores entities only
in-memory. This means that any entities that you define will be destroyed when
Polaris is shut down. It also means that Polaris will automatically bootstrap
itself with root credentials. For more information on how to configure Polaris
for production usage, see the [docs](./configuring-polaris-for-production.md).

When Polaris is launched using in-memory mode, the root `CLIENT_ID` and
`CLIENT_SECRET` can be found in stdout on initial startup. For example:

```
realm: default-realm root principal credentials: XXXX:YYYY
```

Be sure to note these credentials, as we'll be using them below.

## Defining a Catalog

In Polaris, the [catalog](./entities/catalog.md) is the top-level entity that
objects like [tables](./entities.md#table) and [views](./entities.md#view) are
organized under. With a Polaris service running, you can create a catalog like
so:

```
cd ~/polaris

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  catalogs \
  create \
  --storage-type s3 \
  --default-base-location ${DEFAULT_BASE_LOCATION} \
  --role-arn ${ROLE_ARN} \
  quickstart_catalog
```

This will create a new catalog called **quickstart_catalog**.

The `DEFAULT_BASE_LOCATION` you provide will be the default location that
objects in this catalog should be stored in, and the `ROLE_ARN` you provide
should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html)
with access to read and write data in that location. These credentials will be
provided to engines reading data from the catalog once they have authenticated
with Polaris using credentials that have access to those resources.

If you're using a storage type other than S3, such as Azure, you'll provide a
different type of credential than a Role ARN. For more details on supported
storage types, see the [docs](./entities.md#storage-type).

Additionally, if Polaris is running somewhere other than `localhost:8181`, you
can specify the correct hostname and port by providing `--host` and `--port`
flags. For the full set of options supported by the CLI, please refer to the
[docs](./command-line-interface.md).

### Creating a Principal and Assigning it Privileges

With a catalog created, we can create a [principal](./entities.md#principal)
that has access to manage that catalog. For details on how to configure the
Polaris CLI, see [the section above](#defining-a-catalog) or refer to the
[docs](./command-line-interface.md).

```
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principals \
  create \
  quickstart_user

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principal-roles \
  create \
  quickstart_user_role

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  catalog-roles \
  create \
  --catalog quickstart_catalog \
  quickstart_catalog_role
```

Be sure to provide the necessary credentials, hostname, and port as before.

When the `principals create` command completes successfully, it will return the
credentials for this new principal. Be sure to note these down for later. For
example:

```
./polaris ... principals create example
{"clientId": "XXXX", "clientSecret": "YYYY"}
```

Now, we grant the principal the [principal role](./entities.md#principal-role)
we created, and grant the [catalog role](./entities.md#catalog-role) the
principal role we created. For more information on these entities, please refer
to the linked documentation.

```
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  principal-roles \
  grant \
  --principal quickstart_user \
  quickstart_user_role

./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  catalog-roles \
  grant \
  --catalog quickstart_catalog \
  --principal-role quickstart_user_role \
  quickstart_catalog_role
```

Now, we've linked our principal to the catalog via roles like so:

![Principal to Catalog](./img/quickstart/privilege-illustration-1.png "Principal to Catalog")

In order to give this principal the ability to interact with the catalog, we
must assign some [privileges](./entities.md#privileges). For the time being, we
will give this principal the ability to fully manage content in our new catalog.
We can do this with the CLI like so:

```
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  privileges \
  catalog \
  grant \
  --catalog quickstart_catalog \
  --catalog-role quickstart_catalog_role \
  CATALOG_MANAGE_CONTENT
```

This grants the [catalog privilege](./entities.md#privilege) `CATALOG_MANAGE_CONTENT`
to our catalog role, linking everything together like so:

![Principal to Catalog with Catalog Role](./img/quickstart/privilege-illustration-2.png "Principal to Catalog with Catalog Role")

`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities
within the catalog. The same privilege could be granted to a namespace, in which
case the principal could create/list/read/write any entity under that namespace.

## Using Iceberg & Polaris

At this point, we've created a principal and granted it the ability to manage a
catalog. We can now use an external engine to assume that principal, access our
catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/).

### Connecting with Spark

To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/),
we can configure Spark to use the Iceberg catalog REST API.

This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html),
but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark).
From a local Spark clone on the `branch-3.5` branch, we can run the following:

_Note: the credentials provided here are those for our principal, not the root
credentials._

```
bin/spark-shell \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,org.apache.hadoop:hadoop-aws:3.4.0 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
--conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=true \
--conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
--conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.quickstart_catalog.credential='XXXX:YYYY' \
--conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true
```

Replace `XXXX` and `YYYY` with the client ID and client secret generated when
you created the `quickstart_user` principal.

Similar to the CLI commands above, this configures Spark to use the Polaris
server running at `localhost:8181` as a catalog. If your Polaris server is
running elsewhere, be sure to update the configuration appropriately.

Finally, note that we include the `hadoop-aws` package here. If your table is
using a different filesystem, be sure to include the appropriate dependency.

Once the Spark session starts, we can create a namespace and table within the
catalog:

```
spark.sql("USE quickstart_catalog")
spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_namespace")
spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema")
spark.sql("USE NAMESPACE quickstart_namespace.schema")
spark.sql("""
    CREATE TABLE IF NOT EXISTS quickstart_table (
        id BIGINT, data STRING
    )
    USING ICEBERG
""")
```

We can now use this table like any other:

```
spark.sql("INSERT INTO quickstart_table VALUES (1, 'some data')")
spark.sql("SELECT * FROM quickstart_table").show(false)
. . .
+---+---------+
|id |data     |
+---+---------+
|1  |some data|
+---+---------+
```

If at any time access is revoked...

```
./polaris \
  --client-id ${CLIENT_ID} \
  --client-secret ${CLIENT_SECRET} \
  privileges \
  catalog \
  revoke \
  --catalog quickstart_catalog \
  --catalog-role quickstart_catalog_role \
  CATALOG_MANAGE_CONTENT
```

...Spark will lose access to the table:

```
spark.sql("SELECT * FROM quickstart_table").show(false)

org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated ids '[6, 7]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION
```

### Connecting with Trino

To use a Polaris-managed catalog in [Trino](https://trino.io/), you can
configure Trino to use the Iceberg REST API.

You'll need to have Trino installed, so download the [latest version of Trino](https://trino.io/download)
and follow [the Trino docs](https://trino.io/docs/current/installation.html)
to install it. You'll also need to create a catalog per the instructions above
and generate and export a `PRINCIPAL_TOKEN` per the
[README](/README.md#creating-a-catalog-manually).

Once Trino is installed and you have your `PRINCIPAL_TOKEN`, create a catalog
properties file, `polaris.properties`, in the `etc/catalog/` directory of your
Trino installation. This is the file where you can configure Trino's Iceberg
connector. Edit it to:

```
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.token={the value of your PRINCIPAL_TOKEN}
iceberg.rest-catalog.warehouse={your catalog name}
iceberg.rest-catalog.uri=http://localhost:8181/api/catalog
```

Start (or restart) Trino, and `SHOW CATALOGS` should show the Polaris catalog.
You can then run `USE catalogname.schemaname` to access, query, or write to
Polaris.
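The `PRINCIPAL_TOKEN` step above defers to the README; as a hedged illustration only, fetching and exporting a token might look like the following shell snippet. The endpoint path and form fields are assumptions based on the Polaris OAuth token route; defer to the linked README for the authoritative procedure.

```
# Sketch: request an OAuth token for the principal and export it for Trino.
# XXXX:YYYY are the principal credentials returned by `principals create`.
export PRINCIPAL_TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d 'grant_type=client_credentials' \
  -d 'client_id=XXXX' \
  -d 'client_secret=YYYY' \
  -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)
```

This requires a running Polaris server and `jq` installed locally.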
Member


This file has been removed. Can you please rebase?

Contributor

cgpoh commented Sep 1, 2024

@colebow, for my Trino configuration to work with ADLS, I need the following properties:

```
test_bronze: |-
        connector.name=iceberg
        iceberg.catalog.type=rest
        iceberg.rest-catalog.security=OAUTH2
        iceberg.rest-catalog.oauth2.credential=key:secret
        iceberg.rest-catalog.oauth2.scope=PRINCIPAL_ROLE:ALL
        iceberg.rest-catalog.warehouse=test_bronze
        iceberg.rest-catalog.vended-credentials-enabled=false
        iceberg.rest-catalog.uri=http://polaris-catalog.test.svc:8181/api/catalog
        fs.native-azure.enabled=true
        azure.auth-type=OAUTH
        azure.oauth.tenant-id=tenant-id
        azure.oauth.endpoint=https://login.microsoftonline.com/tenant-id/oauth2/token
        azure.oauth.client-id=client-id
        azure.oauth.secret=client-secret
```

Do we need a separate PR to address this ADLS Trino integration?

@mayankvadariya

Do we still need this PR, given it's already documented at https://github.com/apache/polaris/blob/main/getting-started/trino/README.md?
