
Commit 06d5231

docs: Update S3 getting started guides (#2652)
1 parent 44568da commit 06d5231

7 files changed

+60
-116
lines changed

site/content/in-dev/unreleased/getting-started/creating-a-catalog/s3/catalog-aws.md

Lines changed: 13 additions & 8 deletions
@@ -23,16 +23,21 @@ type: docs
2323
weight: 100
2424
---
2525

26+
When creating a catalog based on AWS S3 storage only the `role-arn` is a required parameter. However, usually
27+
one also provides the `region` and
28+
[external-id](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_third-party.html).
2629

27-
### example
30+
Note: the name `quickstart_catalog` from the example below is referenced in other Getting Started examples,
31+
but of course, it can be any valid catalog name.
2832

2933
```shell
30-
CLIENT_ID=root \
31-
CLIENT_SECRET=s3cr3t \
32-
DEFAULT_BASE_LOCATION=s3://example-bucket/my_data \
33-
ROLE_ARN=arn:aws:iam::111122223333:role/ExampleCorpRole \
34-
REGION=us-west-2 \
35-
EXTERNAL_ID=12345678901234567890 \
34+
CLIENT_ID=root
35+
CLIENT_SECRET=s3cr3t
36+
DEFAULT_BASE_LOCATION=s3://example-bucket/my_data
37+
ROLE_ARN=arn:aws:iam::111122223333:role/ExampleCorpRole
38+
REGION=us-west-2
39+
EXTERNAL_ID=12345678901234567890
40+
3641
./polaris \
3742
--client-id ${CLIENT_ID} \
3843
--client-secret ${CLIENT_SECRET} \
@@ -43,5 +48,5 @@ EXTERNAL_ID=12345678901234567890 \
4348
--role-arn ${ROLE_ARN} \
4449
--region ${REGION} \
4550
--external-id ${EXTERNAL_ID} \
46-
my_aws_catalog
51+
quickstart_catalog
4752
```
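
Because only `--role-arn` is required while `--region` and `--external-id` are optional, a wrapper script may want to add the optional flags only when they are set. The following is a minimal sketch of that idea, not part of the Polaris CLI: `catalog_create_args` is a hypothetical helper that prints the arguments for `catalogs create`, one per line, using only the flags shown in the example above.

```shell
# Hypothetical helper (an assumption for illustration, not part of the Polaris CLI).
# Prints the `catalogs create` arguments one per line, adding the optional
# --region and --external-id flags only when the matching variable is non-empty.
catalog_create_args() {
  catalog_name="$1"
  if [ -z "${ROLE_ARN:-}" ]; then
    echo "ROLE_ARN is required for AWS S3 catalogs" >&2
    return 1
  fi
  # Start with the arguments that are always present.
  set -- catalogs create --storage-type s3 \
    --default-base-location "${DEFAULT_BASE_LOCATION:-}" \
    --role-arn "${ROLE_ARN}"
  if [ -n "${REGION:-}" ]; then
    set -- "$@" --region "${REGION}"
  fi
  if [ -n "${EXTERNAL_ID:-}" ]; then
    set -- "$@" --external-id "${EXTERNAL_ID}"
  fi
  # The catalog name goes last, as in the example above.
  set -- "$@" "${catalog_name}"
  printf '%s\n' "$@"
}

# Example values taken from the guide; EXTERNAL_ID is deliberately left unset.
DEFAULT_BASE_LOCATION=s3://example-bucket/my_data
ROLE_ARN=arn:aws:iam::111122223333:role/ExampleCorpRole
REGION=us-west-2
catalog_create_args quickstart_catalog
```

The helper only prints arguments; passing them to `./polaris` together with `--client-id` and `--client-secret` is left to the caller.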

site/content/in-dev/unreleased/getting-started/creating-a-catalog/s3/catalog-minio.md

Lines changed: 30 additions & 83 deletions
@@ -23,94 +23,41 @@ type: docs
2323
weight: 200
2424
---
2525

26-
In this guide we walk through setting up a simple Polaris Server with local [MinIO](https://www.min.io/) storage.
26+
When creating a catalog based on MinIO storage it is important to configure the `endpoint` property to point
27+
to your own MinIO cluster. If the `endpoint` property is not set, Polaris will attempt to contact AWS
28+
storage services (which is certain to fail in this case).
2729

28-
Similar configurations are expected to work with other S3-compatible systems that also have the
29-
[STS](https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html) API.
30+
Note: the region setting is not required by MinIO, but it is set in this example for the sake of
31+
simplicity as it is usually required by the AWS SDK (used internally by Polaris). One can also
32+
set the `AWS_REGION` environment variable in the Polaris server process and avoid setting region
33+
as a catalog property.
3034

31-
# Setup
32-
33-
Clone the Polaris source repository, then build a docker image for Polaris.
35+
Note: the name `quickstart_catalog` from the example below is referenced in other Getting Started examples,
36+
but of course, it can be any valid catalog name.
3437

3538
```shell
36-
./gradlew :polaris-server:assemble -Dquarkus.container-image.build=true
37-
```
38-
39-
Start MinIO with Polaris using the `docker compose` example.
40-
41-
```shell
42-
docker compose -f getting-started/minio/docker-compose.yml up
43-
```
44-
45-
The compose script will start MinIO on default ports (API on 9000, UI on 9001)
46-
plus a Polaris Server pre-configured to that MinIO instance.
47-
48-
In this example the `root` principal has its password set to `s3cr3t`.
49-
50-
# Connecting from Spark
51-
52-
Start Spark.
53-
54-
```shell
55-
export AWS_REGION=us-west-2
56-
57-
bin/spark-sql \
58-
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
59-
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
60-
--conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
61-
--conf spark.sql.catalog.polaris.type=rest \
62-
--conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
63-
--conf spark.sql.catalog.polaris.token-refresh-enabled=false \
64-
--conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
65-
--conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
66-
--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
67-
--conf spark.sql.catalog.polaris.credential=root:s3cr3t
39+
CLIENT_ID=root
40+
CLIENT_SECRET=s3cr3t
41+
DEFAULT_BASE_LOCATION=s3://example-bucket/my_data
42+
REGION=us-west-2
43+
44+
./polaris \
45+
--client-id ${CLIENT_ID} \
46+
--client-secret ${CLIENT_SECRET} \
47+
catalogs \
48+
create \
49+
--storage-type s3 \
50+
--endpoint http://127.0.0.1:9100 \
51+
--default-base-location ${DEFAULT_BASE_LOCATION} \
52+
--region ${REGION} \
53+
quickstart_catalog
6854
```
6955

70-
Note: `AWS_REGION` is required by the AWS SDK used by Spark, but the value is irrelevant in this case.
71-
72-
Create a table in Spark.
73-
74-
```sql
75-
use polaris;
76-
create namespace ns;
77-
create table ns.t1 as select 'abc';
78-
select * from ns.t1;
79-
```
80-
81-
# Connecting from MinIO client
82-
83-
```shell
84-
mc alias set pol http://localhost:9000 minio_root m1n1opwd
85-
mc ls pol/bucket123/ns/t1
86-
[2025-08-13 18:52:38 EDT] 0B data/
87-
[2025-08-13 18:52:38 EDT] 0B metadata/
88-
```
89-
90-
Note: the values of `minio_root`, `m1n1opwd` and `bucket123` are defined in the docker compose file.
91-
92-
# Notes on Storage Configuation
93-
94-
In this example the Polaris Catalog is defined as (excluding uninteresting properties):
95-
96-
```json
97-
{
98-
"name": "quickstart_catalog",
99-
"storageConfigInfo": {
100-
"endpoint": "http://localhost:9000",
101-
"endpointInternal": "http://minio:9000",
102-
"pathStyleAccess": true,
103-
"storageType": "S3",
104-
"allowedLocations": [
105-
"s3://bucket123"
106-
]
107-
}
108-
}
109-
```
56+
In more complex deployments it may be necessary to configure different endpoints for S3 requests
57+
and for STS (AssumeRole) requests. This can be achieved via the `--sts-endpoint` CLI option.
11058

111-
Note that the `roleArn` parameter, which is required for AWS storage, does not need to be set for MinIO.
59+
Additionally, the `--endpoint-internal` CLI option can be used to set the S3 endpoint for use by
60+
the Polaris Server itself, if it needs to be different from the endpoint used by clients / engines.
11261

113-
Note the two endpoint values. `endpointInternal` is used by the Polaris Server, while `endpoint` is communicated
114-
to clients (such as Spark) in Iceberg REST API responses. This distinction allows the system to work smoothly
115-
when the clients and the server have different views of the network (in this example the host name `minio` is
116-
resolvable only inside the docker compose environment).
62+
A usable MinIO example for `docker-compose` is available in the Polaris source code under the
63+
[getting-started/minio](https://github.com/apache/polaris/tree/main/getting-started/minio) module.
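
Since a missing `endpoint` makes Polaris fall back to AWS, a setup script could check for it before creating a MinIO catalog. The sketch below is an assumption for illustration only: `minio_catalog_args` is a hypothetical helper, and the endpoint values mirror the docker compose example (`http://localhost:9000` for clients, `http://minio:9000` inside the compose network). It prints the storage-related arguments for `catalogs create` and leaves out credentials and `--default-base-location` for brevity.

```shell
# Hypothetical pre-flight helper (not part of the Polaris CLI).
# $1: S3 endpoint handed to clients; $2 (optional): endpoint used by the
# Polaris Server itself when it differs from the client-facing one.
minio_catalog_args() {
  endpoint="${1:-}"
  endpoint_internal="${2:-}"
  # Refuse to continue without an explicit http(s) endpoint, because an unset
  # endpoint would make Polaris contact AWS storage services instead of MinIO.
  case "${endpoint}" in
    http://*|https://*) ;;
    *)
      echo "endpoint must be an http(s) URL, got '${endpoint}'" >&2
      return 1
      ;;
  esac
  set -- --storage-type s3 --endpoint "${endpoint}"
  if [ -n "${endpoint_internal}" ]; then
    set -- "$@" --endpoint-internal "${endpoint_internal}"
  fi
  echo "$@" quickstart_catalog
}

minio_catalog_args http://localhost:9000 http://minio:9000
```

An `--sts-endpoint` value could be appended in the same way when S3 and STS requests need different endpoints.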

site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md

Lines changed: 3 additions & 1 deletion
@@ -45,7 +45,9 @@ export CLIENT_SECRET=s3cr3t
4545
```
4646

4747
## Next Steps
48-
Congrats, you now have a running instance of1 Polaris! For details on how to use Polaris, check out the [Using Polaris]({{% relref "../../using-polaris" %}}) page.
48+
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris,
49+
check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and
50+
[Using Polaris]({{% relref "../../using-polaris" %}}) pages.
4951

5052
## Cleanup Instructions
5153
To shut down the Polaris server, run the following commands:

site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md

Lines changed: 3 additions & 1 deletion
@@ -40,7 +40,9 @@ export CLIENT_SECRET=s3cr3t
4040
```
4141

4242
## Next Steps
43-
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../../using-polaris" %}}) page.
43+
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris,
44+
check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and
45+
[Using Polaris]({{% relref "../../using-polaris" %}}) pages.
4446

4547
## Cleanup Instructions
4648
To shut down the Polaris server, run the following commands:

site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md

Lines changed: 3 additions & 1 deletion
@@ -40,7 +40,9 @@ export CLIENT_SECRET=s3cr3t
4040
```
4141

4242
## Next Steps
43-
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../../using-polaris" %}}) page.
43+
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris,
44+
check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and
45+
[Using Polaris]({{% relref "../../using-polaris" %}}) pages.
4446

4547
## Cleanup Instructions
4648
To shut down the Polaris server, run the following commands:

site/content/in-dev/unreleased/getting-started/deploying-polaris/local-deploy.md

Lines changed: 3 additions & 1 deletion
@@ -114,4 +114,6 @@ docker run --name trino -d -p 8080:8080 trinodb/trino
114114
```
115115

116116
## Next Steps
117-
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% ref "../using-polaris" %}}) page.
117+
Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris,
118+
check out the [Creating a Catalog]({{% ref "../creating-a-catalog" %}}) and
119+
[Using Polaris]({{% ref "../using-polaris" %}}) pages.

site/content/in-dev/unreleased/getting-started/using-polaris.md

Lines changed: 5 additions & 21 deletions
@@ -31,29 +31,13 @@ export CLIENT_ID=YOUR_CLIENT_ID
3131
export CLIENT_SECRET=YOUR_CLIENT_SECRET
3232
```
3333

34-
## Defining a Catalog
34+
Refer to the [Creating a Catalog]({{% ref "creating-a-catalog" %}}) page for instructions on defining a
35+
catalog for your specific storage type. The following examples assume the catalog's name is `quickstart_catalog`.
3536

36-
In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under. With a Polaris service running, you can create a catalog like so:
37+
In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under.
3738

38-
```shell
39-
cd ~/polaris
40-
41-
./polaris \
42-
--client-id ${CLIENT_ID} \
43-
--client-secret ${CLIENT_SECRET} \
44-
catalogs \
45-
create \
46-
--storage-type s3 \
47-
--default-base-location ${DEFAULT_BASE_LOCATION} \
48-
--role-arn ${ROLE_ARN} \
49-
quickstart_catalog
50-
```
51-
52-
This will create a new catalog called **quickstart_catalog**. If you are using one of the Getting Started locally-built Docker images, we have already created a catalog named `quickstart_catalog` for you.
53-
54-
The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location. These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources.
55-
56-
If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs]({{% relref "../entities#storage-type" %}}).
39+
The `DEFAULT_BASE_LOCATION` value you provided at catalog creation time will be the default location that objects in
40+
this catalog should be stored in.
5741

5842
Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% relref "../command-line-interface" %}}).
5943
