
Commit 2db2f10

[Site] Simplify the doc directory structure (#2033)
1 parent c8b5036 commit 2db2f10

30 files changed, 168 additions, 228 deletions.

README.md

Lines changed: 1 addition & 2 deletions
@@ -104,8 +104,7 @@ using different configurations. Check the `./getting-started` directory for more
 #### Configuring Polaris
 
 Polaris Servers can be configured using a variety of ways.
-Please see the [Configuration Guide](site/content/in-dev/unreleased/configuration.md)
-for more information.
+Please see the [Configuration Guide](site/content/in-dev/configuration.md) for more information.
 
 Default configuration values can be found in `runtime/defaults/src/main/resources/application.properties`.
 

docs

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-site/content/in-dev/unreleased/
+site/content/in-dev/

site/content/in-dev/_index.md

Lines changed: 164 additions & 4 deletions
@@ -17,8 +17,168 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-# Hide `/in-dev/`
-toc_hide: true
-hide_summary: true
-exclude_search: true
+linkTitle: 'In Development'
+title: 'Overview'
+type: docs
+weight: 200
+params:
+  top_hidden: true
+  show_page_toc: false
+cascade:
+  type: docs
+  params:
+    show_page_toc: true
+# This file will NOT be copied into a new release's versioned docs folder.
 ---
+
+> [!WARNING]
+> These pages refer to the current state of the main branch, which is still under active development.
+>
+> Functionalities can be changed, removed or added without prior notice.
+
+Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol.
+
+With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines.
+
+![Conceptual diagram of Apache Polaris (Incubating).](/img/overview.svg "Apache Polaris (Incubating) overview")
+
+## Key concepts
+
+This section introduces key concepts associated with using Apache Polaris (Incubating).
+
+In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables
+or namespaces have been created yet for Catalog2 or Catalog3.
+
+![Diagram that shows an example Apache Polaris (Incubating) structure.](/img/sample-catalog-structure.svg "Sample Apache Polaris (Incubating) structure")
+
+### Catalog
+
+In Polaris, you can create one or more catalog resources to organize Iceberg tables.
+
+Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a
+query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks:
+
+- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's
+  current metadata file.
+
+- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of
+  the table.
+
+To learn more about Iceberg catalogs, see the [Apache Iceberg™ documentation](https://iceberg.apache.org/concepts/catalog/).
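The two catalog responsibilities listed above (keep a pointer from each table name to its current metadata file, and replace that pointer atomically) can be illustrated with a compare-and-swap. This is a minimal in-memory sketch under stated assumptions: `MetadataPointerStore` and `commit` are hypothetical names, a process-local lock stands in for whatever atomic primitive a real catalog relies on, and the S3 paths are made up. It is not Polaris's implementation.

```python
import threading
from typing import Dict, Optional


class MetadataPointerStore:
    """Hypothetical in-memory catalog: table name -> location of its current metadata file."""

    def __init__(self) -> None:
        self._pointers: Dict[str, str] = {}
        self._lock = threading.Lock()  # stands in for the catalog's atomic primitive

    def current(self, table: str) -> Optional[str]:
        """Return the current metadata pointer, or None if the table does not exist."""
        with self._lock:
            return self._pointers.get(table)

    def commit(self, table: str, expected: Optional[str], new_location: str) -> bool:
        """Swap the pointer to `new_location` only if it still equals `expected`.

        Returns False if another writer committed in between.
        """
        with self._lock:
            if self._pointers.get(table) != expected:
                return False
            self._pointers[table] = new_location
            return True


store = MetadataPointerStore()
v1 = "s3://bucket/namespace1/table1/metadata/v1.metadata.json"
v2 = "s3://bucket/namespace1/table1/metadata/v2.metadata.json"

print(store.commit("namespace1.table1", None, v1))  # True: table created
print(store.commit("namespace1.table1", None, v2))  # False: writer's base (None) is stale
print(store.commit("namespace1.table1", v1, v2))    # True: base matches, pointer swapped
print(store.current("namespace1.table1") == v2)     # True
```

A `False` return means another writer committed first; in the spirit of Iceberg's optimistic concurrency, the caller would reload the current metadata and retry.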
+
+#### Catalog types
+
+A catalog can be one of the following two types:
+
+- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris.
+
+- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from
+  this catalog are synced to Polaris. These tables are read-only in Polaris.
+
+A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS.
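The internal/external distinction amounts to an access rule. A minimal sketch, assuming a hypothetical `CatalogType` enum and `check_operation` helper (neither is a Polaris API): reads are allowed for both catalog types, and writes are rejected for tables synced from an external catalog.

```python
from enum import Enum


class CatalogType(Enum):
    INTERNAL = "INTERNAL"  # managed by Polaris: tables are readable and writable
    EXTERNAL = "EXTERNAL"  # managed by another provider and synced: read-only


class ReadOnlyCatalogError(Exception):
    """Raised when a write targets a table synced from an external catalog."""


def check_operation(catalog_type: CatalogType, is_write: bool) -> None:
    """Allow reads on both catalog types; allow writes only on internal catalogs."""
    if is_write and catalog_type is CatalogType.EXTERNAL:
        raise ReadOnlyCatalogError("tables in an external catalog are read-only in Polaris")


check_operation(CatalogType.INTERNAL, is_write=True)   # allowed
check_operation(CatalogType.EXTERNAL, is_write=False)  # allowed
try:
    check_operation(CatalogType.EXTERNAL, is_write=True)
except ReadOnlyCatalogError as err:
    print(err)  # tables in an external catalog are read-only in Polaris
```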
+
+### Namespace
+
+You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create
+nested namespaces. Iceberg tables belong to namespaces.
+
+> [!Important]
+> For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met:
+>
+> - The directory only contains the data files that belong to a single table.
+> - The directory hierarchy matches the namespace hierarchy for the catalog.
+>
+> For example, if a catalog includes the following items:
+>
+> - Top-level namespace namespace1
+> - Nested namespace namespace1a
+> - A customers table, which is grouped under nested namespace namespace1a
+> - An orders table, which is grouped under nested namespace namespace1a
+>
+> The directory hierarchy for the catalog must follow this structure:
+>
+> - /namespace1/namespace1a/customers/<files for the customers table *only*>
+> - /namespace1/namespace1a/orders/<files for the orders table *only*>
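The directory rule in the note above can be checked mechanically. This is a sketch with hypothetical helpers (`expected_table_dir` and `file_belongs_to_table` are illustrative names, not Polaris functions); it assumes paths are absolute, already normalized (no `..` segments), and expressed relative to the catalog's storage root, as in the example.

```python
from pathlib import PurePosixPath
from typing import Sequence


def expected_table_dir(namespace: Sequence[str], table: str) -> PurePosixPath:
    """One directory level per namespace level, then one directory for the table."""
    return PurePosixPath("/", *namespace, table)


def file_belongs_to_table(file_path: str, namespace: Sequence[str], table: str) -> bool:
    """True if `file_path` sits somewhere under the table's own directory.

    Assumes normalized absolute paths: `..` segments are not resolved.
    """
    return expected_table_dir(namespace, table) in PurePosixPath(file_path).parents


ns = ["namespace1", "namespace1a"]
print(expected_table_dir(ns, "customers"))  # /namespace1/namespace1a/customers
print(file_belongs_to_table("/namespace1/namespace1a/customers/data/00000.parquet", ns, "customers"))  # True
print(file_belongs_to_table("/namespace1/namespace1a/orders/data/00000.parquet", ns, "customers"))     # False
```

Comparing against `parents` rather than a plain string prefix keeps a sibling directory such as `/namespace1/namespace1a/customers_old/` from being mistaken for part of the `customers` table.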
+
+### Storage configuration
+
+A storage configuration stores a generated identity and access management (IAM) entity for your cloud storage and is created
+when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the
+catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris
+Catalog.
+
+When you create a catalog, you supply the following information about your cloud storage:
+
+| Cloud storage provider | Information |
+| -----------------------| ----------- |
+| Amazon S3 | <ul><li>Default base location for your Amazon S3 bucket</li><li>Locations for your Amazon S3 bucket</li><li>S3 role ARN</li><li>External ID (optional)</li></ul> |
+| Google Cloud Storage (GCS) | <ul><li>Default base location for your GCS bucket</li><li>Locations for your GCS bucket</li></ul> |
+| Azure | <ul><li>Default base location for your Microsoft Azure container</li><li>Locations for your Microsoft Azure container</li><li>Azure tenant ID</li></ul> |
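The table lists what each provider needs. A sketch of how those inputs might be modeled and checked, with loud assumptions: field names such as `role_arn` and `tenant_id` are hypothetical Python names, not the Polaris management API, and the prefix check in `allows` is an assumption about how the location list could constrain table locations; the section itself does not describe enforcement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StorageConfig:
    """Hypothetical model of the per-provider inputs listed in the table."""
    provider: str                      # "S3", "GCS" or "AZURE"
    default_base_location: str
    allowed_locations: List[str] = field(default_factory=list)
    role_arn: Optional[str] = None     # S3 only
    external_id: Optional[str] = None  # S3 only, optional
    tenant_id: Optional[str] = None    # Azure only

    def validate(self) -> None:
        """Check the provider-specific required fields from the table."""
        if self.provider == "S3" and not self.role_arn:
            raise ValueError("S3 storage requires a role ARN")
        if self.provider == "AZURE" and not self.tenant_id:
            raise ValueError("Azure storage requires a tenant ID")

    def allows(self, location: str) -> bool:
        """True if `location` is the base location or one of the listed locations, or lies under one."""
        prefixes = [self.default_base_location, *self.allowed_locations]
        return any(
            location == p.rstrip("/") or location.startswith(p.rstrip("/") + "/")
            for p in prefixes
        )


cfg = StorageConfig(
    "S3",
    "s3://my-bucket/polaris/",
    ["s3://my-bucket/shared/"],
    role_arn="arn:aws:iam::123456789012:role/polaris-example",  # made-up ARN
)
cfg.validate()
print(cfg.allows("s3://my-bucket/polaris/namespace1/table1"))  # True
print(cfg.allows("s3://other-bucket/namespace1/table1"))       # False
```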
+
+## Example workflow
+
+In the following example workflow, Bob creates an Apache Iceberg&trade; table named Table1 and Alice reads data from Table1.
+
+1. Bob uses Apache Spark&trade; to create the Table1 table under the
+   Namespace1 namespace in the Catalog1 catalog and insert values into
+   Table1.
+
+   Bob can create Table1 and insert data into it because he is using a
+   service connection with a service principal that has
+   the privileges to perform these actions.
+
+2. Alice uses Snowflake to read data from Table1.
+
+   Alice can read data from Table1 because she is using a service
+   connection with a service principal with a catalog integration that
+   has the privileges to perform this action. Alice
+   creates an unmanaged table in Snowflake to read data from Table1.
+
+![Diagram that shows an example workflow for Apache Polaris (Incubating)](/img/example-workflow.svg "Example workflow for Apache Polaris (Incubating)")
+
+## Security and access control
+
+### Credential vending
+
+To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query
+execution. These credentials allow the query engine to run the query without requiring access to your cloud storage for
+Iceberg tables. This process is called credential vending.
+
+As of now, the following limitation is known regarding Apache Iceberg support:
+
+- **remove_orphan_files:** Apache Spark can't use credential vending
+  for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details.
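Credential vending can be sketched as issuing a credential that is both scoped to one table's location and short-lived. In a real deployment the credentials come from the cloud provider (for example, downscoped AWS STS credentials for S3); here `vend` and `VendedCredential` are hypothetical, the token is just a random string, and the 900-second default lifetime is an arbitrary illustrative value.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VendedCredential:
    """Hypothetical short-lived credential scoped to one table location."""
    token: str
    location_prefix: str
    expires_at: float  # Unix timestamp, seconds

    def permits(self, path: str, now: Optional[float] = None) -> bool:
        """True if the credential has not expired and `path` is under its location."""
        now = time.time() if now is None else now
        prefix = self.location_prefix.rstrip("/") + "/"
        return now < self.expires_at and path.startswith(prefix)


def vend(table_location: str, ttl_seconds: int = 900, now: Optional[float] = None) -> VendedCredential:
    """Issue a credential for `table_location` that expires after `ttl_seconds`."""
    now = time.time() if now is None else now
    return VendedCredential(secrets.token_urlsafe(16), table_location, now + ttl_seconds)


cred = vend("s3://bucket/namespace1/table1", ttl_seconds=900, now=1_000)
print(cred.permits("s3://bucket/namespace1/table1/data/f.parquet", now=1_500))  # True
print(cred.permits("s3://bucket/namespace1/table2/data/f.parquet", now=1_500))  # False: other table
print(cred.permits("s3://bucket/namespace1/table1/data/f.parquet", now=2_000))  # False: expired
```

Passing `now` explicitly keeps the example deterministic; omitted, both functions fall back to the wall clock.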
+
+### Identity and access management (IAM)
+
+Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg
+metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your
+storage location.
+
+### Access control
+
+Polaris enforces the access control that you configure across all tables registered with the service and governs security for all
+queries from query engines in a consistent manner.
+
+Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs,
+namespaces, and tables.
+
+Polaris RBAC uses two different role types to delegate privileges:
+
+- **Principal roles:** Granted to Polaris service principals and
+  analogous to roles in other access control systems that you grant to
+  service principals.
+
+- **Catalog roles:** Configured with certain privileges on Polaris
+  catalog resources and granted to principal roles.
+
+For more information, see [Access control]({{% ref "access-control" %}}).
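The two role types form a chain: principal -> principal role -> catalog role -> privilege on a securable. A hypothetical in-memory model of that chain shows how one access check walks both levels; `RbacModel`, the role names, and the privilege strings are illustrative only (the Access control page is the authority on real privilege names), and the sketch ignores inheritance of privileges from catalogs and namespaces down to tables.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# A privilege on a securable, e.g. ("TABLE_READ_DATA", "catalog1.namespace1.table1").
Grant = Tuple[str, str]


@dataclass
class RbacModel:
    """Hypothetical model of the two-level role graph described above."""
    principal_roles: Dict[str, Set[str]] = field(default_factory=dict)        # principal -> principal roles
    catalog_roles: Dict[str, Set[str]] = field(default_factory=dict)          # principal role -> catalog roles
    catalog_role_grants: Dict[str, Set[Grant]] = field(default_factory=dict)  # catalog role -> grants

    def is_allowed(self, principal: str, privilege: str, securable: str) -> bool:
        """True if any catalog role reachable from the principal holds the grant."""
        for p_role in self.principal_roles.get(principal, set()):
            for c_role in self.catalog_roles.get(p_role, set()):
                if (privilege, securable) in self.catalog_role_grants.get(c_role, set()):
                    return True
        return False


table1 = "catalog1.namespace1.table1"
model = RbacModel(
    principal_roles={"bob": {"data_engineer"}, "alice": {"analyst"}},
    catalog_roles={"data_engineer": {"catalog1_writer"}, "analyst": {"catalog1_reader"}},
    catalog_role_grants={
        "catalog1_writer": {("TABLE_WRITE_DATA", table1), ("TABLE_READ_DATA", table1)},
        "catalog1_reader": {("TABLE_READ_DATA", table1)},
    },
)
print(model.is_allowed("bob", "TABLE_WRITE_DATA", table1))    # True
print(model.is_allowed("alice", "TABLE_READ_DATA", table1))   # True
print(model.is_allowed("alice", "TABLE_WRITE_DATA", table1))  # False
```

Keeping privileges on catalog roles and only roles on principals means that moving Alice to a different team changes one principal-role grant, not every table-level grant.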
+
+## Legal Notices
+
+Apache&reg;, Apache Iceberg&trade;, Apache Spark&trade;, Apache Flink&reg;, and Flink&reg; are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
+
+
+<!--
+Testing the `releaseVersion` shortcode here: version is: {{< releaseVersion >}}
+-->
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
