17 | 17 | # specific language governing permissions and limitations |
18 | 18 | # under the License. |
19 | 19 | # |
20 | | -linkTitle: 'In Development' |
21 | | -title: 'Overview' |
22 | | -type: docs |
23 | | -weight: 200 |
24 | | -params: |
25 | | - top_hidden: true |
26 | | - show_page_toc: false |
27 | | -cascade: |
28 | | - type: docs |
29 | | - params: |
30 | | - show_page_toc: true |
31 | | -# This file will NOT be copied into a new release's versioned docs folder. |
| 20 | +# Hide `/in-dev/` |
| 21 | +toc_hide: true |
| 22 | +hide_summary: true |
| 23 | +exclude_search: true |
32 | 24 | --- |
33 | | - |
34 | | -> [!WARNING] |
35 | | -> These pages refer to the current state of the main branch, which is still under active development. |
36 | | -> |
37 | | -> Functionalities can be changed, removed or added without prior notice. |
38 | | - |
39 | | -Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol. |
40 | | - |
41 | | -With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines. |
42 | | - |
43 | | - overview") |
44 | | - |
45 | | -## Key concepts |
46 | | - |
47 | | -This section introduces key concepts associated with using Apache Polaris (Incubating). |
48 | | - |
49 | | -In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables |
50 | | -or namespaces have been created yet for Catalog2 or Catalog3. |
51 | | - |
52 | | - structure") |
53 | | - |
54 | | -### Catalog |
55 | | - |
56 | | -In Polaris, you can create one or more catalog resources to organize Iceberg tables. |
57 | | - |
58 | | -Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a |
59 | | -query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks: |
60 | | - |
61 | | -- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's |
62 | | - current metadata file. |
63 | | - |
64 | | -- Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of |
65 | | - the table. |
66 | | - |
67 | | -To learn more about Iceberg catalogs, see the [Apache Iceberg™ documentation](https://iceberg.apache.org/concepts/catalog/). |
68 | | - |
69 | | -#### Catalog types |
70 | | - |
71 | | -A catalog can be one of the following two types: |
72 | | - |
73 | | -- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris. |
74 | | - |
75 | | -- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from |
76 | | - this catalog are synced to Polaris. These tables are read-only in Polaris. |
77 | | - |
78 | | -A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS. |
79 | | - |
80 | | -### Namespace |
81 | | - |
82 | | -You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create |
83 | | -nested namespaces. Iceberg tables belong to namespaces. |
84 | | - |
85 | | -> [!Important] |
86 | | -> For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met: |
87 | | -> |
88 | | -> - The directory only contains the data files that belong to a single table. |
89 | | -> - The directory hierarchy matches the namespace hierarchy for the catalog. |
90 | | -> |
91 | | -> For example, if a catalog includes the following items: |
92 | | -> |
93 | | -> - Top-level namespace namespace1 |
94 | | -> - Nested namespace namespace1a |
95 | | -> - A customers table, which is grouped under nested namespace namespace1a |
96 | | -> - An orders table, which is grouped under nested namespace namespace1a |
97 | | -> |
98 | | -> The directory hierarchy for the catalog must follow this structure: |
99 | | -> |
100 | | -> - /namespace1/namespace1a/customers/<files for the customers table *only*> |
101 | | -> - /namespace1/namespace1a/orders/<files for the orders table *only*> |
102 | | - |
103 | | -### Storage configuration |
104 | | - |
105 | | -A storage configuration stores a generated identity and access management (IAM) entity for your cloud storage and is created |
106 | | -when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage. During the |
107 | | -catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris |
108 | | -Catalog. |
109 | | - |
110 | | -When you create a catalog, you supply the following information about your cloud storage: |
111 | | - |
112 | | -| Cloud storage provider | Information | |
113 | | -| -----------------------| ----------- | |
114 | | -| Amazon S3 | <ul><li>Default base location for your Amazon S3 bucket</li><li>Locations for your Amazon S3 bucket</li><li>S3 role ARN</li><li>External ID (optional)</li></ul> | |
115 | | -| Google Cloud Storage (GCS) | <ul><li>Default base location for your GCS bucket</li><li>Locations for your GCS bucket</li></ul> | |
116 | | -| Azure | <ul><li>Default base location for your Microsoft Azure container</li><li>Locations for your Microsoft Azure container</li><li>Azure tenant ID</li></ul> | |
117 | | - |
118 | | -## Example workflow |
119 | | - |
120 | | -In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1. |
121 | | - |
122 | | -1. Bob uses Apache Spark™ to create the Table1 table under the |
123 | | - Namespace1 namespace in the Catalog1 catalog and insert values into |
124 | | - Table1. |
125 | | - |
126 | | - Bob can create Table1 and insert data into it because he is using a |
127 | | - service connection with a service principal that has |
128 | | - the privileges to perform these actions. |
129 | | - |
130 | | -2. Alice uses Snowflake to read data from Table1. |
131 | | - |
132 | | - Alice can read data from Table1 because she is using a service |
133 | | - connection with a service principal with a catalog integration that |
134 | | - has the privileges to perform this action. Alice |
135 | | - creates an unmanaged table in Snowflake to read data from Table1. |
136 | | - |
137 | | -") |
138 | | - |
139 | | -## Security and access control |
140 | | - |
141 | | -### Credential vending |
142 | | - |
143 | | -To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query |
144 | | -execution. These credentials allow the query engine to run the query without requiring access to your cloud storage for |
145 | | -Iceberg tables. This process is called credential vending. |
146 | | - |
147 | | -As of now, the following limitation is known regarding Apache Iceberg support: |
148 | | - |
149 | | -- **remove_orphan_files:** Apache Spark can't use credential vending |
150 | | - for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details. |
151 | | - |
152 | | -### Identity and access management (IAM) |
153 | | - |
154 | | -Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg |
155 | | -metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your |
156 | | -storage location. |
157 | | - |
158 | | -### Access control |
159 | | - |
160 | | -Polaris enforces the access control that you configure across all tables registered with the service and governs security for all |
161 | | -queries from query engines in a consistent manner. |
162 | | - |
163 | | -Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs, |
164 | | -namespaces, and tables. |
165 | | - |
166 | | -Polaris RBAC uses two different role types to delegate privileges: |
167 | | - |
168 | | -- **Principal roles:** Granted to Polaris service principals and |
169 | | - analogous to roles in other access control systems that you grant to |
170 | | - service principals. |
171 | | - |
172 | | -- **Catalog roles:** Configured with certain privileges on Polaris |
173 | | - catalog resources and granted to principal roles. |
174 | | - |
175 | | -For more information, see [Access control]({{% ref "access-control" %}}). |
176 | | - |
177 | | -## Legal Notices |
178 | | - |
179 | | -Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. |
180 | | - |
181 | | - |
182 | | -<!-- |
183 | | -Testing the `releaseVersion` shortcode here: version is: {{< releaseVersion >}} |
184 | | ---> |