Commit 48e7e88

Add webpage for Generic Table support (#1889)
* add change * add comment * address feedback * update limitations * update docs * update doc * address feedback
1 parent 5441bb6 commit 48e7e88

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
1+
---
2+
#
3+
# Licensed to the Apache Software Foundation (ASF) under one
4+
# or more contributor license agreements. See the NOTICE file
5+
# distributed with this work for additional information
6+
# regarding copyright ownership. The ASF licenses this file
7+
# to you under the Apache License, Version 2.0 (the
8+
# "License"); you may not use this file except in compliance
9+
# with the License. You may obtain a copy of the License at
10+
#
11+
# http://www.apache.org/licenses/LICENSE-2.0
12+
#
13+
# Unless required by applicable law or agreed to in writing,
14+
# software distributed under the License is distributed on an
15+
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
16+
# KIND, either express or implied. See the License for the
17+
# specific language governing permissions and limitations
18+
# under the License.
19+
#
20+
title: Generic Table (Beta)
21+
type: docs
22+
weight: 435
23+
---
24+
25+
The Generic Table feature in Apache Polaris provides support for non-Iceberg tables in other table formats, such as Delta and CSV. It currently provides the following capabilities:
26+
- Create a generic table under a namespace
27+
- Load a generic table
28+
- Drop a generic table
29+
- List all generic tables under a namespace
30+
31+
**NOTE** Generic table support is currently in beta. Please use it with caution and report any issues you encounter.
32+
33+
## What is a Generic Table?
34+
35+
A generic table in Polaris is an entity that defines the following fields:
36+
37+
- **name** (required): A unique identifier for the table within a namespace
38+
- **format** (required): The format of the generic table, e.g. "delta", "csv"
39+
- **base-location** (optional): Table base location in URI format. For example: `s3://<my-bucket>/path/to/table`
40+
    - The table base location is a location that includes all files for the table.
41+
    - A table with multiple disjoint locations (i.e. containing files that are outside the configured base location) is not compliant with the current generic table support in Polaris.
42+
    - If no location is provided, clients or users are responsible for managing the location.
43+
- **properties** (optional): Properties for the generic table passed on creation.
44+
    - Currently, there is no reserved property key defined.
45+
    - The property definition and interpretation are delegated to client or engine implementations.
46+
- **doc** (optional): Comment or description for the table
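
The field list above can be mirrored on the client side with a small data class. The following is a minimal sketch, not part of Polaris: the `GenericTable` class and its `to_payload` method are hypothetical names, and omitting unset optional fields from the payload is an assumption rather than documented server behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GenericTable:
    """Hypothetical client-side model of the generic table fields."""

    name: str                            # required
    format: str                          # required, e.g. "delta" or "csv"
    base_location: Optional[str] = None  # optional URI of the table root
    doc: Optional[str] = None            # optional comment or description
    properties: Dict[str, str] = field(default_factory=dict)

    def to_payload(self) -> dict:
        """Build a JSON-ready dict, leaving out optional fields that are unset."""
        if not self.name or not self.format:
            raise ValueError("name and format are required")
        payload = {"name": self.name, "format": self.format}
        if self.base_location is not None:
            payload["base-location"] = self.base_location
        if self.doc is not None:
            payload["doc"] = self.doc
        if self.properties:
            payload["properties"] = dict(self.properties)
        return payload


table = GenericTable(name="delta_table", format="delta",
                     base_location="s3://my-bucket/path/to/table")
print(table.to_payload())
```

Keeping the required fields as positional arguments and the optional ones as defaults makes the required/optional split from the list visible in the type itself.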
47+
48+
## Generic Table API vs. Iceberg Table API
49+
50+
Generic Table provides a separate set of APIs that operate on generic table entities, while the Iceberg APIs operate on
51+
Iceberg table entities.
52+
53+
| Operations | **Iceberg Table API** | **Generic Table API** |
54+
|--------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
55+
| Create Table | Create an Iceberg table | Create a generic table |
56+
| Load Table   | Load an Iceberg table. If the table to load is a generic table, you must call the Generic Table loadTable API instead; otherwise a TableNotFoundException is thrown. | Load a generic table. Similarly, trying to load an Iceberg table through the Generic Table API throws a TableNotFoundException. |
57+
| Drop Table   | Drop an Iceberg table. As with load table, if the table to drop is a generic table, a TableNotFoundException is thrown. | Drop a generic table. Dropping an Iceberg table through the Generic Table endpoint throws a TableNotFoundException. |
58+
| List Table | List all Iceberg tables | List all generic tables |
59+
60+
Note that generic tables share the same namespace with Iceberg tables, so a table name has to be unique within a namespace across both kinds of table. Furthermore, since
61+
there is currently no support for Update Generic Table, any update to an existing table requires a drop and re-create.
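
The two rules above (one name space shared with Iceberg tables, and updates done as drop plus re-create) can be illustrated with a toy in-memory model. This is only a sketch: `InMemoryNamespace` and `replace_generic_table` are hypothetical names, the Python exceptions stand in for whatever errors the service returns, and nothing here reflects how Polaris is implemented.

```python
class InMemoryNamespace:
    """Toy stand-in for one namespace; not the Polaris implementation."""

    def __init__(self):
        # name -> (kind, definition), where kind is "iceberg" or "generic"
        self.tables = {}

    def create_generic_table(self, definition):
        name = definition["name"]
        if name in self.tables:
            # Names are shared between Iceberg and generic tables.
            raise ValueError(f"table name already in use: {name}")
        self.tables[name] = ("generic", dict(definition))

    def drop_generic_table(self, name):
        kind, _ = self.tables.get(name, (None, None))
        if kind != "generic":
            # An Iceberg table is not visible through the generic table calls.
            raise KeyError(f"no generic table named {name}")
        del self.tables[name]


def replace_generic_table(ns, definition):
    """Emulate an update by dropping and re-creating the table."""
    ns.drop_generic_table(definition["name"])
    ns.create_generic_table(definition)


ns = InMemoryNamespace()
ns.create_generic_table({"name": "delta_table", "format": "delta", "doc": "v1"})
replace_generic_table(ns, {"name": "delta_table", "format": "delta", "doc": "v2"})
print(ns.tables["delta_table"][1]["doc"])
```

The drop and the create are two separate calls, so a client following this pattern should expect a window in which the table does not exist.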
62+
63+
## Working with Generic Table
64+
65+
There are two ways to work with Polaris Generic Tables today:
66+
1) Communicate with Polaris directly through REST API calls, using tools such as `curl`. Details are described in the sections below.
67+
2) Use the Spark client provided if you are working with Spark. Please refer to [Polaris Spark Client]({{% ref "polaris-spark-client" %}}) for detailed instructions.
68+
69+
### Create a Generic Table
70+
71+
To create a generic table, you need to provide the corresponding fields as described in [What is a Generic Table](#what-is-a-generic-table).
72+
73+
The REST API for creating a generic table is `POST /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables`, and the
74+
request body looks like the following:
75+
76+
```json
77+
{
78+
    "name": "<table_name>",
79+
    "format": "<table_format>",
80+
    "base-location": "<table_base_location>",
81+
    "doc": "<comment or description for table>",
82+
    "properties": {
83+
        "<property-key>": "<property-value>"
84+
    }
85+
}
86+
```
87+
88+
The following example uses curl to create a generic table named `delta_table` with format `delta` in namespace `delta_ns`
89+
of catalog `delta_catalog`:
90+
91+
```shell
92+
curl -X POST http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables \
93+
  -H "Content-Type: application/json" \
94+
  -d '{
95+
    "name": "delta_table",
96+
    "format": "delta",
97+
    "base-location": "s3://<my-bucket>/path/to/table",
98+
    "doc": "delta table example",
99+
    "properties": {
100+
      "key1": "value1"
101+
    }
102+
  }'
103+
```
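
For clients that are not shell scripts, the same request can be assembled programmatically. The sketch below uses only the Python standard library and builds the request without sending it. `BASE_URL` and `build_create_request` are illustrative names, the host and port are copied from the curl example, and the helper assumes a single-level namespace name that needs no URL encoding.

```python
import json
import urllib.request

# Same host, port, and path prefix as the curl example above (assumed local setup).
BASE_URL = "http://localhost:8181/api/catalog"


def build_create_request(catalog, namespace, table):
    """Build (but do not send) the POST request that creates a generic table."""
    url = f"{BASE_URL}/polaris/v1/{catalog}/namespaces/{namespace}/generic-tables"
    body = json.dumps(table).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_create_request(
    "delta_catalog",
    "delta_ns",
    {
        "name": "delta_table",
        "format": "delta",
        "doc": "delta table example",
        "properties": {"key1": "value1"},
    },
)
print(request.get_method(), request.full_url)

# Sending requires a running Polaris server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```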
104+
105+
### Load a Generic Table
106+
The REST endpoint for loading a generic table is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.
107+
108+
Here is an example to load the table `delta_table` using curl:
109+
```shell
110+
curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
111+
```
112+
And the response looks like the following:
113+
```json
114+
{
115+
  "table": {
116+
    "name": "delta_table",
117+
    "format": "delta",
118+
    "base-location": "s3://<my-bucket>/path/to/table",
119+
    "doc": "delta table example",
120+
    "properties": {
121+
      "key1": "value1"
122+
    }
123+
  }
124+
}
125+
```
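
A client typically unwraps the `table` object from this response before using it. The following sketch shows one way to do that; `parse_load_response` is a hypothetical helper, and treating missing or `null` optional fields the same way is an assumption, since the example response does not show how unset fields are returned.

```python
import json

# Response body copied from the example above.
response_text = """
{
  "table": {
    "name": "delta_table",
    "format": "delta",
    "base-location": "s3://<my-bucket>/path/to/table",
    "doc": "delta table example",
    "properties": {"key1": "value1"}
  }
}
"""


def parse_load_response(text):
    """Unwrap the "table" object and normalize the optional fields."""
    table = json.loads(text)["table"]
    return {
        "name": table["name"],                        # required
        "format": table["format"],                    # required
        "base_location": table.get("base-location"),  # None when absent
        "doc": table.get("doc"),                      # None when absent
        "properties": table.get("properties") or {},  # {} when absent or null
    }


parsed = parse_load_response(response_text)
print(parsed["format"], parsed["base_location"])
```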
126+
127+
### List Generic Tables
128+
The REST endpoint for listing the generic tables under a given
129+
namespace is `GET /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/`.
130+
131+
The following curl command lists all generic tables under namespace `delta_ns`:
132+
```shell
133+
curl -X GET http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/
134+
```
135+
Example Response:
136+
```json
137+
{
138+
  "identifiers": [
139+
    {
140+
      "namespace": ["delta_ns"],
141+
      "name": "delta_table"
142+
    }
143+
  ],
144+
  "next-page-token": null
145+
}
146+
```
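
The `next-page-token` field suggests that long listings are paginated. The sketch below collects identifiers across pages until the token comes back empty. How the token is sent back to the server is not covered here, so that detail is hidden behind `fetch_page`, a hypothetical callable supplied by the caller; the two fake pages are made-up test data, not real server output.

```python
from typing import Callable, Dict, List, Optional


def list_all_generic_tables(fetch_page: Callable[[Optional[str]], Dict]) -> List[str]:
    """Follow next-page-token until it is empty and return dotted table names."""
    names: List[str] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        for ident in page.get("identifiers", []):
            names.append(".".join(ident["namespace"] + [ident["name"]]))
        token = page.get("next-page-token")
        if not token:
            return names


# Two fake pages keyed by the token that requests them (illustrative data only).
fake_pages = {
    None: {
        "identifiers": [{"namespace": ["delta_ns"], "name": "delta_table"}],
        "next-page-token": "page-2",
    },
    "page-2": {
        "identifiers": [{"namespace": ["delta_ns"], "name": "other_table"}],
        "next-page-token": None,
    },
}

all_names = list_all_generic_tables(lambda token: fake_pages[token])
print(all_names)
```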
147+
148+
### Drop a Generic Table
149+
The REST endpoint for dropping a generic table is `DELETE /polaris/v1/{prefix}/namespaces/{namespace}/generic-tables/{generic-table}`.
150+
151+
The following curl call drops the table `delta_table`:
152+
```shell
153+
curl -X DELETE http://localhost:8181/api/catalog/polaris/v1/delta_catalog/namespaces/delta_ns/generic-tables/delta_table
154+
```
155+
156+
### API Reference
157+
158+
For the complete and up-to-date API specification, see the [Catalog API Spec](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/polaris/refs/heads/main/spec/generated/bundled-polaris-catalog-service.yaml).
159+
160+
## Limitations
161+
162+
Current limitations of Generic Table support:
163+
1) Limited spec information. Currently, there is no spec for information such as schema or partitioning.
164+
2) No commit coordination or update capability provided at the catalog service level.
165+
166+
Therefore, the catalog itself is unaware of anything about the underlying table except some of the loosely defined metadata.
167+
It is the responsibility of the engine (and plugins used by the engine) to determine exactly what loading or committing data
168+
should look like based on the metadata. For example, with the Delta support, the Delta log serialization, deserialization,
169+
and update all happen on the client side.
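
Because interpretation is left to the engine, a client usually dispatches on the `format` field of the loaded table. The sketch below shows that pattern under stated assumptions: the handler functions and the `HANDLERS` registry are hypothetical, they only return a description string, and a real engine would invoke its own format-specific reader at that point.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[Optional[str], Dict[str, str]], str]


def plan_delta(location: Optional[str], properties: Dict[str, str]) -> str:
    # A real engine would read and replay the Delta log on the client side here.
    return f"read Delta log under {location}"


def plan_csv(location: Optional[str], properties: Dict[str, str]) -> str:
    return f"scan CSV files under {location}"


# Hypothetical registry mapping a format string to engine-side logic.
HANDLERS: Dict[str, Handler] = {"delta": plan_delta, "csv": plan_csv}


def plan_read(table: Dict) -> str:
    """Choose engine-side behavior from the loosely defined catalog metadata."""
    handler = HANDLERS.get(table["format"])
    if handler is None:
        raise ValueError(f"no handler registered for format {table['format']!r}")
    return handler(table.get("base-location"), table.get("properties") or {})


print(plan_read({"name": "delta_table", "format": "delta",
                 "base-location": "s3://my-bucket/path/to/table"}))
```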
