Add policies for metadata compaction, orphan file removal and snapshot retention #969
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,37 @@ | ||
| { | ||
| "license": "Licensed under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)", | ||
| "$id": "https://polaris.apache.org/schemas/policies/system/metadata-compaction/2025-02-03.json", | ||
| "title": "Metadata Compaction Policy", | ||
| "description": "Inheritable Polaris policy schema for Iceberg table metadata compaction", | ||
| "type": "object", | ||
| "properties": { | ||
| "version": { | ||
| "type": "string", | ||
| "const": "2025-02-03", | ||
| "description": "Schema version" | ||
| }, | ||
| "enable": { | ||
| "type": "boolean", | ||
| "description": "Enable or disable metadata compaction." | ||
| }, | ||
| "config": { | ||
| "type": "object", | ||
| "description": "A map containing custom configuration properties. Please note that interoperability is not guaranteed.", | ||
| "additionalProperties": { | ||
| "type": ["string", "number", "boolean"] | ||
| } | ||
| } | ||
| }, | ||
| "required": ["enable"], | ||
| "additionalProperties": false, | ||
| "examples": [ | ||
| { | ||
| "version": "2025-02-03", | ||
| "enable": true, | ||
| "config": { | ||
| "spec_id": 1, | ||
| "key1": "value1" | ||
| } | ||
| } | ||
| ] | ||
| } | ||
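
To make the constraints in the metadata compaction schema concrete, here is a minimal hand-rolled check in Python. It is a sketch under stated assumptions: the function name `validate_metadata_compaction` and the error messages are hypothetical, and this is not how Polaris itself validates policies. It only mirrors the rules visible in the schema (`enable` is required and boolean, `version` must match the const when present, `config` values are scalars, and no extra top-level properties are allowed).

```python
# Hand-rolled check mirroring the constraints in the metadata-compaction schema.
# Assumption: illustration only, not the validation code used by Polaris.

SCHEMA_VERSION = "2025-02-03"
ALLOWED_KEYS = {"version", "enable", "config"}


def validate_metadata_compaction(policy):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(policy, dict):
        return ["policy must be a JSON object"]

    errors = []

    # "additionalProperties": false -> reject unknown top-level keys.
    for key in sorted(set(policy) - ALLOWED_KEYS):
        errors.append(f"unexpected property: {key}")

    # "required": ["enable"], and it must be a boolean.
    if "enable" not in policy:
        errors.append("missing required property: enable")
    elif not isinstance(policy["enable"], bool):
        errors.append("enable must be a boolean")

    # "version" is optional, but when present it must equal the const.
    if "version" in policy and policy["version"] != SCHEMA_VERSION:
        errors.append(f"version must be {SCHEMA_VERSION!r}")

    # "config" values may only be strings, numbers, or booleans.
    config = policy.get("config", {})
    if not isinstance(config, dict):
        errors.append("config must be an object")
    else:
        for key, value in config.items():
            if not isinstance(value, (str, int, float, bool)):
                errors.append(f"config.{key} must be a string, number, or boolean")

    return errors


example = {
    "version": "2025-02-03",
    "enable": True,
    "config": {"spec_id": 1, "key1": "value1"},
}
print(validate_metadata_compaction(example))
print(validate_metadata_compaction({"version": "2025-02-03", "config": {"nested": {}}}))
```

The first call uses the example embedded in the schema and reports no problems; the second omits `enable` and nests an object inside `config`, so both rules are reported.
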
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| { | ||
| "license": "Licensed under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)", | ||
| "$id": "https://polaris.apache.org/schemas/policies/system/orphan-file-removal/2025-02-03.json", | ||
| "title": "Orphan File Removal Policy", | ||
| "description": "Inheritable Polaris policy schema for Iceberg table orphan file removal", | ||
| "type": "object", | ||
| "properties": { | ||
| "version": { | ||
| "type": "string", | ||
| "const": "2025-02-03", | ||
| "description": "Schema version" | ||
| }, | ||
| "enable": { | ||
| "type": "boolean", | ||
| "description": "Enable or disable orphan file removal." | ||
| }, | ||
| "max_orphan_file_age_in_days": { | ||
|
Member
@flyrain I also struggle with this a bit: an orphan file removal policy can be expressed in terms of more than just file age. Why don't we opt for the
Contributor
Author
Can you name them? We can put them into the schema if they are commonly used; otherwise, the config map would be the best place for them.
Member
I'd vote for the config map.
Contributor
Author
If no extra field is suggested, we could keep it as is. |
||
| "type": "number", | ||
| "description": "Specifies the maximum age (in days) for orphaned files before they are eligible for removal." | ||
| }, | ||
| "locations": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "string" | ||
| }, | ||
| "description": "Specifies a list of custom directories to search for files instead of the default table location. Use with caution—if set to a broad location (e.g., s3://my-bucket instead of s3://my-bucket/my-table-location), all unreferenced files in that path may be permanently deleted, including files from other tables. Following best practices, tables should be stored in separate locations to avoid accidental data loss." | ||
| }, | ||
| "config": { | ||
| "type": "object", | ||
| "description": "A map containing custom configuration properties. Note that interoperability is not guaranteed.", | ||
| "additionalProperties": { | ||
| "type": ["string", "number", "boolean"] | ||
| } | ||
| } | ||
| }, | ||
| "required": ["enable"], | ||
| "additionalProperties": false, | ||
| "examples": [ | ||
| { | ||
| "version": "2025-02-03", | ||
| "enable": true, | ||
| "max_orphan_file_age_in_days": 30, | ||
| "locations": ["s3://my-bucket/my-table-location"], | ||
| "config": { | ||
| "prefix_mismatch_mode": "ignore", | ||
| "key1": "value1" | ||
| } | ||
| } | ||
| ] | ||
| } | ||
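
The caution in the `locations` description (a bucket-wide location such as `s3://my-bucket` can cause files from other tables to be deleted) can be turned into a simple pre-flight check. The sketch below is a hypothetical heuristic, not part of the PR: the helper names `is_bucket_root` and `risky_locations` are made up for illustration, and "too broad" is assumed to mean "a URI with a bucket or host but no path beneath it".

```python
from urllib.parse import urlparse


def is_bucket_root(location):
    """True when the URI names a bucket/host with no path beneath it."""
    parsed = urlparse(location)
    return parsed.netloc != "" and parsed.path.strip("/") == ""


def risky_locations(policy):
    """Return the entries of policy["locations"] that look too broad.

    Assumption: a bucket root is "too broad"; real deployments may need
    stricter rules (for example, requiring a table-specific prefix).
    """
    return [loc for loc in policy.get("locations", []) if is_bucket_root(loc)]


policy = {
    "enable": True,
    "max_orphan_file_age_in_days": 30,
    "locations": ["s3://my-bucket", "s3://my-bucket/my-table-location"],
}
print(risky_locations(policy))
```

A tool applying the policy could refuse to run, or ask for confirmation, whenever `risky_locations` returns a non-empty list.
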
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| { | ||
| "license": "Licensed under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)", | ||
| "$id": "https://polaris.apache.org/schemas/policies/system/snapshot-retention/2025-02-03.json", | ||
| "title": "Snapshot Retention Policy", | ||
| "description": "Inheritable Polaris policy schema for Iceberg table snapshot retention", | ||
| "type": "object", | ||
| "properties": { | ||
| "version": { | ||
| "type": "string", | ||
| "const": "2025-02-03", | ||
| "description": "Schema version" | ||
| }, | ||
| "enable": { | ||
| "type": "boolean", | ||
| "description": "Enable or disable snapshot retention." | ||
| }, | ||
| "config": { | ||
| "type": "object", | ||
| "description": "A map containing custom configuration properties. Please note that interoperability is not guaranteed.", | ||
| "additionalProperties": { | ||
| "type": ["string", "number", "boolean"] | ||
| } | ||
| } | ||
| }, | ||
| "required": ["enable"], | ||
| "additionalProperties": false, | ||
| "examples": [ | ||
| { | ||
| "version": "2025-02-03", | ||
| "enable": true, | ||
| "config": { | ||
| "min_snapshot_to_keep": 1, | ||
| "max_snapshot_age_days": 2, | ||
| "max_ref_age_days": 3, | ||
| "key1": "value1" | ||
| } | ||
| } | ||
| ] | ||
| } |
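
Because the snapshot retention schema leaves everything except `enable` to the free-form `config` map, a consumer has to decide which keys it understands. The following sketch separates recognized keys from the rest. The key names are copied from the example in the schema, not defined by it, so treating them as "known" is an assumption, and `split_config` is a hypothetical helper.

```python
# Keys copied from the example in the schema; the schema itself does not
# define them, so treating them as "known" is an assumption.
KNOWN_KEYS = ("min_snapshot_to_keep", "max_snapshot_age_days", "max_ref_age_days")


def split_config(policy):
    """Split the free-form config map into recognized and unrecognized entries."""
    config = policy.get("config", {})
    known = {k: config[k] for k in KNOWN_KEYS if k in config}
    unknown = {k: v for k, v in config.items() if k not in KNOWN_KEYS}
    return known, unknown


policy = {
    "version": "2025-02-03",
    "enable": True,
    "config": {
        "min_snapshot_to_keep": 1,
        "max_snapshot_age_days": 2,
        "max_ref_age_days": 3,
        "key1": "value1",
    },
}
known, unknown = split_config(policy)
print(known)
print(unknown)
```

Keeping the unrecognized entries visible, rather than silently dropping them, fits the schema's note that interoperability of `config` entries is not guaranteed.
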
Does the JSON magically land at the location specified in the ID somehow? Or do we always need a followup PR?
Also, it looks a little funny to use dates here given that the date in the PR may not align with the date the schema actually becomes effective. In the worst case, we could merge two versions in one day. Maybe just an incrementing number is easier?
Unfortunately, no. I hope I can publish them once and for all based on the directory structure.
I want to keep the same date for all of them since this is the first batch; that should be fine because nobody is using them yet. Once we release them in 1.0, we should follow the date scheme strictly.
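
In the three schemas in this PR, the `version` const is the same date that appears as the last path segment of the `$id`. A small consistency check for that convention might look like the sketch below. The helper names `parse_schema_id` and `id_matches_version` are hypothetical, and the assumption that the file name is always an ISO date follows the date scheme discussed above rather than any published rule.

```python
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_schema_id(schema_id):
    """Split a schema $id into (policy_type, version_date)."""
    path = PurePosixPath(urlparse(schema_id).path)
    # path.stem drops ".json"; date.fromisoformat raises ValueError for non-dates.
    return path.parent.name, date.fromisoformat(path.stem)


def id_matches_version(schema):
    """Check that the version const equals the date embedded in the $id."""
    _, version_date = parse_schema_id(schema["$id"])
    return schema["properties"]["version"]["const"] == version_date.isoformat()


schema = {
    "$id": "https://polaris.apache.org/schemas/policies/system/metadata-compaction/2025-02-03.json",
    "properties": {"version": {"type": "string", "const": "2025-02-03"}},
}
print(parse_schema_id(schema["$id"]))
print(id_matches_version(schema))
```

A date-only file name allows one version per policy type per day, which is the collision the reviewer raised; an incrementing number would avoid it at the cost of no longer encoding when the schema was written.
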