-
Notifications
You must be signed in to change notification settings - Fork 330
Refactors to support generic tables #1147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
4f06ed0
ba481bf
fc49085
f47bfaf
c55fc54
fc1e716
517da82
589d801
84277aa
7774527
626ad2d
37049fe
ab63c47
a0a320a
d126fcf
cec652c
213bb9d
481e02c
6c71751
ffed2bd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,86 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, | ||
| * software distributed under the License is distributed on an | ||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
| package org.apache.polaris.core.entity; | ||
|
|
||
| import com.fasterxml.jackson.annotation.JsonIgnore; | ||
| import org.apache.iceberg.catalog.Namespace; | ||
| import org.apache.iceberg.catalog.TableIdentifier; | ||
| import org.apache.iceberg.rest.RESTUtil; | ||
|
|
||
| /** | ||
| * A {@link PolarisEntity} implementation for generic tables. These tables are not Iceberg-like in | ||
| * that they may not have a schema or base location. Similarly to {@link IcebergTableLikeEntity} | ||
| * however, these tables have an identifier and a parent namespace. | ||
| */ | ||
| public class GenericTableEntity extends PolarisEntity { | ||
|
|
||
| public static final String FORMAT_KEY = "format"; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do you need another key for table properties? i believe the InternalProperties and properties fields in Polaris Entity is not the same as table properties? We can also add this in a separate pr, maybe just left a TODO here
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sounds good. |
||
|
|
||
| public GenericTableEntity(PolarisBaseEntity sourceEntity) { | ||
| super(sourceEntity); | ||
| } | ||
|
|
||
| public static GenericTableEntity of(PolarisBaseEntity sourceEntity) { | ||
| if (sourceEntity != null) { | ||
| return new GenericTableEntity(sourceEntity); | ||
| } | ||
| return null; | ||
| } | ||
|
|
||
| @JsonIgnore | ||
| public String getFormat() { | ||
| return getInternalPropertiesAsMap().get(GenericTableEntity.FORMAT_KEY); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure what is the usage distinguish between InternalProperties and Properties, i assume the InternalProperties is used to store fields that we will not return in any response to user like metadata location, but properties are used to store fields that will be directly surface back to user, like base location. if that is the case, i assume format should be stored in property map.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's the other way around AFAIK
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see. |
||
| } | ||
|
|
||
| public static class Builder | ||
| extends PolarisEntity.BaseBuilder<GenericTableEntity, GenericTableEntity.Builder> { | ||
| public Builder(TableIdentifier tableIdentifier, String format) { | ||
| super(); | ||
| setType(PolarisEntityType.GENERIC_TABLE); | ||
| setTableIdentifier(tableIdentifier); | ||
| setFormat(format); | ||
| } | ||
|
|
||
| public GenericTableEntity.Builder setFormat(String format) { | ||
| // TODO in the future, we may validate the format and require certain properties | ||
| internalProperties.put(GenericTableEntity.FORMAT_KEY, format); | ||
| return this; | ||
| } | ||
|
|
||
| public GenericTableEntity.Builder setTableIdentifier(TableIdentifier identifier) { | ||
| Namespace namespace = identifier.namespace(); | ||
| setParentNamespace(namespace); | ||
| setName(identifier.name()); | ||
| return this; | ||
| } | ||
|
|
||
| public GenericTableEntity.Builder setParentNamespace(Namespace namespace) { | ||
| if (namespace != null && !namespace.isEmpty()) { | ||
| internalProperties.put( | ||
| NamespaceEntity.PARENT_NAMESPACE_KEY, RESTUtil.encodeNamespace(namespace)); | ||
| } | ||
| return this; | ||
| } | ||
|
|
||
| @Override | ||
| public GenericTableEntity build() { | ||
| return new GenericTableEntity(buildBase()); | ||
| } | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,7 +24,7 @@ | |
| import org.apache.iceberg.catalog.TableIdentifier; | ||
| import org.apache.iceberg.rest.RESTUtil; | ||
|
|
||
| public class TableLikeEntity extends PolarisEntity { | ||
| public class IcebergTableLikeEntity extends PolarisEntity { | ||
| // For applicable types, this key on the "internalProperties" map will return the location | ||
| // of the internalProperties JSON file. | ||
| public static final String METADATA_LOCATION_KEY = "metadata-location"; | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the reason that we don't just make GENERIC a subType of TABLE_LIKE -- TABLE_LIKE assumes a metadata location (among other things), and we probably don't want to bake that assumption into generic tables. Beyond that, this leads us to refactors in resolution that should set the stage for volumes which would have a similar problem as generic tables. |
||
|
|
@@ -35,13 +35,13 @@ public class TableLikeEntity extends PolarisEntity { | |
| public static final String LAST_ADMITTED_NOTIFICATION_TIMESTAMP_KEY = | ||
| "last-notification-timestamp"; | ||
|
|
||
| public TableLikeEntity(PolarisBaseEntity sourceEntity) { | ||
| public IcebergTableLikeEntity(PolarisBaseEntity sourceEntity) { | ||
| super(sourceEntity); | ||
| } | ||
|
|
||
| public static TableLikeEntity of(PolarisBaseEntity sourceEntity) { | ||
| public static IcebergTableLikeEntity of(PolarisBaseEntity sourceEntity) { | ||
| if (sourceEntity != null) { | ||
| return new TableLikeEntity(sourceEntity); | ||
| return new IcebergTableLikeEntity(sourceEntity); | ||
| } | ||
| return null; | ||
| } | ||
|
|
@@ -79,21 +79,21 @@ public String getBaseLocation() { | |
| return getPropertiesAsMap().get(PolarisEntityConstants.ENTITY_BASE_LOCATION); | ||
| } | ||
|
|
||
| public static class Builder extends PolarisEntity.BaseBuilder<TableLikeEntity, Builder> { | ||
| public static class Builder extends PolarisEntity.BaseBuilder<IcebergTableLikeEntity, Builder> { | ||
| public Builder(TableIdentifier identifier, String metadataLocation) { | ||
| super(); | ||
| setType(PolarisEntityType.TABLE_LIKE); | ||
| setType(PolarisEntityType.ICEBERG_TABLE_LIKE); | ||
| setTableIdentifier(identifier); | ||
| setMetadataLocation(metadataLocation); | ||
| } | ||
|
|
||
| public Builder(TableLikeEntity original) { | ||
| public Builder(IcebergTableLikeEntity original) { | ||
| super(original); | ||
| } | ||
|
|
||
| @Override | ||
| public TableLikeEntity build() { | ||
| return new TableLikeEntity(buildBase()); | ||
| public IcebergTableLikeEntity build() { | ||
| return new IcebergTableLikeEntity(buildBase()); | ||
| } | ||
|
|
||
| public Builder setTableIdentifier(TableIdentifier identifier) { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -41,22 +41,22 @@ public enum PolarisPrivilege { | |
| TABLE_CREATE(6, PolarisEntityType.NAMESPACE), | ||
| VIEW_CREATE(7, PolarisEntityType.NAMESPACE), | ||
| NAMESPACE_DROP(8, PolarisEntityType.NAMESPACE), | ||
| TABLE_DROP(9, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_DROP(10, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| TABLE_DROP(9, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One thing we mentioned in the doc is that we will reuse the current table privilege, so that people don't need to grant both TABLE_DROP and GENERIC_TABLE_DROP, which could be quite confusing to users. With this setup, i don't think it is easy to extend the definition, it seems more nature to me that we can define the privilege with set of subtype, but one main type. Maybe what we can do is to have the base TableLikeEntity with just identifier, and then we can extend it to have IcebergTableLikeEntity and GenericTableEntity, in the future if we want to have a DeltaTable or general VIEW entity, we can all extend on top of the TableLikeEntity.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The authn happens at a higher level, such as here -- so I think with a different entity, we should still be able to enforce the existing permissions model.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not quite sure how we plan to do this at authentication level, if we are going to do some special check to allow the operation with TABLE_DROP, I felt it is kind of hack. The privilege here seems defines that it is for the Iceberg table only, if we want to extend the definition, and have a different entity type there, we might have to take a list of <entity type, entity sub type>. compare with let the definition take an entity type, and list of subtype, the latter seems making more sense to me. i don't think I am suggesting to completely reuse the TABLE LIKE entity here, i am just suggesting to refactor the TABLE_LIKE entity to a base entity class, and extend it to ICEBERG_TABLE_LIKE entity and GENERIC_TABLE entity, so we can still address the concern about having methods or keywords that only make sense for ICEBERG also appear as part of GENERIC_TABLE entity. Also, since all those have to be unique under the same namespace, i think it make sense to have them to share the same entity type, but use different subtype.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess we can make things work either way, but might want to hear little bit more about how the entity type and sub entity type is designed to be used in Polaris.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TBH, it looks quite ugly to clearly define that these privileges are for The term "generic table" implies table semantics, but is it really a table or rather a view or a UDF? IMHO we either go down the full route and have privileges by Iceberg-table, Iceberg-view + generic-"thing" or remove the type specific and just have privileges that are not tied to a specific entity type.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @snazy this sounds like more of a comment on the design rather than this PR
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not really, this PR changes all this - so it's actually both.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @snazy I don't think this PR currently does setup for GENERIC_TABLE privilege yet, and the change in this file is just a renaming to me. And you are right, the DROP_TABLE and DROP_VIEW privilege is only setup for Iceberg table and View now. The generic tables are tables, not view, and during the design, we mentioned that we will reuse the *_TABLE privilege since they are all tables, and having DROP_TABLE and DROP_GENERIC_TABLE privilege seems awkward to users. If we need more fine grained control in the future, it make more sense to introduce privileges like DORP_ICEBERG_TABLE, DROP_GENERIC_TABLE. There is definitely changes needed to set up the privilege correctly, but i don't think that is the purpose of the current PR. The current PR are mainly doing refactor/rename, and with some basic class setup to demo the purpose of rename/refactor |
||
| VIEW_DROP(10, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| NAMESPACE_LIST(11, PolarisEntityType.NAMESPACE), | ||
| TABLE_LIST(12, PolarisEntityType.NAMESPACE), | ||
| VIEW_LIST(13, PolarisEntityType.NAMESPACE), | ||
| NAMESPACE_READ_PROPERTIES(14, PolarisEntityType.NAMESPACE), | ||
| TABLE_READ_PROPERTIES(15, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_READ_PROPERTIES(16, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| TABLE_READ_PROPERTIES(15, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_READ_PROPERTIES(16, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| NAMESPACE_WRITE_PROPERTIES(17, PolarisEntityType.NAMESPACE), | ||
| TABLE_WRITE_PROPERTIES(18, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_WRITE_PROPERTIES(19, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| TABLE_READ_DATA(20, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| TABLE_WRITE_DATA(21, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| TABLE_WRITE_PROPERTIES(18, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_WRITE_PROPERTIES(19, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| TABLE_READ_DATA(20, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| TABLE_WRITE_DATA(21, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| NAMESPACE_FULL_METADATA(22, PolarisEntityType.NAMESPACE), | ||
| TABLE_FULL_METADATA(23, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_FULL_METADATA(24, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| TABLE_FULL_METADATA(23, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_FULL_METADATA(24, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| CATALOG_CREATE(25, PolarisEntityType.ROOT), | ||
| CATALOG_DROP(26, PolarisEntityType.CATALOG), | ||
| CATALOG_LIST(27, PolarisEntityType.ROOT), | ||
|
|
@@ -70,12 +70,14 @@ public enum PolarisPrivilege { | |
| CATALOG_ROLE_LIST_GRANTS(35, PolarisEntityType.PRINCIPAL), | ||
| CATALOG_LIST_GRANTS(36, PolarisEntityType.CATALOG), | ||
| NAMESPACE_LIST_GRANTS(37, PolarisEntityType.NAMESPACE), | ||
| TABLE_LIST_GRANTS(38, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_LIST_GRANTS(39, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| TABLE_LIST_GRANTS(38, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_LIST_GRANTS(39, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| CATALOG_MANAGE_GRANTS_ON_SECURABLE(40, PolarisEntityType.CATALOG), | ||
| NAMESPACE_MANAGE_GRANTS_ON_SECURABLE(41, PolarisEntityType.NAMESPACE), | ||
| TABLE_MANAGE_GRANTS_ON_SECURABLE(42, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_MANAGE_GRANTS_ON_SECURABLE(43, PolarisEntityType.TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| TABLE_MANAGE_GRANTS_ON_SECURABLE( | ||
| 42, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.TABLE), | ||
| VIEW_MANAGE_GRANTS_ON_SECURABLE( | ||
| 43, PolarisEntityType.ICEBERG_TABLE_LIKE, PolarisEntitySubType.VIEW), | ||
| PRINCIPAL_CREATE(44, PolarisEntityType.ROOT), | ||
| PRINCIPAL_DROP(45, PolarisEntityType.PRINCIPAL), | ||
| PRINCIPAL_LIST(46, PolarisEntityType.ROOT), | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -373,7 +373,8 @@ public ResolverStatus resolveAll() { | |
|
|
||
| // validate input | ||
| diagnostics.check( | ||
| entityType != PolarisEntityType.NAMESPACE && entityType != PolarisEntityType.TABLE_LIKE, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do you know why this function does not handle TABLE_LIKE, the comment is not clear about why, I assume the GENEIC_TABLE should work similar as TABLE_LIKE.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was just a hard-coded assumption that we are always resolving a TABLE_LIKE (view or table) at the end of a chain of NAMESPACEs |
||
| entityType != PolarisEntityType.NAMESPACE | ||
| && entityType != PolarisEntityType.ICEBERG_TABLE_LIKE, | ||
| "cannot_be_path"); | ||
| diagnostics.check( | ||
| entityType.isTopLevel() || this.referenceCatalogName != null, "reference_catalog_expected"); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we settle on the term "Generic Table" in the design doc (I genuinely do not recall)?
Conceptually, I'd expect Iceberg tables to be a sub-type of "generic table", but it is not so...
How about
UnstructuredTable?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did have a consensus there (and another PR has merged that bears this name) but personally I am not opposed to a change if we find a better alternative. "Generic" was my suggestion however, so I will say a few things in defense of it:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we did reach to a consensus about "generic", because we do want to potentially allow this set of APIs to manage various types of table, includes iceberg.
When I look at the ICEBERG_TABLE_LIKE_ENTITY, and GENERIC_TABLE_ENTITY, it is literally two different ways to represent a table, and based on the representation, different operations can be performed. Like Eric has mentioned, the Generic representation only takes a format, and the format could potentially be iceberg in the future. Or even step back, we can also allow Iceberg as a format, the only thing is we can not provide full Iceberg support (commit coordination) with this representation.
For name like unstructured, i have the same feeling like Eric, which is kind of too general, almost indicates just a set of files to me.
Therefore i do think generic fits better for the case we intended to support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(changing the topic of this thread slightly)
The argument about "representations" makes sense to me, but only at the REST API layer. These entity classes, however, relate to data stored in Polaris persistence. Unless we revamp the Persistence API, I do not see how it is possible to store a ICEBERG_TABLE_LIKE_ENTITY but load a GENERIC_TABLE_ENTITY from persistence 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you thinking about the case that if in the future we make the GENERIC_TABLE_ENTITY rich enough to be able to represent full Iceberg information, we can deprecate the use of ICEBERG_TABLE_LIKE_ENTITY at application layer, and then we will need to load the old ICEBERG_TABLE_LIKE_ENTITY as GENERIC_TABLE_ENTITY?
For an entity stored in one format, if we want to load it as another format, I think you are right, the conversion has to happen at some where, it could also potentially happen after the load like at the application layer.
However, I am not sure if that is the right thing to do. To me, one table should just have one representation, either ICEBERG_TABLE_LIKE or GENERIC_TABLE. Even if in the future we make generic table rich enough to deal with full iceberg support, i think we should make sure we can deal with both representations correctly until we can safely deprecate the legacy one. That is just my current opinion, I believe there is definitely other ways to handle that, and we can discuss this when it comes to that point.
Sorry if my previous comment raised any confusion, all I was trying to say is our intention for GENERIC_TABLE is potentially make it a general representation for various format of tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking about a future where you make a call to load an iceberg table and the server loads a generic table (via conversion) and constructs an Iceberg loadTableResponse.
Or, a case where you make a call to load a generic table which happens to be an Iceberg table; it may be persisted as an Iceberg table.
Basically, I want to avoid coupling the persistence with the user-facing aspects of these tables. Still, I think we need two persistence entities for now.