Conversation

@amaliujia
Contributor

@amaliujia amaliujia commented Jun 17, 2022

What changes were proposed in this pull request?

  1. Add 3L namespace tests for CacheTable and isCached.
  2. Make UncacheTable 3L namespace compatible.
  3. Add new APIs setCurrentCatalog, currentCatalog, and listCatalogs, which are useful for the 3L namespace.

Why are the changes needed?

This is part of the 3L namespace effort.

Does this PR introduce any user-facing change?

Yes. Three new APIs are added in this PR (setCurrentCatalog, currentCatalog, listCatalogs), and the uncacheTable API will support the 3L namespace.

How was this patch tested?

UT
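
Based on the API names above, a minimal usage sketch of the three new methods might look like the following. This is an illustration only: the catalog name `testcat` and the plugin class `com.example.MyCatalogPlugin` are assumptions, and a real catalog plugin must be configured before switching to it.

```scala
import org.apache.spark.sql.SparkSession

// Assumes a catalog plugin is registered under the name "testcat";
// the plugin class below is a placeholder for illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.catalog.testcat", "com.example.MyCatalogPlugin")
  .getOrCreate()

// Query the session's current catalog name.
println(spark.catalog.currentCatalog())

// Switch the current catalog.
spark.catalog.setCurrentCatalog("testcat")

// List registered catalogs as a Dataset.
spark.catalog.listCatalogs().show()
```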

@amaliujia
Contributor Author

R: @cloud-fan

@github-actions github-actions bot added the SQL label Jun 17, 2022
@amaliujia amaliujia changed the title [SPARK-39506] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatible [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatible Jun 17, 2022
@AmplabJenkins

Can one of the admins verify this patch?

Contributor

If it's simply a string, shall we just let listCatalogs return Dataset[String]?

Contributor Author

We might design the class like this:

case class CatalogMetadata(
    name: String,
    description: String,
    owner: Option[String] = None,
    catalogType: Option[CatalogType] = None,
    createdAt: Option[Long] = None,
    createdBy: Option[String] = None,
    updatedAt: Option[Long] = None,
    updatedBy: Option[String] = None
)

However, I am not sure whether there is extra information we can fetch for a catalog. Do you know if we have such rich metadata for a catalog in Spark?

Contributor Author

If Spark only knows the catalog name, then probably Dataset[String] would be fine.

Contributor

ok let's keep it then, and add a description field so that it doesn't look so weird.

About the naming, how about just Catalog, to match Database and Table in this file?
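
Following this suggestion, the pared-down class (whatever name it ends up with) could be as simple as the sketch below: keep the name field and add a description so the API doesn't return a bare string. This is not necessarily the final shape in Spark.

```scala
// Sketch of a minimal catalog-metadata class per the suggestion above:
// a name plus a description field, dropping the optional rich metadata.
case class CatalogMetadata(name: String, description: String)

val builtIn = CatalogMetadata("spark_catalog", "the built-in session catalog")
println(builtIn.name)
println(builtIn.description)
```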

Contributor Author

Sounds good for keeping the class and additionally adding a description field.

Regarding renaming to Catalog, it will have a naming conflict with

Contributor

ah I see!

@amaliujia
Contributor Author

Also I am seeing this

[info] spark-streaming-kinesis-asl: mimaPreviousArtifacts not set, not analyzing binary compatibility
[error] spark-sql: Failed binary compatibility check against org.apache.spark:spark-sql_2.12:3.2.0! Found 3 potential problems (filtered 602)
[error]  * abstract method currentCatalog()java.lang.String in class org.apache.spark.sql.catalog.Catalog is present only in current version
[error]    filter with: ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.currentCatalog")
[error]  * abstract method setCurrentCatalog(java.lang.String)Unit in class org.apache.spark.sql.catalog.Catalog is present only in current version
[error]    filter with: ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.setCurrentCatalog")
[error]  * abstract method listCatalogs()org.apache.spark.sql.Dataset in class org.apache.spark.sql.catalog.Catalog is present only in current version
[error]    filter with: ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.listCatalogs")

Any idea how to make the new APIs pass this binary compatibility check?
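
The MiMa log above already prints the exact filter expressions to use. The conventional fix in the Spark build is to register those filters in the exclusion list in project/MimaExcludes.scala; a sketch of the entries (the surrounding `Seq` name and structure here are illustrative, matching that file's style rather than copied from it):

```scala
// Sketch: MiMa exclusions registered in project/MimaExcludes.scala,
// using the filter strings MiMa itself suggests in the log above.
import com.typesafe.tools.mima.core._

lazy val newCatalogApiExcludes = Seq(
  ProblemFilters.exclude[ReversedMissingMethodProblem](
    "org.apache.spark.sql.catalog.Catalog.currentCatalog"),
  ProblemFilters.exclude[ReversedMissingMethodProblem](
    "org.apache.spark.sql.catalog.Catalog.setCurrentCatalog"),
  ProblemFilters.exclude[ReversedMissingMethodProblem](
    "org.apache.spark.sql.catalog.Catalog.listCatalogs")
)
```

These exclusions tell MiMa that the new abstract methods are intentional additions to the Catalog API rather than accidental breaks.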

@HyukjinKwon
Member

Adding @zhengruifeng FYI

@github-actions github-actions bot added the BUILD label Jun 23, 2022
assert(spark.catalog.currentCatalog().equals("testcat"))
spark.catalog.setCurrentCatalog("spark_catalog")
assert(spark.catalog.currentCatalog().equals("spark_catalog"))
assert(spark.catalog.listCatalogs().collect().map(c => c.name).toSet == Set("testcat"))
Contributor

Not related to this PR, but we should figure out why spark_catalog is missing here.

@cloud-fan
Contributor

The last commit only updates a comment; I'm merging it to master, thanks!

@cloud-fan cloud-fan closed this in 299cdfa Jun 24, 2022
@amaliujia
Contributor Author

@cloud-fan thank you!
