[SPARK-17402][SQL] separate the management of temp views and metastore tables/views in SessionCatalog #14962
|
Also cc @srinathshankar: it will be very easy to implement global temp views after this.
|
Test build #64943 has finished for PR 14962 at commit
|
Can we avoid adding TempViewManager?
What is the reason? Thanks!
I think it's easier to implement and reason about thread-safe semantics for temp views if we put temp view management in one place.
Why not just name it tempViewManager?
The goal of this PR is to add some view-related APIs, so I think refactoring with TempViewManager is not the major goal?
If we change it back to HashMap, we need to add synchronized back. Is my understanding right?
The same applies to all the other related functions.
Yea, now we let TempViewManager implement the thread-safe semantics.
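To make the trade-off in this thread concrete, here is a minimal sketch of what "putting temp view management in one place" could look like when a plain `HashMap` is guarded by `synchronized`. It is not the `TempViewManager` from this PR: `SimpleTempViewManager`, `Plan`, and the method names are hypothetical, and `Plan` merely stands in for a logical plan.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a logical plan; not a Spark class.
final case class Plan(description: String)

// Illustrative sketch only; not the TempViewManager from this PR.
class SimpleTempViewManager {
  // A plain HashMap is not thread-safe, so every access below is synchronized.
  private val views = mutable.HashMap.empty[String, Plan]

  // Registers a view. Returns false, leaving the map untouched, when the
  // name is already taken and `overwrite` is false.
  def create(name: String, plan: Plan, overwrite: Boolean): Boolean = synchronized {
    if (!overwrite && views.contains(name)) {
      false
    } else {
      views.put(name, plan)
      true
    }
  }

  // Exact-match lookup: names are compared case-sensitively.
  def get(name: String): Option[Plan] = synchronized {
    views.get(name)
  }

  // Returns true only if a view with this exact name was removed.
  def remove(name: String): Boolean = synchronized {
    views.remove(name).isDefined
  }
}
```

Because the lock lives inside the manager, callers do not need to hold a lock on an enclosing catalog object just to touch temp views.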
|
Test build #64973 has finished for PR 14962 at commit
|
7314a33 to 0075f9d
|
Test build #64984 has finished for PR 14962 at commit
|
|
Test build #64987 has finished for PR 14962 at commit
|
Drop Table is unable to drop a temp view, right?
spark.range(10).createTempView("tempView")
sql("DESC tempView").show()
sql("DROP TABLE tempView")
sql("DESC tempView").show()
Actually I noticed this and fixed it before, but it breaks a lot of tests, because we used to call temp views "temp tables". I'd like to keep this behaviour as it was; we can discuss how to fix it in follow-ups.
I see. Thanks!
|
Found a common bug in the following ALTER TABLE commands: We need to issue an exception when the tableType is |
In the description of TempViewManager, could we mention that temp view names are always case sensitive? The caller is responsible for handling case-related issues.
yea good idea
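As a rough illustration of "the caller is responsible for handling case-related issues", the sketch below normalizes a name before it reaches a store that compares keys exactly. `TempViewNames`, `normalize`, and the `caseSensitive` flag are hypothetical names chosen for this example, not part of the PR.

```scala
import java.util.Locale

// Hypothetical helper, not a Spark API: the caller decides how names are
// normalized before they reach a case-sensitive store.
object TempViewNames {
  def normalize(name: String, caseSensitive: Boolean): String =
    if (caseSensitive) name else name.toLowerCase(Locale.ROOT)
}

object TempViewNamesDemo {
  def main(args: Array[String]): Unit = {
    // The store compares keys exactly, as a case-sensitive manager would.
    val store = Map(TempViewNames.normalize("MyView", caseSensitive = false) -> "plan")

    // A caller that normalizes first finds the view regardless of input case.
    val key = TempViewNames.normalize("MYVIEW", caseSensitive = false)
    println(store.contains(key))

    // A caller that skips normalization does not.
    println(store.contains("MYVIEW"))
  }
}
```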
|
The biggest external change is in the table resolution order. Now, three common In addition, we need to carefully review all the places that are using these three functions. For example, the analyzer rule |
Let's fix them in separate PRs.
Yea, we should review them carefully (I have walked through all of them).
|
Sorry, I did not explain it clearly. In the analyzer rule |
Wasn't that already true before this PR?
cec6b3e to 01659dc
|
Test build #65100 has finished for PR 14962 at commit
|
|
Test build #65101 has finished for PR 14962 at commit
|
|
Test build #65102 has finished for PR 14962 at commit
|
|
Found another issue we need to fix. Before this PR, the results are correct. After this PR, the results are incorrect.
Before this PR, yeah, it behaves the same. After this PR, we expect |
@gatorsmile, why is it an issue? In this case, we create a temp view called |
|
@gatorsmile It's actually another bug in previous code. When we use |
|
Test build #65153 has finished for PR 14962 at commit
|
|
uh... yeah, that is another existing bug. : ) Let me write a test suite for all the DDL statements when the temp view with the same name exists. We might hit more bugs... |
|
wow... we are allowed to insert data into temporary views... Updates: when writing the test cases for insertable temp views, I found a bug in my previous PR. When data sources do not extend |
|
Test build #65186 has finished for PR 14962 at commit
|
  private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
    try {
-     catalog.lookupRelation(u.tableIdentifier, u.alias)
+     catalog.lookupTempViewOrRelation(u.tableIdentifier, u.alias)
This is also for views, right? Should we just keep the old name?
|
Is it possible to first have a PR to fix the bugs? |
      assert(spark.table("default.same_name").collect().isEmpty)
    }
  }
}
This is a regression test.
Let's add comments explaining what this test is for, so that we don't accidentally delete it in the future.
…-name temp view exists

## What changes were proposed in this pull request?

In `SessionCatalog`, we have several operations (`tableExists`, `dropTable`, `lookupRelation`, etc.) that handle both temp views and metastore tables/views. This introduces bugs in DDL commands that want to handle temp views only or metastore tables/views only. These bugs are:

1. `CREATE TABLE USING` will fail if a same-name temp view exists.
2. `Catalog.dropTempView` will un-cache and drop a metastore table if a same-name table exists.
3. `saveAsTable` will fail or have unexpected behaviour if a same-name temp view exists.

These bug fixes are pulled out from #14962 and target both the master and 2.0 branches.

## How was this patch tested?

New regression tests.

Author: Wenchen Fan <[email protected]>

Closes #15099 from cloud-fan/fix-view.
…-name temp view exists

In `SessionCatalog`, we have several operations (`tableExists`, `dropTable`, `lookupRelation`, etc.) that handle both temp views and metastore tables/views. This introduces bugs in DDL commands that want to handle temp views only or metastore tables/views only. These bugs are:

1. `CREATE TABLE USING` will fail if a same-name temp view exists.
2. `Catalog.dropTempView` will un-cache and drop a metastore table if a same-name table exists.
3. `saveAsTable` will fail or have unexpected behaviour if a same-name temp view exists.

These bug fixes are pulled out from #14962 and target both the master and 2.0 branches.

New regression tests.

Author: Wenchen Fan <[email protected]>

Closes #15099 from cloud-fan/fix-view.

(cherry picked from commit 3fe630d)
Signed-off-by: Wenchen Fan <[email protected]>
|
Maybe update this PR now? The only issues left are the atomicity issues. It sounds like we do not need to backport the fix, since temporary views are session-specific. Right? We have to be very careful when implementing global temporary views.
|
I'll update this PR after the global temp view |
|
closing it as most of them were fixed in other PRs |
## What changes were proposed in this pull request?

In `SessionCatalog`, we have several operations (`getTableMetadata`, `tableExists`, `renameTable`, `dropTable`, `lookupRelation`) that handle both temp views and metastore tables/views. They save some lines of code for commands that need to deal with both temp views and metastore tables/views, but they also introduce bugs in other commands, because the operation names say nothing about temp views and are very likely to be misused:

1. `DataFrameWriter.saveAsTable`/`CREATE TABLE USING` will fail if a same-name temp view exists.
2. `DataFrameWriter.saveAsTable` with overwrite mode will mistakenly drop a same-name temp view.
3. `Catalog.dropTempView` may drop a metastore table mistakenly.
4. `ALTER TABLE RECOVER PARTITIONS`/`LOAD DATA`/`TRUNCATE TABLE`/`SHOW CREATE TABLE` should report "table not found" instead of "temp view is not supported" if a same-name temp view exists, because these commands don't need to deal with temp views.

In some commands we support temp views mistakenly without mentioning it in the documentation: `ShowTablePropertiesCommand`, `ShowColumnsCommand`, `Catalog.listColumns`. This PR removes the temp view support in `ShowTablePropertiesCommand`, as temp views don't have properties, and explicitly documents that `ShowColumnsCommand` and `Catalog.listColumns` support temp views.

Mixing the handling of temp views and metastore tables/views also makes it harder to implement thread-safe operations. For example, `AlterViewAsCommand` checks `isTemporaryTable` first and then calls `createTempView`, which is not atomic. Most temp view related operations in `SessionCatalog` hold a lock on the `SessionCatalog` object, which is unnecessary.

This PR separates the management of temp views and metastore tables/views in `SessionCatalog`. Any command that needs to deal with temp views should explicitly call the temp view related operations in `SessionCatalog`, to fix existing bugs and prevent future bugs like this.

## How was this patch tested?

Existing tests and 3 new tests.
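The separation described in this PR, and the atomicity concern around checking for a temp view and then creating it, can be sketched with a toy catalog that keeps two independent namespaces. Everything here is hypothetical and simplified: `TwoNamespaceCatalog` and its methods are illustrative names, view and table definitions are plain strings, and none of this is the actual `SessionCatalog` code.

```scala
import scala.collection.mutable

// Hypothetical sketch; names are illustrative, not the SessionCatalog API.
class TwoNamespaceCatalog {
  private val tempViews = mutable.HashMap.empty[String, String]
  private val metastoreTables = mutable.HashMap.empty[String, String]

  def createTempView(name: String, definition: String): Unit = synchronized {
    tempViews.put(name, definition)
  }

  def createTable(name: String, definition: String): Unit = synchronized {
    metastoreTables.put(name, definition)
  }

  // Only looks at temp views, so a same-name metastore table is never dropped.
  def dropTempView(name: String): Boolean = synchronized {
    tempViews.remove(name).isDefined
  }

  // Only looks at metastore tables, so a same-name temp view is never dropped.
  def dropTable(name: String): Boolean = synchronized {
    metastoreTables.remove(name).isDefined
  }

  // The existence check and the replacement happen under one lock, so no
  // other thread can drop the view between the two steps.
  def alterTempViewIfExists(name: String, definition: String): Boolean = synchronized {
    if (tempViews.contains(name)) {
      tempViews.put(name, definition)
      true
    } else {
      false
    }
  }

  def tempViewExists(name: String): Boolean = synchronized { tempViews.contains(name) }

  def tableExists(name: String): Boolean = synchronized { metastoreTables.contains(name) }
}
```

With operations named after the namespace they touch, a command has to state explicitly whether it means a temp view or a metastore table, which is the kind of misuse the description aims to prevent.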