-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-19331][SQL][TESTS] Improve the test coverage of SQLViewSuite #16674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #71806 has finished for PR 16674 at commit
|
|
I think we are having a big gap for testing view supports without using Hive metastore. So far, we only have a single test case you added. My proposal here is to move the whole test suite to the SQL module from the Hive module. Then, create another test suite in Hive that extends the suite in SQL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about creating a new test case? First, create a view using a regular SQL statement, and then fetch the metadata and verify the fetched metadata is still matching what we expect?
|
I saw the PR deleted a few test cases. Could you post the reasons you did? |
|
@gatorsmile The only test cast that I removed is For testing without using Hive metastore, I'll update the test suites follow your suggestion, maybe later in a few days, since you know, the Chinese New Year is coming :) |
|
Happy New Year! : ) I double checked all the test case change. The changes makes sense to me. Could you resolve the conflicts? BTW, a general suggestion. Could you please add the comments when you move the codes in the PR? It can help others review the codes. |
60d927d to
2e201c2
Compare
|
Test build #72567 has started for PR 16674 at commit |
| } | ||
| } | ||
|
|
||
| test("create hive view for json table") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test cases have been moved to HiveSQLViewSuite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm? these tests are still here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment of @jiangxb1987 is out of dated. After leaving this comment, he moved the test cases to SQLViewSuite based on my comments.
| test("Using view after switching current database") { | ||
| withView("v") { | ||
| sql("CREATE VIEW v AS SELECT * FROM src") | ||
| sql("CREATE VIEW v AS SELECT * FROM jt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change table src to jt, so we only have to create one single table in beforeAll().
| } | ||
| } | ||
|
|
||
| test("create hive view for joined tables") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test cases have been moved to HiveSQLViewSuite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually you moved it here
| } | ||
| } | ||
|
|
||
| test("correctly resolve a nested view") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case is rewritten to test("correctly parse a nested view")
| } | ||
| } | ||
|
|
||
| test("correctly resolve a view with CTE") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case is duplicated with test("CTE within view")
| } | ||
| } | ||
|
|
||
| test("correctly resolve a view in a self join") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case is rewritten to test("correctly resolve a view in a self join").
| } | ||
| } | ||
|
|
||
| test("make sure we can resolve view created by old version of Spark") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case has been moved to HiveSQLViewSuite.
| test("resolve a view with custom column names") { | ||
| withTable("testTable") { | ||
| spark.range(1, 10).selectExpr("id", "id + 1 id1").write.saveAsTable("testTable") | ||
| withTable("tab1") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name testTable has been existed, so we change the table name to tab1.
|
retest this please. |
|
Test build #72573 has finished for PR 16674 at commit
|
|
@gatorsmile @cloud-fan Could you please look at this when you have time? Thanks! |
| } | ||
| } | ||
|
|
||
| test("create hive view for joined tables") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why this is Hive only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this to HiveSQLViewSuite just because the test name describes itself to be hive specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: ) Could you ignore the test case names and make HiveSQLViewSuite as small as possible?
If we move them to SQLViewSuite, they will also be executed in HiveSQLViewSuite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - let me do this.
|
Test build #72677 has finished for PR 16674 at commit
|
| u.failAnalysis(s"Table or view not found: ${u.tableName}") | ||
| // If the database is not valid, also throw an AnalysisException. | ||
| case _: NoSuchDatabaseException => | ||
| u.failAnalysis(s"Table or view not found: ${u.tableName}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we use the qualified name? or users are hard to find out that the database doesn't exist, given only the table name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also improved the error message to let users know the database is not found?
| checkAnswer(sql("SELECT * FROM view2 ORDER BY id"), (1 to 9).map(i => Row(i, i))) | ||
|
|
||
| // Alter the referenced view. | ||
| sql("ALTER VIEW view1 AS SELECT i AS x, j AS y FROM jt2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ALTER VIEW view1 AS SELECT id AS x, id1 + 1 As y FROM jt, then we don't need to create jt2
| spark.range(1, 10).selectExpr("id", "id + 1 id1").write.saveAsTable("tab1") | ||
| withView("testView1", "testView2") { | ||
| // Correctly create a view with custom column names | ||
| sql("CREATE VIEW testView1(x, y) AS SELECT * FROM tab1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just use jt and avoid creating tab1?
| } | ||
| } | ||
|
|
||
| test("resolve a view with custom column names") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this test duplicated with https://github.com/apache/spark/pull/16674/files#diff-e09027394aebb11c880b375693f59fafR273 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid the former test case is on CREATE VIEW, and this test case is on view resolution, so they are not duplicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we merge them? Seems these 2 tests do have some common part.
| test("resolve a view when the dataTypes of referenced table columns changed") { | ||
| withTable("testTable") { | ||
| spark.range(1, 10).selectExpr("id", "id + 1 id1").write.saveAsTable("testTable") | ||
| withTable("tab1") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use jt instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this test case I think we'd better create a new table, because we would overwrite the table in the test case, if we use jt then we have to recover the table content and that may be error prone.
| override def beforeAll(): Unit = { | ||
| super.beforeAll() | ||
| // Create a simple table with two columns: id and id1 | ||
| spark.range(1, 10).selectExpr("id", "id id1").write.format("json").saveAsTable("jt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put this in SQLViewSuite? then we don't need to duplicate this logic in SimpleSQLViewSuite and HiveSQLViewSuite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that SharedSQLContext also overrides beforeAll() and afterAll(), I haven't figure out a good way to resolve the issue, do you have any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean? In SQLViewSuite.beforeAll, we can still call super.beforeAll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can remove both by doing something like
abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
import testImplicits._
protected override def beforeAll(): Unit = {
super.beforeAll()
// Create a simple table with two columns: id and id1
spark.range(1, 10).selectExpr("id", "id id1").write.format("json").saveAsTable("jt")
}
protected override def afterAll(): Unit = {
try {
spark.sql(s"DROP TABLE IF EXISTS jt")
} finally {
super.afterAll()
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call of SharedSQLContext.afterAll() is before that of SQLViewSuite.afterAll(), so the variable spark has been set to null in SQLViewSuite.afterAll().
|
Test build #72742 has finished for PR 16674 at commit
|
| u: UnresolvedRelation, | ||
| defaultDatabase: Option[String] = None): LogicalPlan = { | ||
| val tableIdentWithDb = u.tableIdentifier.copy( | ||
| database = u.tableIdentifier.database.orElse(defaultDatabase)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u.tableIdentifier.database.orElse(defaultDatabase).orElse(Some(conf.defaultDB)), or we can't guarantee we have the database part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val db = u.tableIdentifier.database.orElse(defaultDatabase).orElse(Some(conf.defaultDB))Then we can use it in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a temp view, we should not set the database of the tableIdentifier. Perhaps we can change the case statement to:
case _: NoSuchDatabaseException if db.isDefined =>
u.failAnalysis(s"Table or view not found: ${tableIdentWithDb.unquotedString}, the " +
s"database ${db.get} doesn't exsits.")
| } | ||
| } | ||
|
|
||
| test("SPARK-14933 - create view from hive parquet table") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following removed test cases are moved to HiveSQLViewSuite , because they are Hive specific.
|
Test build #72774 has finished for PR 16674 at commit
|
| */ | ||
| class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton { | ||
| import spark.implicits._ | ||
| abstract class SQLViewSuite extends QueryTest with SQLTestUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we make a trait SparkSessionProvider and make SharedSparkContext and TestHiveSingleton both extend it? then we can make SQLViewSuite extends SparkSessionProvider and fix this problem.
|
Test build #72853 has finished for PR 16674 at commit
|
| u.failAnalysis(s"Table or view not found: ${tableIdentWithDb.unquotedString}") | ||
| // If the database is defined and that database is not found, throw an AnalysisException. | ||
| // Note that if the database is not defined, it is possible we are looking up a temp view. | ||
| case _: NoSuchDatabaseException if db.isDefined => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if users don't specify database, and the current database is removed by other sessions, then they will hit NoSuchDatabaseException. In this case, I think we should also throw AnalysisException for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's remove this if, since we can't get the current database in Analyzer
| import org.apache.spark.sql.test.{SharedSQLContext, SQLTestUtils} | ||
|
|
||
| class SimpleSQLViewSuite extends SQLViewSuite with SharedSQLContext { | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the { } is not needed
|
Test build #72914 has finished for PR 16674 at commit
|
| private def lookupTableFromCatalog( | ||
| u: UnresolvedRelation, | ||
| defaultDatabase: Option[String] = None): LogicalPlan = { | ||
| val tableIdentWithDb = u.tableIdentifier.copy(database = u.tableIdentifier.database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please follow the existing style: https://github.com/apache/spark/pull/16674/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2L622
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, I fixed it while merging
|
thanks, merging to master! |
|
A late LGTM : ) |
Move `SQLViewSuite` from `sql/hive` to `sql/core`, so we can test the view supports without hive metastore. Also moved the test cases that specified to hive to `HiveSQLViewSuite`. Improve the test coverage of SQLViewSuite, cover the following cases: 1. view resolution(possibly a referenced table/view have changed after the view creation); 2. handle a view with user specified column names; 3. improve the test cases for a nested view. Also added a test case for cyclic view reference, which is a known issue that is not fixed yet. N/A Author: jiangxingbo <[email protected]> Closes apache#16674 from jiangxb1987/view-test.
What changes were proposed in this pull request?
Move
SQLViewSuitefromsql/hivetosql/core, so we can test the view supports without hive metastore. Also moved the test cases that specified to hive toHiveSQLViewSuite.Improve the test coverage of SQLViewSuite, cover the following cases:
Also added a test case for cyclic view reference, which is a known issue that is not fixed yet.
How was this patch tested?
N/A