
Conversation

@jiangxb1987
Contributor

jiangxb1987 commented Jan 22, 2017

What changes were proposed in this pull request?

Move SQLViewSuite from sql/hive to sql/core, so we can test view support without the Hive metastore. Also move the Hive-specific test cases to HiveSQLViewSuite.

Improve the test coverage of SQLViewSuite to cover the following cases:

  1. view resolution (a referenced table/view may have changed after the view was created);
  2. handling a view with user-specified column names;
  3. improved test cases for nested views.

Also added a test case for cyclic view references, a known issue that has not been fixed yet.

How was this patch tested?

N/A

@SparkQA

SparkQA commented Jan 22, 2017

Test build #71806 has finished for PR 16674 at commit 60d927d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

I think we have a big gap in testing view support without the Hive metastore. So far, we only have the single test case you added. My proposal is to move the whole test suite from the Hive module to the SQL module, then create another test suite in Hive that extends the suite in SQL.
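A minimal sketch of that layout, based on what this thread converges on (the shared cases live in an abstract suite in sql/core, and each module plugs in its own session):

// sql/core: the common view test cases, written against a plain SparkSession.
abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
  // shared test("...") cases go here
}

// sql/core: runs the shared cases against the built-in catalog.
class SimpleSQLViewSuite extends SQLViewSuite with SharedSQLContext

// sql/hive: re-runs the shared cases against the Hive metastore,
// plus any Hive-specific cases.
class HiveSQLViewSuite extends SQLViewSuite with TestHiveSingleton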

Member

What is the reason?

Member

How about creating a new test case? First, create a view using a regular SQL statement, then fetch the metadata and verify that it still matches what we expect.
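One possible shape for such a test, meant to sit inside the suite where withView, sql, and spark are already in scope (the test name and the exact assertions are illustrative, not prescriptive):

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

test("CREATE VIEW persists the expected metadata") {
  withView("v") {
    sql("CREATE VIEW v AS SELECT * FROM jt")
    // Read the view definition back from the catalog and check it.
    val metadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier("v"))
    assert(metadata.tableType == CatalogTableType.VIEW)
    assert(metadata.viewText.exists(_.contains("SELECT * FROM jt")))
  }
}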

@gatorsmile
Member

I saw the PR deleted a few test cases. Could you post the reasons why?

@jiangxb1987
Contributor Author

@gatorsmile The only test case that I removed is test("correctly resolve a view with CTE"), which duplicates the existing test("CTE within view").

For testing without the Hive metastore, I'll update the test suites following your suggestion, maybe in a few days, since, you know, Chinese New Year is coming :)

@gatorsmile
Member

Happy New Year! : )

I double-checked all the test case changes. They make sense to me. Could you resolve the conflicts?

BTW, a general suggestion: could you please add comments when you move code in a PR? It helps others review the code.

@SparkQA

SparkQA commented Feb 8, 2017

Test build #72567 has started for PR 16674 at commit 2e201c2.

}
}

test("create hive view for json table") {
Contributor Author

These test cases have been moved to HiveSQLViewSuite.

Contributor

hmmm? these tests are still here

Member

@jiangxb1987's comment is out of date. After leaving it, he moved the test cases to SQLViewSuite based on my comments.

test("Using view after switching current database") {
withView("v") {
sql("CREATE VIEW v AS SELECT * FROM src")
sql("CREATE VIEW v AS SELECT * FROM jt")
Contributor Author

Changed the table from src to jt, so we only have to create a single table in beforeAll().

}
}

test("create hive view for joined tables") {
Contributor Author

These test cases have been moved to HiveSQLViewSuite.

Contributor

actually you moved it here

}
}

test("correctly resolve a nested view") {
Contributor Author

This test case is rewritten as test("correctly parse a nested view").

}
}

test("correctly resolve a view with CTE") {
Contributor Author

This test case duplicates test("CTE within view").

}
}

test("correctly resolve a view in a self join") {
Contributor Author

This test case is rewritten as test("correctly resolve a view in a self join").

}
}

test("make sure we can resolve view created by old version of Spark") {
Contributor Author

This test case has been moved to HiveSQLViewSuite.

test("resolve a view with custom column names") {
withTable("testTable") {
spark.range(1, 10).selectExpr("id", "id + 1 id1").write.saveAsTable("testTable")
withTable("tab1") {
Contributor Author

The name testTable already exists, so we change the table name to tab1.

@jiangxb1987
Contributor Author

retest this please.

@SparkQA

SparkQA commented Feb 8, 2017

Test build #72573 has finished for PR 16674 at commit 2e201c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor Author

@gatorsmile @cloud-fan Could you please look at this when you have time? Thanks!

}
}

test("create hive view for joined tables") {
Member

Any reason why this is Hive only?

Contributor Author

I moved this to HiveSQLViewSuite just because the test name describes it as Hive-specific.

Member

@gatorsmile gatorsmile Feb 9, 2017

: ) Could you ignore the test case names and make HiveSQLViewSuite as small as possible?

If we move them to SQLViewSuite, they will also be executed in HiveSQLViewSuite.

Contributor Author

Sure - let me do this.

@SparkQA

SparkQA commented Feb 10, 2017

Test build #72677 has finished for PR 16674 at commit 664b53a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

u.failAnalysis(s"Table or view not found: ${u.tableName}")
// If the database is not valid, also throw an AnalysisException.
case _: NoSuchDatabaseException =>
u.failAnalysis(s"Table or view not found: ${u.tableName}")
Contributor

Shall we use the qualified name? Otherwise it's hard for users to figure out that the database doesn't exist, given only the table name.

Member

Could we also improve the error message to let users know the database is not found?

checkAnswer(sql("SELECT * FROM view2 ORDER BY id"), (1 to 9).map(i => Row(i, i)))

// Alter the referenced view.
sql("ALTER VIEW view1 AS SELECT i AS x, j AS y FROM jt2")
Contributor

nit: ALTER VIEW view1 AS SELECT id AS x, id1 + 1 AS y FROM jt, then we don't need to create jt2.

spark.range(1, 10).selectExpr("id", "id + 1 id1").write.saveAsTable("tab1")
withView("testView1", "testView2") {
// Correctly create a view with custom column names
sql("CREATE VIEW testView1(x, y) AS SELECT * FROM tab1")
Contributor

can we just use jt and avoid creating tab1?

}
}

test("resolve a view with custom column names") {

Contributor Author

I'm afraid the former test case covers CREATE VIEW, while this one covers view resolution, so they are not duplicates.

Contributor

Can we merge them? These two tests seem to share some common parts.

test("resolve a view when the dataTypes of referenced table columns changed") {
withTable("testTable") {
spark.range(1, 10).selectExpr("id", "id + 1 id1").write.saveAsTable("testTable")
withTable("tab1") {
Contributor

can we use jt instead?

Contributor Author

For this test case I think we'd better create a new table, because the test overwrites the table; if we used jt we would have to restore its contents afterwards, which could be error-prone.

override def beforeAll(): Unit = {
super.beforeAll()
// Create a simple table with two columns: id and id1
spark.range(1, 10).selectExpr("id", "id id1").write.format("json").saveAsTable("jt")
Contributor

Can we put this in SQLViewSuite? Then we don't need to duplicate this logic in SimpleSQLViewSuite and HiveSQLViewSuite.

Contributor Author

The problem is that SharedSQLContext also overrides beforeAll() and afterAll(). I haven't figured out a good way to resolve this; do you have any suggestions?

Contributor

What do you mean? In SQLViewSuite.beforeAll we can still call super.beforeAll.

Member

Yes, we can remove both by doing something like

abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
  import testImplicits._

  protected override def beforeAll(): Unit = {
    super.beforeAll()
    // Create a simple table with two columns: id and id1
    spark.range(1, 10).selectExpr("id", "id id1").write.format("json").saveAsTable("jt")
  }

  protected override def afterAll(): Unit = {
    try {
      spark.sql(s"DROP TABLE IF EXISTS jt")
    } finally {
      super.afterAll()
    }
  }

Contributor Author

SharedSQLContext.afterAll() is called before SQLViewSuite.afterAll(), so the spark variable has already been set to null by the time SQLViewSuite.afterAll() runs.

@SparkQA

SparkQA commented Feb 11, 2017

Test build #72742 has finished for PR 16674 at commit e8fc8c7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

u: UnresolvedRelation,
defaultDatabase: Option[String] = None): LogicalPlan = {
val tableIdentWithDb = u.tableIdentifier.copy(
database = u.tableIdentifier.database.orElse(defaultDatabase))
Contributor

Use u.tableIdentifier.database.orElse(defaultDatabase).orElse(Some(conf.defaultDB)); otherwise we can't guarantee we have the database part.

Member

val db = u.tableIdentifier.database.orElse(defaultDatabase).orElse(Some(conf.defaultDB))

Then we can use it in the error message.

Contributor Author

For a temp view, we should not set the database of the tableIdentifier. Perhaps we can change the case statement to:

        case _: NoSuchDatabaseException if db.isDefined =>
          u.failAnalysis(s"Table or view not found: ${tableIdentWithDb.unquotedString}, the " +
            s"database ${db.get} doesn't exist.")

}
}

test("SPARK-14933 - create view from hive parquet table") {
Member

@gatorsmile gatorsmile Feb 12, 2017

The following removed test cases are moved to HiveSQLViewSuite, because they are Hive-specific.

@SparkQA

SparkQA commented Feb 12, 2017

Test build #72774 has finished for PR 16674 at commit 2f00404.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
import spark.implicits._
abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
Contributor

How about we make a trait SparkSessionProvider and have SharedSparkContext and TestHiveSingleton both extend it? Then we can make SQLViewSuite extend SparkSessionProvider and fix this problem.
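A rough sketch of that idea; SparkSessionProvider is hypothetical here, only its general shape is implied by the comment:

import org.apache.spark.sql.SparkSession

// Hypothetical trait that both test base classes would mix in, so a common
// suite only depends on "something that provides a SparkSession".
trait SparkSessionProvider {
  protected def spark: SparkSession
}

SharedSparkContext and TestHiveSingleton would each implement spark with their own session, and SQLViewSuite would extend SparkSessionProvider instead of binding to either backend.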

@SparkQA

SparkQA commented Feb 14, 2017

Test build #72853 has finished for PR 16674 at commit 4872918.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

u.failAnalysis(s"Table or view not found: ${tableIdentWithDb.unquotedString}")
// If the database is defined and that database is not found, throw an AnalysisException.
// Note that if the database is not defined, it is possible we are looking up a temp view.
case _: NoSuchDatabaseException if db.isDefined =>
Contributor

If users don't specify a database and the current database is removed by another session, they will hit NoSuchDatabaseException. In that case, I think we should also throw AnalysisException for consistency.

Contributor

@cloud-fan cloud-fan Feb 14, 2017

Let's remove this if, since we can't get the current database in the Analyzer.

import org.apache.spark.sql.test.{SharedSQLContext, SQLTestUtils}

class SimpleSQLViewSuite extends SQLViewSuite with SharedSQLContext {
}
Contributor

the { } is not needed

@SparkQA

SparkQA commented Feb 15, 2017

Test build #72914 has finished for PR 16674 at commit aa0cf0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class NoSuchDatabaseException(val db: String) extends AnalysisException(s\"Database '$db' not found\")

private def lookupTableFromCatalog(
u: UnresolvedRelation,
defaultDatabase: Option[String] = None): LogicalPlan = {
val tableIdentWithDb = u.tableIdentifier.copy(database = u.tableIdentifier.database

Contributor

nvm, I fixed it while merging

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 3755da7 Feb 15, 2017
@gatorsmile
Member

A late LGTM : )

@jiangxb1987 jiangxb1987 deleted the view-test branch February 16, 2017 03:03
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 16, 2017

Author: jiangxingbo <[email protected]>

Closes apache#16674 from jiangxb1987/view-test.