[SPARK-24196][SQL] Implement Spark's own GetSchemasOperation #22903

Conversation
```scala
  override def mode: ServerMode.Value = ServerMode.binary

  test("Spark's own GetSchemasOperation(SparkGetSchemasOperation)") {
    def testGetSchemasOperation(
```
This test mimics `HiveDatabaseMetaData.getSchemas()`.
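For context, a minimal sketch of what that client-side call looks like over JDBC (the URL and credentials below are placeholders, not from this PR):

```scala
import java.sql.DriverManager

// Placeholder connection string for a Thrift server running in binary mode.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
try {
  // DatabaseMetaData.getSchemas() is the call SQL clients make to list
  // databases; the server answers it with a GetSchemasOperation.
  val rs = conn.getMetaData.getSchemas()
  while (rs.next()) {
    println(rs.getString(1)) // schema (database) name
  }
} finally {
  conn.close()
}
```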
Test build #98311 has finished for PR 22903 at commit
cc @gatorsmile
```scala
      schemaName: String): GetSchemasOperation = synchronized {
    val sqlContext = sessionToContexts.get(parentSession.getSessionHandle)
    require(sqlContext != null, s"Session handle: ${parentSession.getSessionHandle} has not been" +
      s" initialized or had already closed.")
```
nit: the `s` on the second string literal can be removed (it has no interpolation).
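That is, since the second literal contains no `${...}` interpolation:

```scala
require(sqlContext != null, s"Session handle: ${parentSession.getSessionHandle} has not been" +
  " initialized or had already closed.")
```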
```scala
  }

  try {
    catalog.listDatabases(convertSchemaPattern(schemaName)).foreach { dbName =>
```
@wangyum, IIRC, not only DBs but other metadata are not shown as well (correct me if I am wrong; I tested it a while ago so can't exactly remember). Can you double-check whether only databases are missing?
+1, that's true: none of the metadata operations is implemented currently.
Yes. I think implementing GetSchemasOperation, GetTablesOperation and GetColumnsOperation is enough.
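For reference, those three operations back the standard `java.sql.DatabaseMetaData` calls that most SQL clients issue; a sketch (the `null`/`"%"` pattern arguments are illustrative):

```scala
import java.sql.DatabaseMetaData

def browseMetadata(md: DatabaseMetaData): Unit = {
  val schemas = md.getSchemas(null, "%")            // served by GetSchemasOperation
  val tables  = md.getTables(null, "%", "%", null)  // served by GetTablesOperation
  val columns = md.getColumns(null, "%", "%", "%")  // served by GetColumnsOperation
  Seq(schemas, tables, columns).foreach(_.close())
}
```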
```scala
  val catalog: SessionCatalog = sqlContext.sessionState.catalog

  private final val RESULT_SET_SCHEMA = new TableSchema()
    .addStringColumn("TABLE_SCHEM", "Schema name.")
```
nit: TABLE_SCHEMA?
It's copied from GetSchemasOperation.java#L49, and HiveDatabaseMetaData also uses TABLE_SCHEM.
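`TABLE_SCHEM` (without the trailing `A`) is also the column name the JDBC spec mandates for `DatabaseMetaData.getSchemas()`, so clients read the result by that name:

```scala
// Reading the two JDBC-standard columns of a getSchemas() result set,
// reusing a `conn` like the one in the earlier sketch.
val rs = conn.getMetaData.getSchemas()
while (rs.next()) {
  val schema  = rs.getString("TABLE_SCHEM")   // schema (database) name
  val catalog = rs.getString("TABLE_CATALOG") // catalog name, may be null
  println(s"$catalog.$schema")
}
```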
```scala
  override def getNextRowSet(order: FetchOrientation, maxRows: Long): RowSet = {
    validateDefaultFetchOrientation(order)
    assertState(OperationState.FINISHED)
```
What is the reason you need to change the order between line 87 and 88?
This was copied from SparkExecuteStatementOperation; I have reverted it. Lines 112 to 114 in b2e7677:

```scala
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)
```
```scala
  override def cancel(): Unit = {
    logInfo(s"Cancel get schemas with $statementId")
    setState(OperationState.CANCELED)
```
Why not call the default cancel()?
```scala
  override def close(): Unit = {
    logInfo(s"Close get schemas with $statementId")
    setState(OperationState.CLOSED)
```
The same here.
Test build #100526 has finished for PR 22903 at commit
retest this please
Test build #100527 has finished for PR 22903 at commit
```scala
    }
  }

  override def getNextRowSet(orientation: FetchOrientation, maxRows: Long): RowSet = {
```
What happens if we do not override getNextRowSet?
It won't get the result:

```
[info] - Spark's own GetSchemasOperation(SparkGetSchemasOperation) *** FAILED *** (4 seconds, 364 milliseconds)
[info] rs.next() was false (SparkMetadataOperationSuite.scala:78)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
[info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
[info] at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite$$anonfun$1$$anonfun$org$apache$spark$sql$hive$thriftserver$SparkMetadataOperationSuite$$anonfun$$checkResult$1$1.apply(SparkMetadataOperationSuite.scala:78)
[info] at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite$$anonfun$1$$anonfun$org$apache$spark$sql$hive$thriftserver$SparkMetadataOperationSuite$$anonfun$$checkResult$1$1.apply(SparkMetadataOperationSuite.scala:77)
...
```
Let us remove this. You just need to change this line (line 44 in 5264164):

```java
private RowSet rowSet;
```

to

```java
protected RowSet rowSet;
```
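A toy sketch (deliberately simplified, not Hive's actual classes) of why widening the field removes the need for the override: once `rowSet` is `protected`, the subclass can populate the inherited buffer and the parent's fetch method returns it.

```scala
// Simplified stand-ins for MetadataOperation and SparkGetSchemasOperation.
abstract class MetadataOp {
  protected var rowSet: Vector[Seq[Any]] = Vector.empty // was effectively `private`
  def getNextRowSet(): Vector[Seq[Any]] = rowSet        // inherited fetch sees subclass rows
}

class GetSchemasOp(databases: Seq[String]) extends MetadataOp {
  def run(): Unit = databases.foreach { db =>
    rowSet :+= Seq(db, "hive") // TABLE_SCHEM, TABLE_CATALOG
  }
}

object Demo extends App {
  val op = new GetSchemasOp(Seq("default", "sales"))
  op.run()
  op.getNextRowSet().foreach(println) // List(default, hive), List(sales, hive)
}
```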
```scala
  val catalog: SessionCatalog = sqlContext.sessionState.catalog

  private final val RESULT_SET_SCHEMA = new TableSchema()
```
This can be removed too.
| .addStringColumn("TABLE_SCHEM", "Schema name.") | ||
| .addStringColumn("TABLE_CATALOG", "Catalog name.") | ||
|
|
||
| private val rowSet = RowSetFactory.create(RESULT_SET_SCHEMA, getProtocolVersion) |
This can be removed.
I think this PR is pretty close to being merged. Thanks for your effort! @wangyum
Test build #100669 has finished for PR 22903 at commit
| .addStringColumn("TABLE_CATALOG", "Catalog name."); | ||
|
|
||
| private RowSet rowSet; | ||
| protected RowSet rowSet; |
Hm... but I noticed this is o.a.hive side code, @gatorsmile. Wouldn't it be better to avoid changes here, to prepare for getting rid of the Hive stuff completely later?
We do not have a plan to remove the thrift-server and use the Hive jar. Instead, I think we need to enhance the current thrift-server implementation.
I actually meant getting rid of the Hive fork stuff by upgrading Hive to another version later. Those changes would cause some conflicts when upgrading from 1.2.1 to higher Hive versions.
Considering it's a one-line change, okie. I already see some changes made there. OK to me.
7feeb82 shows we want to further clean up and improve the thrift-server. Even if https://issues.apache.org/jira/browse/HIVE-16391 is resolved, we will still keep the Hive thrift-server.
retest this please
Test build #100907 has finished for PR 22903 at commit
Thanks! Merged to master.
## What changes were proposed in this pull request?

This PR fixes SQL client tools not being able to show databases, by implementing Spark's own `GetSchemasOperation`.

## How was this patch tested?

Unit tests and manual tests (before/after screenshots omitted).

Closes apache#22903 from wangyum/SPARK-24196.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
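As a rough illustration of the manual test (host, port, and credentials are placeholders), listing databases through plain JDBC now returns rows instead of an empty result:

```scala
import java.sql.DriverManager

object VerifyGetSchemas extends App {
  // Placeholder connection to a locally running Spark Thrift server.
  val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
  try {
    // '%' is the JDBC wildcard; filter to databases starting with "test".
    val rs = conn.getMetaData.getSchemas(null, "test%")
    while (rs.next()) {
      println(rs.getString("TABLE_SCHEM"))
    }
  } finally {
    conn.close()
  }
}
```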
Has this PR shown up in anything except master? I don't see it in 2.4.4. Any reason?
@shgriffi This is a new feature of Spark 3.0. We will release the Spark 3.0 preview soon. https://issues.apache.org/jira/browse/SPARK-28426
Ah, OK, thanks @wangyum for the update. I think we'll need to do a manual patch in 2.4.4 for now, until 3.0 comes out. However, we'll look at the 3.0 preview as soon as it's available.