
Conversation

@wangyum
Member

@wangyum wangyum commented Oct 31, 2018

What changes were proposed in this pull request?

This PR fixes SQL client tools not being able to show databases, by implementing Spark's own `GetSchemasOperation`.

How was this patch tested?

unit tests and manual tests
![image](https://user-images.githubusercontent.com/5399861/47782885-3dd5d400-dd3c-11e8-8586-59a8c15c7020.png)
![image](https://user-images.githubusercontent.com/5399861/47782899-4928ff80-dd3c-11e8-9d2d-ba9580ba4301.png)
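
For readers skimming this thread, here is a minimal sketch of the idea, pieced together from the snippets quoted in the review below (not the exact merged code): Spark adds its own subclass of Hive's `GetSchemasOperation` that answers schema requests from Spark's `SessionCatalog` instead of delegating to the embedded Hive metastore client.

```scala
import org.apache.hive.service.cli.{HiveSQLException, OperationState}
import org.apache.hive.service.cli.operation.GetSchemasOperation
import org.apache.hive.service.cli.session.HiveSession
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.catalog.SessionCatalog

// Sketch only: constructor shape and field names follow the review snippets
// below; it assumes the parent class's `rowSet` is accessible (see the
// `private` -> `protected` discussion further down).
private[hive] class SparkGetSchemasOperation(
    sqlContext: SQLContext,
    parentSession: HiveSession,
    catalogName: String,
    schemaName: String)
  extends GetSchemasOperation(parentSession, catalogName, schemaName) {

  val catalog: SessionCatalog = sqlContext.sessionState.catalog

  override def runInternal(): Unit = {
    setState(OperationState.RUNNING)
    try {
      // One row per database: (TABLE_SCHEM, TABLE_CATALOG).
      catalog.listDatabases(convertSchemaPattern(schemaName)).foreach { dbName =>
        rowSet.addRow(Array[AnyRef](dbName, ""))
      }
      setState(OperationState.FINISHED)
    } catch {
      case e: HiveSQLException =>
        setState(OperationState.ERROR)
        throw e
    }
  }
}
```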

override def mode: ServerMode.Value = ServerMode.binary

test("Spark's own GetSchemasOperation(SparkGetSchemasOperation)") {
def testGetSchemasOperation(

@SparkQA

SparkQA commented Oct 31, 2018

Test build #98311 has finished for PR 22903 at commit e4f7205.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Nov 5, 2018

cc @gatorsmile

schemaName: String): GetSchemasOperation = synchronized {
val sqlContext = sessionToContexts.get(parentSession.getSessionHandle)
require(sqlContext != null, s"Session handle: ${parentSession.getSessionHandle} has not been" +
s" initialized or had already closed.")
Member

nit: the `s` interpolator can be removed.
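
In other words, only the first literal interpolates; the continuation string needs no `s` prefix. A sketch of the fixed call (same message text as in the diff above):

```scala
// The continuation literal has no interpolation, so the `s` prefix is redundant.
require(sqlContext != null,
  s"Session handle: ${parentSession.getSessionHandle} has not been" +
    " initialized or had already closed.")
```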

}

try {
catalog.listDatabases(convertSchemaPattern(schemaName)).foreach { dbName =>
Member

@wangyum, IIRC, it's not only DBs; other metadata isn't shown either (correct me if I am wrong, since I tested this a while ago and can't remember exactly). Can you double-check whether only databases are missing?

Contributor

+1, that's true; none of the metadata operations is currently implemented.

Member Author

Yes. I think implementing GetSchemasOperation, GetTablesOperation and GetColumnsOperation is enough.
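
For context (an illustration, not part of the PR): these three operations back the standard JDBC `DatabaseMetaData` calls that SQL client tools make, so once they are implemented, something along these lines works against the Thrift server. The URL and credentials below are placeholders.

```scala
import java.sql.DriverManager

// Placeholder connection URL/credentials; adjust for your Thrift server.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
val meta = conn.getMetaData

// Served by GetSchemasOperation: list databases/schemas.
val schemas = meta.getSchemas()
while (schemas.next()) println(schemas.getString("TABLE_SCHEM"))

// Served by GetTablesOperation and GetColumnsOperation respectively.
val tables  = meta.getTables(null, "default", "%", null)
val columns = meta.getColumns(null, "default", "%", "%")

conn.close()
```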

val catalog: SessionCatalog = sqlContext.sessionState.catalog

private final val RESULT_SET_SCHEMA = new TableSchema()
.addStringColumn("TABLE_SCHEM", "Schema name.")
Contributor

nit: TABLE_SCHEMA?

Member Author

It's copied from GetSchemasOperation.java#L49, and HiveDatabaseMetaData also uses TABLE_SCHEM.


override def getNextRowSet(order: FetchOrientation, maxRows: Long): RowSet = {
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
Member

What is the reason you need to change the order between lines 87 and 88?

Member Author

This was copied from SparkExecuteStatementOperation. I have reverted it.

validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)


override def cancel(): Unit = {
logInfo(s"Cancel get schemas with $statementId")
setState(OperationState.CANCELED)
Member

Why not call the default cancel()?


override def close(): Unit = {
logInfo(s"Close get schemas with $statementId")
setState(OperationState.CLOSED)
Member

The same here.

@SparkQA

SparkQA commented Dec 29, 2018

Test build #100526 has finished for PR 22903 at commit ca3a767.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Dec 29, 2018

retest this please

@SparkQA

SparkQA commented Dec 29, 2018

Test build #100527 has finished for PR 22903 at commit ca3a767.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

override def getNextRowSet(orientation: FetchOrientation, maxRows: Long): RowSet = {
Member

What happens if we do not override getNextRowSet?

Member Author

It won't get the result:

[info] - Spark's own GetSchemasOperation(SparkGetSchemasOperation) *** FAILED *** (4 seconds, 364 milliseconds)
[info]   rs.next() was false (SparkMetadataOperationSuite.scala:78)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
[info]   at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
[info]   at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite$$anonfun$1$$anonfun$org$apache$spark$sql$hive$thriftserver$SparkMetadataOperationSuite$$anonfun$$checkResult$1$1.apply(SparkMetadataOperationSuite.scala:78)
[info]   at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite$$anonfun$1$$anonfun$org$apache$spark$sql$hive$thriftserver$SparkMetadataOperationSuite$$anonfun$$checkResult$1$1.apply(SparkMetadataOperationSuite.scala:77)
...
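
For reference, the override under discussion looks roughly like this (a fragment of the class sketched near the top, assuming the operation's populated `rowSet` is in scope; as the next comment notes, it can be dropped once the parent field is made `protected`):

```scala
// Sketch of the override in question; `rowSet` is the row set filled in
// runInternal(). RowSet.extractSubset returns up to maxRows rows.
override def getNextRowSet(order: FetchOrientation, maxRows: Long): RowSet = {
  validateDefaultFetchOrientation(order)
  assertState(OperationState.FINISHED)
  setHasResultSet(true)
  rowSet.extractSubset(maxRows.toInt)
}
```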

Member

Let us remove this. You just need to change this line:

private RowSet rowSet;

to

protected RowSet rowSet;


val catalog: SessionCatalog = sqlContext.sessionState.catalog

private final val RESULT_SET_SCHEMA = new TableSchema()
Member

This can be removed too.

.addStringColumn("TABLE_SCHEM", "Schema name.")
.addStringColumn("TABLE_CATALOG", "Catalog name.")

private val rowSet = RowSetFactory.create(RESULT_SET_SCHEMA, getProtocolVersion)
Member

This can be removed.

@gatorsmile
Member

gatorsmile commented Jan 2, 2019

I think this PR is pretty close to being merged. Thanks for your effort, @wangyum!

@SparkQA

SparkQA commented Jan 3, 2019

Test build #100669 has finished for PR 22903 at commit ecc4e0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.addStringColumn("TABLE_CATALOG", "Catalog name.");

private RowSet rowSet;
protected RowSet rowSet;
Member

@HyukjinKwon HyukjinKwon Jan 3, 2019

Hm... but I noticed this is o.a.hive-side code, @gatorsmile. Wouldn't it be better to avoid changes here, to prepare for getting rid of the Hive stuff completely later?

Member

@gatorsmile gatorsmile Jan 7, 2019

We do not have a plan to remove the thrift-server and use the Hive jar. Instead, I think we need to enhance the current thrift-server implementation.

Member

@HyukjinKwon HyukjinKwon Jan 8, 2019

I actually meant getting rid of the Hive fork stuff by upgrading Hive to another version later. These changes would basically cause some conflicts when upgrading from 1.2.1 to higher Hive versions.

Considering it's a one-line change, okay. I already see some changes have been made there. OK with me.

Member

7feeb82 shows we want to further clean up and improve the thrift-server. Even if https://issues.apache.org/jira/browse/HIVE-16391 is resolved, we will still keep the Hive thrift-server.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Jan 7, 2019

Test build #100907 has finished for PR 22903 at commit ecc4e0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merged to master.

@asfgit asfgit closed this in 29a7d2d Jan 8, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
Closes apache#22903 from wangyum/SPARK-24196.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
kk17 pushed a commit to kk17/spark that referenced this pull request May 25, 2019
@shgriffi

shgriffi commented Oct 4, 2019

Has this PR shown up in anything except Master? I don't seem to see it in 2.4.4. Any reason?

@wangyum wangyum deleted the SPARK-24196 branch October 8, 2019 04:34
@shgriffi

shgriffi commented Oct 8, 2019

Ah, OK, thanks @wangyum for the update. I think we'll need to apply a manual patch to 2.4.4 for now, until 3.0 comes out. However, we'll look at the 3.0 preview as soon as it's available.
