[SPARK-24196][SQL] Implement Spark's own GetSchemasOperation #22903

Conversation
```scala
  override def mode: ServerMode.Value = ServerMode.binary

  test("Spark's own GetSchemasOperation(SparkGetSchemasOperation)") {
    def testGetSchemasOperation(
```
This test mimics `HiveDatabaseMetaData.getSchemas()`.
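For context, a minimal sketch of what that client-side call looks like over JDBC (the URL and credentials below are placeholders, not from this PR):

```scala
import java.sql.DriverManager

// Placeholder connection string for a Thrift server running in binary mode.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
try {
  // DatabaseMetaData.getSchemas() is the call SQL clients make to list
  // databases; the server answers it with a GetSchemasOperation.
  val rs = conn.getMetaData.getSchemas()
  while (rs.next()) {
    println(rs.getString(1)) // schema (database) name
  }
} finally {
  conn.close()
}
```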
Test build #98311 has finished for PR 22903 at commit
cc @gatorsmile
```scala
      schemaName: String): GetSchemasOperation = synchronized {
    val sqlContext = sessionToContexts.get(parentSession.getSessionHandle)
    require(sqlContext != null, s"Session handle: ${parentSession.getSessionHandle} has not been" +
      s" initialized or had already closed.")
```
nit: the `s` on the second string literal can be removed (it has no interpolation).
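That is, since the second literal contains no `${...}` interpolation:

```scala
require(sqlContext != null, s"Session handle: ${parentSession.getSessionHandle} has not been" +
  " initialized or had already closed.")
```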
```scala
  }

  try {
    catalog.listDatabases(convertSchemaPattern(schemaName)).foreach { dbName =>
```
@wangyum, IIRC, not only DBs but other metadata are not shown as well (correct me if I am wrong; I tested it a while ago so can't exactly remember). Can you double-check whether only databases are missing?
+1, that's true: none of the metadata operations is implemented currently.
Yes. I think implementing GetSchemasOperation, GetTablesOperation and GetColumnsOperation is enough.
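For reference, those three operations back the standard `java.sql.DatabaseMetaData` calls that most SQL clients issue; a sketch (the `null`/`"%"` pattern arguments are illustrative):

```scala
import java.sql.DatabaseMetaData

def browseMetadata(md: DatabaseMetaData): Unit = {
  val schemas = md.getSchemas(null, "%")            // served by GetSchemasOperation
  val tables  = md.getTables(null, "%", "%", null)  // served by GetTablesOperation
  val columns = md.getColumns(null, "%", "%", "%")  // served by GetColumnsOperation
  Seq(schemas, tables, columns).foreach(_.close())
}
```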
```scala
  val catalog: SessionCatalog = sqlContext.sessionState.catalog

  private final val RESULT_SET_SCHEMA = new TableSchema()
    .addStringColumn("TABLE_SCHEM", "Schema name.")
```
nit: TABLE_SCHEMA?
It's copied from GetSchemasOperation.java#L49, and HiveDatabaseMetaData also uses TABLE_SCHEM.
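`TABLE_SCHEM` (without the trailing `A`) is also the column name the JDBC spec mandates for `DatabaseMetaData.getSchemas()`, so clients read the result by that name:

```scala
// Reading the two JDBC-standard columns of a getSchemas() result set,
// reusing a `conn` like the one in the earlier sketch.
val rs = conn.getMetaData.getSchemas()
while (rs.next()) {
  val schema  = rs.getString("TABLE_SCHEM")   // schema (database) name
  val catalog = rs.getString("TABLE_CATALOG") // catalog name, may be null
  println(s"$catalog.$schema")
}
```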
```scala
  override def getNextRowSet(order: FetchOrientation, maxRows: Long): RowSet = {
    validateDefaultFetchOrientation(order)
    assertState(OperationState.FINISHED)
```
What is the reason you need to change the order between line 87 and 88?
This was copied from SparkExecuteStatementOperation; I have reverted it. Lines 112 to 114 in b2e7677:

```scala
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)
```
```scala
  override def cancel(): Unit = {
    logInfo(s"Cancel get schemas with $statementId")
    setState(OperationState.CANCELED)
```
Why not call the default cancel()?
```scala
  override def close(): Unit = {
    logInfo(s"Close get schemas with $statementId")
    setState(OperationState.CLOSED)
```
The same here.
Test build #100526 has finished for PR 22903 at commit
retest this please
Test build #100527 has finished for PR 22903 at commit
```scala
    }
  }

  override def getNextRowSet(orientation: FetchOrientation, maxRows: Long): RowSet = {
```
What happens if we do not override getNextRowSet?
It won't get the result:

```
[info] - Spark's own GetSchemasOperation(SparkGetSchemasOperation) *** FAILED *** (4 seconds, 364 milliseconds)
[info] rs.next() was false (SparkMetadataOperationSuite.scala:78)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
[info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
[info] at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite$$anonfun$1$$anonfun$org$apache$spark$sql$hive$thriftserver$SparkMetadataOperationSuite$$anonfun$$checkResult$1$1.apply(SparkMetadataOperationSuite.scala:78)
[info] at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite$$anonfun$1$$anonfun$org$apache$spark$sql$hive$thriftserver$SparkMetadataOperationSuite$$anonfun$$checkResult$1$1.apply(SparkMetadataOperationSuite.scala:77)
...
```
Let us remove this. You just need to change this line (line 44 in 5264164):

```java
private RowSet rowSet;
```

to

```java
protected RowSet rowSet;
```
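A toy sketch (deliberately simplified, not Hive's actual classes) of why widening the field removes the need for the override: once `rowSet` is `protected`, the subclass can populate the inherited buffer and the parent's fetch method returns it.

```scala
// Simplified stand-ins for MetadataOperation and SparkGetSchemasOperation.
abstract class MetadataOp {
  protected var rowSet: Vector[Seq[Any]] = Vector.empty // was effectively `private`
  def getNextRowSet(): Vector[Seq[Any]] = rowSet        // inherited fetch sees subclass rows
}

class GetSchemasOp(databases: Seq[String]) extends MetadataOp {
  def run(): Unit = databases.foreach { db =>
    rowSet :+= Seq(db, "hive") // TABLE_SCHEM, TABLE_CATALOG
  }
}

object Demo extends App {
  val op = new GetSchemasOp(Seq("default", "sales"))
  op.run()
  op.getNextRowSet().foreach(println) // List(default, hive), List(sales, hive)
}
```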
```scala
  val catalog: SessionCatalog = sqlContext.sessionState.catalog

  private final val RESULT_SET_SCHEMA = new TableSchema()
```
This can be removed too.
| .addStringColumn("TABLE_SCHEM", "Schema name.") | ||
| .addStringColumn("TABLE_CATALOG", "Catalog name.") | ||
|
|
||
| private val rowSet = RowSetFactory.create(RESULT_SET_SCHEMA, getProtocolVersion) |
This can be removed.
I think this PR is pretty close to being merged. Thanks for your effort! @wangyum
Test build #100669 has finished for PR 22903 at commit
| .addStringColumn("TABLE_CATALOG", "Catalog name."); | ||
|
|
||
| private RowSet rowSet; | ||
| protected RowSet rowSet; |
Hm... but I noticed this is o.a.hive side code, @gatorsmile. Wouldn't it be better to avoid changes here, to prepare for getting rid of the Hive stuff completely later?
We do not have a plan to remove the thrift-server and use the Hive jar. Instead, I think we need to enhance the current thrift-server implementation.
I actually meant getting rid of the Hive fork stuff by upgrading Hive to another version later. Those changes would cause some conflicts when upgrading from 1.2.1 to higher Hive versions.
Considering it's a one-line change, okie. I already see some changes made there. OK to me.
7feeb82 shows we want to further clean up and improve the thrift-server. Even if https://issues.apache.org/jira/browse/HIVE-16391 is resolved, we will still keep the Hive thrift-server.
retest this please
Test build #100907 has finished for PR 22903 at commit
Thanks! Merged to master.
## What changes were proposed in this pull request?

This PR fixes SQL client tools not being able to show databases, by implementing Spark's own `GetSchemasOperation`.

## How was this patch tested?

Unit tests and manual tests (before/after screenshots omitted).

Closes apache#22903 from wangyum/SPARK-24196.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
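As a rough illustration of the manual test (host, port, and credentials are placeholders), listing databases through plain JDBC now returns rows instead of an empty result:

```scala
import java.sql.DriverManager

object VerifyGetSchemas extends App {
  // Placeholder connection to a locally running Spark Thrift server.
  val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
  try {
    // '%' is the JDBC wildcard; filter to databases starting with "test".
    val rs = conn.getMetaData.getSchemas(null, "test%")
    while (rs.next()) {
      println(rs.getString("TABLE_SCHEM"))
    }
  } finally {
    conn.close()
  }
}
```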
Has this PR shown up in anything except master? I don't see it in 2.4.4. Any reason?
@shgriffi This is a new feature of Spark 3.0. We will release the Spark 3.0 preview soon. https://issues.apache.org/jira/browse/SPARK-28426
Ah, OK, thanks @wangyum for the update. I think we'll need to do a manual patch in 2.4.4 for now, until 3.0 comes out. However, we'll look at the 3.0 preview as soon as it's available.