Skip to content

Conversation

@gatorsmile
Copy link
Member

What changes were proposed in this pull request?

Specifying the table schema in DDL formats is needed for different scenarios. For example,

These two PRs need users to use the JSON format to specify the table schema. This is not user friendly.

This PR is to provide a parseTableSchema API in ParserInterface.

How was this patch tested?

Added a test suite TableSchemaParserSuite

@SparkQA
Copy link

SparkQA commented Mar 6, 2017

Test build #73965 has finished for PR 17171 at commit 50f74d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

cc @hvanhovell @cloud-fan

}
}

checkTableSchema("a int", (new StructType).add("a", "int"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new StructType().add...

val e = intercept[ParseException] {
CatalystSqlParser.parseTableSchema("a INT b long")
}.getMessage
assert(e.contains("mismatched input 'b' expecting {<EOF>, '(', ',', 'COMMENT'}"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we should check this. This is an ANTLR error message.

def parseTableIdentifier(sqlText: String): TableIdentifier

/** Creates StructType for a given SQL string. */
def parseTableSchema(sqlText: String): StructType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the difference between the parseDataType and the parseTableSchema functions is that the latter allows you to parse a comma separated list of field definitions which will preserve the correct Hive metadata? Could you document this?

* Create top level table schema.
*/
protected def createSchema(ctx: ColTypeListContext): StructType = {
def createSchema(ctx: ColTypeListContext): StructType = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be a little bit careful here. I think we may need to add a rule similar to singleDataType to the grammar and expose its implementation. This to prevent us from silently dropping trailing input, or having a weird error on trailing input; it seems we get a somewhat vague error ATM...

* definitions which will preserve the correct Hive metadata.
*/
override def parseTableSchema(sqlText: String): StructType = parse(sqlText) { parser =>
StructType(astBuilder.visitColTypeList(parser.colTypeList()))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell How about this? I am not sure whether I address your concern. Feel free to let me know if you have any concern

@SparkQA
Copy link

SparkQA commented Mar 7, 2017

Test build #74048 has finished for PR 17171 at commit 3ec8483.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 7, 2017

Test build #74052 has finished for PR 17171 at commit 22b7db8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 11, 2017

Test build #74390 has finished for PR 17171 at commit 22b7db8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

ping @hvanhovell @cloud-fan

def parse(sql: String): StructType = CatalystSqlParser.parseTableSchema(sql)

def checkTableSchema(tableSchemaString: String, expectedDataType: DataType): Unit = {
test(s"parse ${tableSchemaString.replace("\n", "")}") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems tableSchemaString will never contains \n?

}
}

def intercept(sql: String): Unit =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to name it assertError

@cloud-fan
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Mar 16, 2017

Test build #74635 has finished for PR 17171 at commit b18ae84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants