@wangyum wangyum commented Sep 7, 2018

What changes were proposed in this pull request?

How to reproduce:

spark.sql("CREATE TABLE tbl(id long)")
spark.sql("INSERT OVERWRITE TABLE tbl VALUES 4")
spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
spark.sql("INSERT OVERWRITE LOCAL DIRECTORY '/tmp/spark/parquet' " +
  "STORED AS PARQUET SELECT ID FROM view1")
scala> spark.read.parquet("/tmp/spark/parquet").schema
res10: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,true))

The schema should be StructType(StructField(ID,LongType,true)), because the query is SELECT ID FROM view1, but the written Parquet files use the lowercase column name id instead.
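
The mismatch can be made explicit with a case-sensitive schema comparison. The following is a minimal sketch, not part of this patch: it assumes a Hive-enabled `spark` session as provided by spark-shell and that the reproduction above has already written to /tmp/spark/parquet. The names `written` and `expected` are illustrative only.

```scala
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Assumption: `spark` is the Hive-enabled SparkSession provided by spark-shell,
// and the INSERT OVERWRITE LOCAL DIRECTORY statement above has already run.
val written: StructType = spark.read.parquet("/tmp/spark/parquet").schema

// Expected schema: the column name follows the query text (SELECT ID ...).
val expected = StructType(Seq(StructField("ID", LongType, nullable = true)))

// Field names are plain strings, so the comparison is case-sensitive.
println(s"written field names:  ${written.fieldNames.mkString(", ")}")
println(s"expected field names: ${expected.fieldNames.mkString(", ")}")

// Fails while the written column is named "id" and passes once it is "ID".
assert(written == expected, s"unexpected schema: ${written.simpleString}")
```

Comparing whole `StructType` values rather than only the names also catches unintended changes to the data type or nullability.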

This PR fixes the output schema of InsertIntoHiveDirCommand.

How was this patch tested?

unit tests

wangyum commented Sep 7, 2018

cc @gengliangwang

}

test("Insert overwrite directory should output correct schema") {
withSQLConf(CONVERT_METASTORE_PARQUET.key -> "false") {

Add withTable("tbl") { here.

@gengliangwang gengliangwang left a comment

LGTM. Thanks for the fix!

SparkQA commented Sep 7, 2018

Test build #95788 has finished for PR 22359 at commit ff78fdb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 7, 2018

Test build #95791 has finished for PR 22359 at commit 8e60b98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

Hi, @wangyum.

  • I know SPARK-25313 has some information, but could you make the PR description more complete? The current description just repeats the title: "Fix InsertIntoHiveDirCommand output schema issue." :)
  • Nit: FOLLOW-UP] -> [FOLLOW-UP] in the title?

}
}

test("Insert overwrite directory should output correct schema") {

Since this is a bug fix, can we add a SPARK-25313 prefix to the test name?

Should the prefix also be added to these tests?

test("Insert overwrite Hive table should output correct schema") {

test("Create Hive table as select should output correct schema") {

In this PR, let's handle this test case only.

@wangyum wangyum changed the title from "[SPARK-25313][SQL]FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema issue" to "[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema issue" on Sep 8, 2018

dongjoon-hyun commented Sep 8, 2018

I removed my previous comment. This seems to have been the Parquet behavior since this command was introduced in 2.3.0. I was confused because it differs from ORC.

SparkQA commented Sep 8, 2018

Test build #95818 has finished for PR 22359 at commit 71f382b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

Since this is related to Parquet behavior only, can we add "in Parquet" at the end of the title?

@wangyum wangyum changed the title from "[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema issue" to "[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema in Parquet issue" on Sep 9, 2018

wangyum commented Sep 10, 2018

cc @cloud-fan

@cloud-fan

thanks, merging to master/2.4!

asfgit pushed a commit that referenced this pull request Sep 10, 2018
…ema in Parquet issue

## What changes were proposed in this pull request?

How to reproduce:
```scala
spark.sql("CREATE TABLE tbl(id long)")
spark.sql("INSERT OVERWRITE TABLE tbl VALUES 4")
spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
spark.sql("INSERT OVERWRITE LOCAL DIRECTORY '/tmp/spark/parquet' " +
  "STORED AS PARQUET SELECT ID FROM view1")
scala> spark.read.parquet("/tmp/spark/parquet").schema
res10: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,true))
```
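The schema should be `StructType(StructField(ID,LongType,true))`, because the query is `SELECT ID FROM view1`, but the written Parquet files use the lowercase column name `id` instead.

This PR fixes the output schema of `InsertIntoHiveDirCommand`.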

## How was this patch tested?

unit tests

Closes #22359 from wangyum/SPARK-25313-FOLLOW-UP.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit f8b4d5a)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in f8b4d5a Sep 10, 2018
asfgit pushed a commit that referenced this pull request Sep 11, 2018
…and output schema in Parquet issue

## What changes were proposed in this pull request?

Backport #22359 to branch-2.3.

## How was this patch tested?

unit tests

Closes #22387 from wangyum/SPARK-25313-FOLLOW-UP-branch-2.3.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>