[SPARK-6322][SQL] CTAS should consider the case where no file format or storage handler is given #5014
Conversation
Test build #28567 has finished for PR 5014 at commit
That's my bad; I didn't drop a comment here.
Actually we are not using the ignores. Previously we tried to parse the storage format information within HiveQl.scala, but the logic was quite complicated and error-prone, so we decided to reuse the Hive code for that purpose; that's why we always pass down the node (ASTNode) for further analysis. The node is required to be passed down even when no storage format is specified, since Hive provides a default storage format during analysis. So I don't think the change here is reasonable.
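The argument above can be sketched in a few lines. This is a simplified, hypothetical model (the names here are illustrative, not the actual Spark classes): the parser always forwards the raw ASTNode, so Hive's own analysis can fill in a default storage format when the query specifies none.

```scala
// Simplified sketch, not the actual Spark code.
case class ASTNode(text: String)

case class CreateTableAsSelect(tableName: String, node: Option[ASTNode])

def parseCtas(tableName: String,
              formatClause: Option[String],
              node: ASTNode): CreateTableAsSelect = {
  // Even when formatClause is None, the node is still passed down;
  // Hive derives the default storage format from it during analysis.
  CreateTableAsSelect(tableName, Some(node))
}
```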
No. Actually, ignores could be None here when no file format or storage handler is given.
Even without the storage handler specified, Hive still needs the node to get the default format.
See https://github.com/apache/spark/pull/5014/files#diff-ee66e11b56c21364760a5ed2b783f863L539
When it is None and hive.convertCTAS is true, the Hive CTAS statement becomes a data source table with the default data source. Please refer to CreateTables in HiveMetastoreCatalog.scala.
Otherwise it will fall into the default format specified by hive.conf.defaultDataSourceName, which is probably not the default Hive behavior.
In CreateTables, the node is parsed by Hive's code to see whether a file format or storage handler is given. If not, and hive.convertCTAS is true, the Hive CTAS statement becomes a data source table with the default data source, too.
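The dispatch being described can be sketched as follows. This is a hedged, simplified model (names are hypothetical, not the actual CreateTables implementation): desc == None stands for "no file format or storage handler was given".

```scala
// Simplified sketch of the CTAS dispatch described above.
sealed trait Plan
case class DataSourceCtas(provider: String) extends Plan
case class HiveCtas(format: String) extends Plan

def planCtas(desc: Option[String],
             convertCTAS: Boolean,
             defaultSource: String): Plan =
  desc match {
    // No format given and hive.convertCTAS enabled: become a data
    // source table with the default data source (e.g. Parquet).
    case None if convertCTAS => DataSourceCtas(defaultSource)
    // Otherwise fall back to Hive, which supplies its own default.
    case d => HiveCtas(d.getOrElse("hive-default"))
  }
```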
@chenghao-intel Please refer to #4639, which uses Parquet as the default file format for CTAS statements.
Oh, OK, I see. Thanks for the explanation. But what if hive.convertCTAS is false and the user didn't specify the storage format in HiveQl? How can we get the default Hive storage format?
Yes. In that case, the parsed CreateTableDesc will be passed to execution.CreateTableAsSelect.
@yhuai @chenghao-intel I updated it. Please take a look when you have time. Thanks!
Test build #28769 has finished for PR 5014 at commit
@yhuai Any other comments?
/cc @yhuai @liancheng Can you take a look at this if you have time? Thanks!
Not super familiar with this part of the code, but according to the context and discussion, I think this change makes sense. @yhuai Could you help confirm this?
LGTM
Is it the only place where we create a CreateTableAsSelect? If so, can we get rid of the Option and just pass the node?
Currently, it is. If we are sure that CreateTableAsSelect is only used by the Hive dialect, we can remove the Option.
Test build #28999 has finished for PR 5014 at commit
Let's get rid of the type parameter and rename it to CreateHiveTableAsSelect (to be a little more specific about what this one does).
The CreateTableAsSelect is designed as a common logical plan node; that's why I made the desc a type parameter T, and also an optional parameter. Otherwise, every SQL dialect would have to implement its own CTAS node (logical plan).
Or is CreateTableUsingAsSelect a more generic interface for the same purpose?
I think CreateTableUsingAsSelect is just for the data source API?
I added a specific CreateHiveTableAsSelect but still keep CreateTableAsSelect as a common logical plan for the use of other SQL dialects.
…f CreateTableAsSelect.
Test build #29045 has finished for PR 5014 at commit
@yhuai Is this good to go now?
Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
Test build #29164 has finished for PR 5014 at commit
@yhuai please take a look, thanks.
@yhuai Can you check if this is ready to merge?
@liancheng @yhuai It has been quite a while waiting for this to be merged; could you take a look?
Instead of having this trait here, can we just make the implementation below a Command?
I guess we cannot use Command because a Command is a LeafNode. If we make it a Command, the child logical plan will not be analyzed and optimized.
I do think it will be good to have a different type of Command that is not a LeafNode.
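The objection can be made concrete with a toy model. These are not the real Catalyst classes, just a sketch of the type structure under discussion: a Command modeled as a leaf hides any nested plan from the analyzer and optimizer, while a hypothetical non-leaf command keeps its child visible.

```scala
// Illustrative sketch, not the real Catalyst class hierarchy.
abstract class LogicalPlan { def children: Seq[LogicalPlan] }

// Command as it exists: always a leaf, so a query nested inside it
// would never be analyzed or optimized.
abstract class Command extends LogicalPlan {
  final def children: Seq[LogicalPlan] = Nil
}

// A hypothetical non-leaf command: the child plan stays visible,
// so it still flows through analysis and optimization.
abstract class UnaryCommand extends LogicalPlan {
  def child: LogicalPlan
  final def children: Seq[LogicalPlan] = Seq(child)
}

case class LeafOnlyCommand() extends Command
case class CtasCommand(child: LogicalPlan) extends UnaryCommand
```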
One minor comment; otherwise this LGTM. ping @yhuai
Test build #30106 has finished for PR 5014 at commit
ping @yhuai
Test build #31053 has started for PR 5014 at commit
When creating CreateTableAsSelect in HiveQl, it doesn't consider the case where no file format or storage handler is given. So later in CreateTables, one of the CreateTableAsSelect cases will never be run. The two CreateTableAsSelect cases are basically the same code except for checking CreateTableDesc. This PR fixes the issue.
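The deduplication the description refers to can be sketched as follows. This is a hedged, simplified model (names are hypothetical): once the parsed storage description is an Option, a single code path covers both situations that previously needed two near-identical CreateTableAsSelect cases.

```scala
// Simplified sketch of the fix, not the actual CreateTables code.
case class HiveCtasNode(desc: Option[String])

def rewrite(ctas: HiveCtasNode, convertCTAS: Boolean): String =
  // One path handles desc == None (no format/storage handler given)
  // and desc == Some(...) alike; only the convertCTAS check differs.
  if (ctas.desc.isEmpty && convertCTAS) "data source table (default source)"
  else "execution.CreateTableAsSelect"
```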