Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
* Returns a Row containing the evaluation of all children expressions.
* TODO: [[CreateStruct]] does not support codegen.
*/
case class CreateStruct(children: Seq[NamedExpression]) extends Expression {
case class CreateStruct(children: Seq[Expression]) extends Expression {

override def foldable: Boolean = children.forall(_.foldable)

Expand All @@ -62,9 +62,14 @@ case class CreateStruct(children: Seq[NamedExpression]) extends Expression {
override lazy val dataType: StructType = {
assert(resolved,
s"CreateStruct contains unresolvable children: ${children.filterNot(_.resolved)}.")
val fields = children.map { child =>
StructField(child.name, child.dataType, child.nullable, child.metadata)
}
val fields = children.zipWithIndex.map { case (child, idx) =>
child match {
case ne: NamedExpression =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about StructField(s"col${idx + 1}", child.dataType, child.nullable, Metadata.empty) as Hive does (GenericUDFStruct)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CreateStruct here was also used by two org.apache.spark.sql.functions.struct and I did this above to avoid change the semantic of these two struct method. Maybe it's better to add another two class to act as GenericUDFStruct & GenericUDFNamedStruct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will not argue that they have some share logic, however, the difference between GenericUDFStruct and GenericUDFNamedStruct is, the former gives names for output fields as (col1, col2 ...), but the later requires specifying the output fields names from its arguments.
Probably we'd better create another expression class in Spark SQL for GenericUDFNamedStruct, mixing the functionality in a single class probably makes people confusing.

It will be great if you can create an issue under the expression umbrella task https://issues.apache.org/jira/browse/SPARK-8159

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I would create a new ticket for named_struct in JIRA and take all discussions we had above into consideration. Thanks.

StructField(ne.name, ne.dataType, ne.nullable, ne.metadata)
case _ =>
StructField(s"col${idx + 1}", child.dataType, child.nullable, Metadata.empty)
}
}
StructType(fields)
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -933,7 +933,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
"udf_stddev_pop",
"udf_stddev_samp",
"udf_string",
// "udf_struct", TODO: FIX THIS and enable it.
"udf_struct",
"udf_substring",
"udf_subtract",
"udf_sum",
Expand Down