[SPARK-29503][SQL] Remove conversion CreateNamedStruct to CreateNamedStructUnsafe #26173
Conversation
Credit to @aaronlewism for reporting this issue and providing a simple reproducer. Please let me know if we should apply the type check recursively and call copy() on all unsafe data structs.
Test build #112311 has finished for PR 26173 at commit
retest this, please
Test build #112313 has finished for PR 26173 at commit

Test build #112314 has finished for PR 26173 at commit
```scala
val runInsideLoop = expressions.exists {
  case _: LambdaVariable => true
  case _ => false
}
val extractValueCode = if (runInsideLoop) {
  s"$rowWriter.getRow().copy()"
} else {
  s"$rowWriter.getRow()"
}
```
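The snippet above keys the decision to copy on whether the extraction runs inside a loop, where the row writer's buffer is reused across iterations. As a self-contained sketch (toy classes, not Spark's actual UnsafeRow or row-writer machinery), this is the aliasing hazard the copy() guards against:

```scala
// Toy stand-in for a mutable row whose backing buffer is reused per iteration.
final class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object CopyDemo {
  // Collect without copying: every element aliases the single reused buffer,
  // so after the loop all "rows" observe the last written value.
  def collectNoCopy(values: Seq[Int]): Seq[MutableRow] = {
    val buffer = new MutableRow(0)
    values.map { v => buffer.value = v; buffer }
  }

  // Collect with copy(): each element snapshots the buffer's current state.
  def collectWithCopy(values: Seq[Int]): Seq[MutableRow] = {
    val buffer = new MutableRow(0)
    values.map { v => buffer.value = v; buffer.copy() }
  }

  def main(args: Array[String]): Unit = {
    println(collectNoCopy(Seq(1, 2, 3)).map(_.value))   // last write wins for every element
    println(collectWithCopy(Seq(1, 2, 3)).map(_.value)) // values preserved per element
  }
}
```

The same reasoning explains why missing a copy only corrupts results when the lambda runs inside a loop over reused buffers.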
Nice solution! Correct me if I'm wrong, but I believe this means it should be safe to get rid of the top-level copy from applying makeCopyIfInstanceOf in MapObjects.
I'm not an expert on the SQL area (limited knowledge), so I'm not sure this is the only path - I'd feel safer if experts in the SQL area jumped in and reviewed the approach. If we are uncertain and don't feel safe fixing only the discovered case, I may need to make more changes.
Agree that we should fix CodegenContext.generateExpressionAndCopyIfUnsafe. To simplify codegen, maybe we can add copyUnsafeData to InternalRow/ArrayData/MapData.
Just to follow up on the review comment clearly: there's no CodegenContext.generateExpressionAndCopyIfUnsafe. Maybe you referred to makeCopyIfInstanceOf inside MapObjects.doGenCode, or are you suggesting we create a new method CodegenContext.generateExpressionAndCopyIfUnsafe?
cc. @cloud-fan @maropu @viirya
```scala
}

test("SPARK-29503 nest unsafe struct inside safe array") {
  withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
```
This issue is encountered only when whole stage codegen is disabled?
At least yes for the provided reproducer. For other cases I'm not sure.
I see. For whole-stage codegen, CreateNamedStruct is not converted to CreateNamedStructUnsafe, so the nested struct is not an unsafe one.
Ah OK, got it. Thanks for the explanation.
Force-pushed 8b2be71 to 39d0a2e.
I've changed the approach to what @cloud-fan had suggested. It has to pass a data type to check nested structs and copy unsafe data, so it needs to serialize the data type to JSON and deserialize it in the method. Hope this is OK given we only do it once per generating code for MapObjects. I'd like to see whether the direction is OK - if we agree with the direction, I'll try to refine the code as well as add some UTs for the new methods, and change the title and description of the PR. In case of rolling back, the previous commit head is here:
Test build #112391 has finished for PR 26173 at commit
Force-pushed 39d0a2e to 07514a1.
Test build #112395 has finished for PR 26173 at commit
Looks like escaping JSON doesn't work consistently when compiling the generated code. So double escaping seems to be the only working solution for the new UT, but it consistently fails on ALSSuite -
Using the catalog string instead of JSON seems to work for both tests. Let me see the build result.
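As a rough, self-contained illustration of why a catalog string embeds into generated code more cleanly than JSON (toLiteral here is a hypothetical helper, not Spark's actual escaping code): JSON is full of double quotes, each of which must be escaped inside the generated Java string literal, and the escaping must survive a second round of compilation.

```scala
object EscapingDemo {
  // Render a runtime string as a double-quoted source-code literal,
  // escaping backslashes first and then double quotes.
  def toLiteral(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def main(args: Array[String]): Unit = {
    val catalog = "struct<items:array<int>>"        // no quotes inside: embeds as-is
    val json = """{"type":"struct","fields":[]}"""  // quotes everywhere: needs escaping
    println(toLiteral(catalog))
    println(toLiteral(json))
  }
}
```

The catalog string contains no characters that need escaping, so it round-trips through codegen without the double-escaping problem described above.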
```scala
s"$value instanceof ${clazz.getSimpleName}? ${value}.copy() : $value"

// Make a copy of the unsafe data if the result contains any
def makeCopyUnsafeData(dataType: DataType, value: String) = {
  s"""${value}.copyUnsafeData("${dataType.catalogString}")"""
```
Is this safe for codegen? For a complex type, will we bump into a very long string here?
I guess it depends on how long we would classify as a very long string. JSON is much more likely to create a very long string; the catalog string would create a shorter one. I agree the approach is questionable, but even getting a value from a field requires the schema for InternalRow, so I'm not sure there's an alternative approach for this.
We may be able to let CodeGenerator go through the schema and generate code based on it, but I'd suspect the overall amount of generated code could be larger, as the generated code cannot leverage schema information and we would have to generate all the code for accessing and replacing fields recursively.
It also looks like string literals have a 64k limit, which doesn't seem short - it's the limit on method length as well. We may even be able to apply a trick here: define this in a static field or a field of the class, and assign it to one String instance by splitting it into chunks (each less than 64k) and concatenating them, to get over the limitation.
So the approach depends on how long a catalog string we expect for a very complex type.
- If we expect it to be of negligible size, it would work as-is. We might refine it a bit by defining a local variable and assigning from there, but that may require additional conditional logic.
- If we expect it to be less than 64k but of considerable size, and we don't want it to consume the method-length limit, we may need to define it as a static field.
- If we expect more than 64k, we should make it a static field, break the catalog string into chunks, and concatenate them into one String instance.
Do we have a huge and complicated but realistic (production) schema to test this?
I feel the current approach is overkill. I also suspect a possible performance downgrade. Can we avoid converting CreateNamedStruct to CreateNamedStructUnsafe in this special case?
Would it work if we replace  I'm not sure we aren't missing anything, as it seems to only deal with struct type, but what we've discovered so far is struct type, so the approach makes sense to me for resolving SPARK-29503. Just a view as a non-expert in SQL: the current approach makes me feel safer, though, in the sense of defensive programming. I would wait to hear more voices before making the change so we don't go back and forth. Thanks for the suggestion!
Test build #112413 has finished for PR 26173 at commit
Force-pushed 4c38ffc to 6187d99.
Test build #112426 has finished for PR 26173 at commit
It's safer to make the  How many places do we create mixed safe and unsafe data? It looks to me that there is only one:  Furthermore, I don't quite understand why we need
Force-pushed 6187d99 to d8d3900.
I quickly checked the usage of  So I'm seeing the same: CreateNamedStructUnsafe doesn't seem to be needed at all. As a side effect  I've just removed them and rebased the branch to contain only this change. The new test passes with the new change, so let's see the build result. Old commit hash to revert: 6187d99
OFFTOPIC: I've indicated
Looks much simpler. If there's no other concern, this approach looks good.
Test build #112485 has finished for PR 26173 at commit
Sorry, I'm late. The current one looks ok to me, too. I left super minor comments.
```scala
assert(result.size === 1)
assert(getValueInsideDepth(result.head, 0) === 1)
assert(getValueInsideDepth(result.head, 1) === 2)
assert(getValueInsideDepth(result.head, 2) === 3)
```
nit: just check assert(result === Row(Seq(Seq(Row(1)), Seq(Row(2)), Seq(Row(3)))) :: Nil)?
+1
```scala
test("SPARK-29503 nest unsafe struct inside safe array") {
  withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    val exampleDS = spark.sparkContext.parallelize(Seq(Seq(1, 2, 3))).toDF("items")
```
super nit: exampleDF?
or simply df
Ur, for better commit logs, can you describe the change in generated code for MapObjects in the PR description?
Thanks for reviewing. Just addressed review comments.
```scala
import scala.collection.mutable

import org.apache.spark.sql.catalyst.DefinedByConstructorParams
import org.apache.spark.sql.catalyst.expressions.{Expression, GenericRowWithSchema}
```
nit: plz remove GenericRowWithSchema
```scala
package org.apache.spark.sql

import scala.collection.mutable
```
nit: plz remove this import.
Test build #112508 has finished for PR 26173 at commit

Test build #112509 has finished for PR 26173 at commit
retest this please
Test build #112520 has finished for PR 26173 at commit
thanks, merging to master!
Thanks all for reviewing and merging!
```scala
// items: Seq[Int] => items.map { item => Seq(Struct(item)) }
val result = df.select(
  new Column(MapObjects(
    (item: Expression) => array(struct(new Column(item))).expr,
```
Hm, while the fix seems fine to me too, was this the only reproducer? MapObjects is supposed to be for internal purposes - it's under the catalyst package.
I haven't spent more time trying it (as it seems to be a clean and simple reproducer). I'm not sure it becomes an invalid reproducer just because it pulls in the catalyst package. Catalyst could analyze the query and inject it if necessary in any way.
I indicated you'd like to revisit #25745 - that was WIP and it didn't have any performance-gain numbers. I'd rather choose "safeness" over "speed", even though we haven't figured out whether there's an outstanding difference between the two. This was the only case where MapObjects could have an unsafe struct; by allowing it, safe and unsafe data are possibly mixed up, leading to corner cases.
Yeah, I am not against this change. In that sense, I think this fix is fine, but I wanted to know if it actually affects any user-facing surface.
I was also wondering if we can benefit from #25745, since some investigation already looks to have been made there to completely use the unsafe one instead.
My guess about #25745 is that it was based on the assumption that it's safe to replace CreateNamedStruct with CreateNamedStructUnsafe, as we already had one path doing this - and this observation broke the assumption. IMHO, once we find it's not safe to do this, the improvement has to prove safety before we take its benefits into account.
Hi, All.
Is this an internal bug? (referring to the @HyukjinKwon suggestion: https://github.com/apache/spark/pull/26173/files#r339915780)
Do you mean that this PR doesn't fix anything on the user side in the final results?
If users don't use internal objects (
@maropu Please see the example in the JIRA.
Ah, I see. Probably I don't clearly understand which component
No~ Thank you for the feedback, @maropu. The reason why I'm asking around (or pushing more) is that there was no consensus on it. Since the RC1 failure, I'm collecting opinions, including yours.
Ah, I see, OK. Thanks for the kind explanation! Yeah, it's important to discuss this more.
I also agree that catalyst has been treated as a non-public API, but it seems there's no harm in porting this back to the 2.4 branch. (And even though we treat catalyst as a non-public API, it looks like end users don't recognize that fact and use it.) No strong opinion though - OK either way. I can follow up if we decide to port it back.
What changes were proposed in this pull request?
There's a case where MapObjects has a lambda function which creates a nested struct - unsafe data inside a safe data struct. In this case, MapObjects doesn't copy the row returned from the lambda function (as the outermost data type is a safe data struct), which misses copying the nested unsafe data.
The culprit is that UnsafeProjection.toUnsafeExprs converts CreateNamedStruct to CreateNamedStructUnsafe (this is the only place where CreateNamedStructUnsafe is used), which incurs safe and unsafe data being mixed up temporarily. This may not be needed at all, at least logically, as it will finally assemble these evaluations into an UnsafeRow.

Why are the changes needed?
This patch fixes the bug described above.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
A UT was added which fails on the master branch and passes with this PR.