Skip to content

Conversation

@mgaido91
Copy link
Contributor

@mgaido91 mgaido91 commented May 18, 2018

What changes were proposed in this pull request?

The interpreted evaluation of several collection operations works only for simple datatypes. For complex data types, for instance, array_contains it returns always false. The list of the affected functions is array_contains, array_position, element_at and GetMapValue.

The PR fixes the behavior for all the datatypes.

How was this patch tested?

added UT

@mgaido91
Copy link
Contributor Author

cc @cloud-fan @kiszk @ueshin

override def dataType: DataType = BooleanType

@transient private lazy val ordering: Ordering[Any] =
TypeUtils.getInterpretedOrdering(right.dataType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then in checkInputDataTypes we should check if there is Ordering for right.dataType. Otherwise for example MapType will throw a match error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks

if (v == null) {
hasNull = true
} else if (v == value) {
} else if (ordering.equiv(v, value)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously does this work for Map? No?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MapType is not supported in comparison, even =

@ueshin
Copy link
Member

ueshin commented May 18, 2018

ArrayPosition has the same problem? cc @kiszk

@kiszk
Copy link
Member

kiszk commented May 18, 2018

Great catch, thanks.
@ueshin, I will work for ArrayPosition.

@mgaido91
Copy link
Contributor Author

yes @ueshin , I can fix ArrayPosition in this PR if you want

@kiszk
Copy link
Member

kiszk commented May 18, 2018

@mgaido91 Thank you very much. Can I ask you to fix ArrayPosition in this PR, too?

@mgaido91
Copy link
Contributor Author

sure, thanks @kiszk. Sorry, I saw your comment only now, probably we were writing at the same time :)

@mgaido91 mgaido91 changed the title [SPARK-24313][SQL] Fix array_contains interpreted evaluation for complex types [SPARK-24313][SQL] Fix array_contains/array_position interpreted evaluation for complex types May 18, 2018
@ueshin
Copy link
Member

ueshin commented May 18, 2018

We should also fix GetMapValueUtil.getValueEval()?

@mgaido91
Copy link
Contributor Author

yes @ueshin , great catch! I am fixing it too.

@mgaido91 mgaido91 changed the title [SPARK-24313][SQL] Fix array_contains/array_position interpreted evaluation for complex types [SPARK-24313][SQL] Fix collection operations' interpreted evaluation for complex types May 18, 2018
@SparkQA
Copy link

SparkQA commented May 18, 2018

Test build #90785 has finished for PR 21361 at commit 35370cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"Arguments must be an array followed by a value of same type as the array members")
} else {
TypeCheckResult.TypeCheckSuccess
if (RowOrdering.isOrderable(right.dataType)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can call TypeUtils.checkForOrderingExpr

@cloud-fan
Copy link
Contributor

good catch! this is a long-standing bug...

@SparkQA
Copy link

SparkQA commented May 18, 2018

Test build #90788 has finished for PR 21361 at commit b7d7249.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 18, 2018

Test build #90790 has finished for PR 21361 at commit a819a97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 18, 2018

Test build #90794 has finished for PR 21361 at commit 71916d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("SPARK-24313: support complex types as map keys") {
val mb0 = Literal.create(
Map(Array[Byte](1, 2) -> "1", Array[Byte](3, 4) -> null, Array[Byte](2, 1) -> "2"),
MapType(BinaryType, StringType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we test ArrayTypeto reflect the test name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the test name here and the comment for ElementAt because I found an issue in using arrays as keys for map. When I do that, I get:

Caused by: java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData

I will investigate this further, but I think this is a separate issue and should be addressed in another PR, what do you think?

// test complex types as keys
val mb0 = Literal.create(
Map(Array[Byte](1, 2) -> "1", Array[Byte](3, 4) -> null, Array[Byte](2, 1) -> "2"),
MapType(BinaryType, StringType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

ArrayType(BinaryType))
val b3 = Literal.create(Seq[Array[Byte]](null, Array[Byte](1, 2)),
ArrayType(BinaryType))
val be = Literal.create(Array[Byte](1, 2), BinaryType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, binary type is not complex type

@SparkQA
Copy link

SparkQA commented May 20, 2018

Test build #90857 has finished for PR 21361 at commit c74ddd7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented May 20, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 20, 2018

Test build #90863 has finished for PR 21361 at commit c74ddd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, can we add an end-to-end test case for GetMapValue? This bug can be triggered with constant folding.

@mgaido91
Copy link
Contributor Author

@cloud-fan sorry but I am not sure I got it. May you please provide me some more details about the end-to-end test case for GetMapValue you want me to add? Thanks.

@cloud-fan
Copy link
Contributor

cloud-fan commented May 21, 2018

e.g.

val binaryLiteral = lit(Array[Byte](1.toByte))
spark.range(1).select(map(binaryLiteral, lit(1)).getItem(Array[Byte](1.toByte))).show
+--------------------+
|map(X'01', 1)[X'01']|
+--------------------+
|                null|
+--------------------+

we should use checkAnswer in the test instead of show

@mgaido91
Copy link
Contributor Author

thank you for your kind explanation @cloud-fan. I added the UT you suggested. Thanks.

assert(complexData.filter(complexData("m").getItem("1") === 1).count() == 1)
assert(complexData.filter(complexData("s").getField("key") === 1).count() == 1)

// SPARK-24313: access binary keys
Copy link
Contributor

@cloud-fan cloud-fan May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we create a new test case? I think a correctness bug deserves an individual end-to-end test case.

@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90900 has finished for PR 21361 at commit 3f8624b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90903 has finished for PR 21361 at commit 6315775.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@mgaido91 can you send a new PR for 2.3? thanks!

@asfgit asfgit closed this in d3d1807 May 22, 2018
@mgaido91
Copy link
Contributor Author

sure @cloud-fan, thanks. I will create it in the next days. Thank you.

mgaido91 added a commit to mgaido91/spark that referenced this pull request May 23, 2018
…ed evaluation for complex types

The interpreted evaluation of several collection operations works only for simple datatypes. For complex data types, for instance, `array_contains` it returns always `false`. The list of the affected functions is `array_contains`, `array_position`, `element_at` and `GetMapValue`.

The PR fixes the behavior for all the datatypes.

added UT

Author: Marco Gaido <[email protected]>

Closes apache#21361 from mgaido91/SPARK-24313.
} else {
TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.")
}
TypeUtils.checkForOrderingExpr(elementType, s"function $prettyName")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a general suggestion. For these refactoring, we should do it in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll keep this in mind for the future, thanks.

@gatorsmile
Copy link
Member

Thanks for fixing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants