
@cloud-fan (Contributor) commented Nov 21, 2016

What changes were proposed in this pull request?

Technically, map type is not orderable but can be used in equality comparisons. However, due to a limitation of the current implementation, map type can't be used in equality comparisons, so it can't be a join key or grouping key.

This PR makes this limitation explicit, to avoid wrong results.

How was this patch tested?

Updated tests.

Contributor Author:

This doesn't make sense: BinaryType is comparable, so we should accept it as a join key.
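For context on what "comparable" has to mean for a binary key: on the JVM a binary value is a byte array, and `==` on arrays compares references, not contents. The sketch below is an illustration only, not Spark's implementation; `binaryKeysEqual` and `binaryKeyHash` are hypothetical helpers showing the content-based equality and hashing a binary join or grouping key would need.

```scala
// Hypothetical helpers (not Spark code): content-based equality and hashing
// for binary values, as required for binary join/grouping keys.
def binaryKeysEqual(a: Array[Byte], b: Array[Byte]): Boolean =
  java.util.Arrays.equals(a, b)

def binaryKeyHash(a: Array[Byte]): Int =
  java.util.Arrays.hashCode(a)

val left: Array[Byte] = Array[Byte](1, 2, 3)
val right: Array[Byte] = Array[Byte](1, 2, 3)

// Reference comparison: two distinct arrays are never ==, even with equal bytes.
println(left == right)                // false
// Content comparison: equal bytes compare equal and hash identically.
println(binaryKeysEqual(left, right)) // true
println(binaryKeyHash(left) == binaryKeyHash(right)) // true
```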

Contributor:

We should add a few tests with binary keys; that can also be done in a follow-up. The parser supports binary literals, so we can use SQLQueryTestSuite for this.

Contributor Author:

I already added one: https://github.com/apache/spark/pull/15956/files#diff-999f50ed13c690eaec243e3b3446af08R468. Yeah, we should add more end-to-end tests in SQLQueryTestSuite in a follow-up.

@cloud-fan (Contributor Author):

cc @yhuai @hvanhovell @gatorsmile

SparkQA commented Nov 21, 2016

Test build #68920 has started for PR 15956 at commit a7b6e42.

@rxin (Contributor) commented Nov 21, 2016

Technically, I think map type should be usable in equality comparisons, even though it isn't orderable, so I'm not sure this is correct.

Spark SQL currently has a restriction that we just don't allow map types to be used in equality comparison at all, but I don't want to structure code in a way that would make it harder to support that in the future.

Contributor:

Isn't this "wrong"? Map type is not supported here, is it?

Contributor Author:

We need this to satisfy BinaryOperator: the type-checking logic is defined by both BinaryOperator and checkInputDataTypes here. Should we make BinaryComparison not extend BinaryOperator and duplicate its type-checking logic here?

@cloud-fan (Contributor Author):

> Spark SQL currently has a restriction that we just don't allow map types to be used in equality comparison at all, but I don't want to structure code in a way that would make it harder to support that in the future.

If we want to support equality comparison for map type in the future, we can simply update the type-checking logic for BinaryComparison. For now, I think it's better to forbid users from doing this, to avoid confusing behavior; for example, SELECT MAP(1,2) = MAP(1,2) returns false.

SparkQA commented Nov 21, 2016

Test build #68932 has finished for PR 15956 at commit b95e25f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 21, 2016

Test build #68938 has finished for PR 15956 at commit fe6818b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 21, 2016

Test build #68940 has finished for PR 15956 at commit c3275ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor:

Why can't this be expressed? Isn't the old TypeCollection.Ordered a way to express it?

Contributor Author:

Because TypeCollection.Ordered doesn't cover all the cases: for example, ArrayType and StructType are orderable but are not part of it.
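To make the distinction concrete: a flat whitelist of atomic types (in the spirit of TypeCollection.Ordered) cannot describe nested types, while a recursive rule can. The sketch below uses a hypothetical, simplified type model (`DType`, `IntT`, `ArrayT`, and so on are not Spark classes) and assumes the rule that atomic types are orderable, arrays and structs are orderable when their components are, and maps never are.

```scala
// Hypothetical, simplified type model; not Spark's DataType hierarchy.
sealed trait DType
case object IntT extends DType
case object StringT extends DType
case object BinaryT extends DType
final case class ArrayT(elementType: DType) extends DType
final case class StructT(fieldTypes: Seq[DType]) extends DType
final case class MapT(keyType: DType, valueType: DType) extends DType

// Recursive rule (assumed): atomic types are orderable, arrays and structs
// are orderable when all their components are, maps never are.
def isOrderable(dt: DType): Boolean = dt match {
  case IntT | StringT | BinaryT => true
  case ArrayT(elem)             => isOrderable(elem)
  case StructT(fields)          => fields.forall(isOrderable)
  case MapT(_, _)               => false
}

// A flat whitelist of atomic types only: it rejects nested types
// even when they are orderable under the recursive rule.
val atomicWhitelist: Set[DType] = Set(IntT, StringT, BinaryT)
```

For example, `ArrayT(IntT)` is orderable under `isOrderable` but absent from `atomicWhitelist`, which is the gap described above.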

Contributor Author:

Hmm, Hive only supports atomic types; should we follow it?

Contributor Author:

FYI, PostgreSQL supports binary comparison on array types.

Member:

Semantically, this is different from the previous limits:

override def inputType: AbstractDataType = TypeCollection.Ordered

This adds support for StructType, ArrayType and UDT, which doesn't seem to have any test coverage. Could you please add tests for them?

Contributor Author:

Let me minimize the size of this PR: I'll keep GreaterThan, LessThan, etc. unchanged, and only update EqualTo and EqualNullSafe to forbid map types.

@cloud-fan cloud-fan changed the title [SPARK-18519][SQL] map type can not be used in binary comparison [SPARK-18519][SQL] map type can not be used in EqualTo Nov 22, 2016
}
}

private def hasMapType(dt: DataType): Boolean = dt match {
Contributor:

Could this be dt.existsRecursively(_.isInstanceOf[MapType])?


override def inputType: AbstractDataType = AnyDataType

override def checkInputDataTypes(): TypeCheckResult = {
Contributor:

Minor: Can we deduplicate this?

Contributor Author:

Well, it's only duplicated twice and will be removed after #15970, so I think it's fine.
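The hasMapType helper, the existsRecursively suggestion, and the EqualTo/EqualNullSafe type check discussed above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `SqlType`, `ArrayOf`, `MapOf`, `StructOf`, `CheckResult`, `Rejected` and `checkEqualityInputs` are hypothetical stand-ins for Spark's DataType, TypeCheckResult and checkInputDataTypes, and the error messages are invented.

```scala
// Hypothetical, simplified stand-in for Spark's DataType tree.
sealed trait SqlType {
  // True if this type or any type nested inside it satisfies `f`,
  // in the spirit of the existsRecursively helper suggested in review.
  def existsRecursively(f: SqlType => Boolean): Boolean = f(this) || (this match {
    case ArrayOf(e)   => e.existsRecursively(f)
    case MapOf(k, v)  => k.existsRecursively(f) || v.existsRecursively(f)
    case StructOf(fs) => fs.exists(_.existsRecursively(f))
    case _            => false
  })
}
case object IntType extends SqlType
case object StrType extends SqlType
final case class ArrayOf(element: SqlType) extends SqlType
final case class MapOf(key: SqlType, value: SqlType) extends SqlType
final case class StructOf(fields: Seq[SqlType]) extends SqlType

// Simplified type-check result, loosely modeled on TypeCheckResult.
sealed trait CheckResult
case object Ok extends CheckResult
final case class Rejected(message: String) extends CheckResult

def hasMapType(dt: SqlType): Boolean = dt.existsRecursively(_.isInstanceOf[MapOf])

// Sketch of an equality check (EqualTo / EqualNullSafe): both sides must
// have the same type, and that type must not contain a map at any depth.
def checkEqualityInputs(left: SqlType, right: SqlType): CheckResult =
  if (left != right) Rejected(s"differing types: $left vs $right")
  else if (hasMapType(left)) Rejected(s"map type is not allowed in equality comparison: $left")
  else Ok
```

Ordering comparisons such as GreaterThan are untouched in this sketch, matching the decision to only restrict EqualTo and EqualNullSafe.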

@hvanhovell (Contributor) left a comment

A few minor things; otherwise LGTM.

SparkQA commented Nov 22, 2016

Test build #68997 has finished for PR 15956 at commit b5a50f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor):

LGTM pending Jenkins

SparkQA commented Nov 22, 2016

Test build #69002 has finished for PR 15956 at commit dff0b08.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor) commented Nov 22, 2016

LGTM - merging to master/2.1. Thanks!

asfgit pushed a commit that referenced this pull request Nov 22, 2016
## What changes were proposed in this pull request?

Technically, map type is not orderable but can be used in equality comparisons. However, due to a limitation of the current implementation, map type can't be used in equality comparisons, so it can't be a join key or grouping key.

This PR makes this limitation explicit, to avoid wrong results.

## How was this patch tested?

Updated tests.

Author: Wenchen Fan

Closes #15956 from cloud-fan/map-type.

(cherry picked from commit bb152cd)
Signed-off-by: Herman van Hovell
@asfgit asfgit closed this in bb152cd Nov 22, 2016
@hvanhovell (Contributor):

@cloud-fan I cannot merge this to 2.0. Can you make a backport if we need one?

@cloud-fan (Contributor Author):

Yeah, I'll backport it.

asfgit pushed a commit that referenced this pull request Nov 23, 2016
## What changes were proposed in this pull request?

Technically, map type is not orderable but can be used in equality comparisons. However, due to a limitation of the current implementation, map type can't be used in equality comparisons, so it can't be a join key or grouping key.

This PR makes this limitation explicit, to avoid wrong results.

Backport #15956 to 2.0.

## How was this patch tested?

Updated tests.

Author: Wenchen Fan

Closes #15988 from cloud-fan/map-type.
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

5 participants