@OopsOutOfMemory
Contributor

  1. The query
    select coalesce(null, 1, '1') from dual
    throws an exception:
    java.lang.RuntimeException: Could not determine return type of Coalesce for IntegerType,StringType
  2. The query
    select case when true then 1 else '1' end from dual
    throws an exception:
    java.lang.RuntimeException: Types in CASE WHEN must be the same or coercible to a common type: StringType != IntegerType

I checked the code: the main cause is that HiveTypeCoercion does not perform an implicit conversion when an expression mixes IntegerType and StringType.

Numeric types can be promoted to string type

Hive will always do this implicit conversion.

@OopsOutOfMemory
Contributor Author

@chenghao-intel Could you please review this?

Contributor

numericPrecedence doesn't contain StringType, so you need to write a separate case like case (t1 @ StringType(), t2 @ NumericType()).

@cloud-fan
Contributor

findTightestCommonType is also used in the WidenTypes rule, which runs before PromoteStrings, so your change will break some special logic for binary comparisons in PromoteStrings. We need to be careful not to break existing rules.

@chenghao-intel
Contributor

It's better to update the rules for CaseWhenCoercion and Coalesce directly rather than provide a general rule, which would probably break the existing logic, as @cloud-fan said.

Besides, can you also add a unit test that compares the result with Hive's answer?
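
The idea of keeping string promotion out of the shared rule can be sketched roughly as follows. This is a toy model, not the PR's code: the `DataType` objects are stand-ins for Spark's classes, and `findCommonTypeWithStringPromotion` is a hypothetical helper name. The shared rule stays untouched, and only the callers that want Hive-style promotion (Coalesce, CASE WHEN) would use the fallback.

```scala
// Toy model only: these types are stand-ins, not Spark's org.apache.spark.sql.types classes.
object CoercionSketch {
  sealed trait DataType
  case object IntegerType extends DataType
  case object LongType extends DataType
  case object DoubleType extends DataType
  case object StringType extends DataType

  // Widening order for numeric types; StringType is deliberately absent.
  private val numericPrecedence: Seq[DataType] = Seq(IntegerType, LongType, DoubleType)

  // Shared rule: identical types or numeric widening only, no string promotion.
  def findTightestCommonType(t1: DataType, t2: DataType): Option[DataType] =
    if (t1 == t2) Some(t1)
    else if (numericPrecedence.contains(t1) && numericPrecedence.contains(t2)) {
      val widest = math.max(numericPrecedence.indexOf(t1), numericPrecedence.indexOf(t2))
      Some(numericPrecedence(widest))
    } else None

  // Hypothetical helper for non-binary expressions (Coalesce, CASE WHEN):
  // try the shared rule first, then fall back to promoting to StringType.
  def findCommonTypeWithStringPromotion(t1: DataType, t2: DataType): Option[DataType] =
    findTightestCommonType(t1, t2).orElse((t1, t2) match {
      case (StringType, _) | (_, StringType) => Some(StringType)
      case _ => None
    })
}
```

With this split, binary comparisons that go through the shared rule still see no common type for an integer and a string, so any special handling they rely on elsewhere is unaffected.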

@OopsOutOfMemory
Contributor Author

Thanks, @cloud-fan and @chenghao-intel. I will update it tomorrow night.

@OopsOutOfMemory
Contributor Author

@cloud-fan and @chenghao-intel, updated. Any more comments?

Contributor

Can you compare the result with Hive?

@OopsOutOfMemory
Contributor Author

@cloud-fan and @chenghao-intel
I resolved the conflicts, and the test suite passes locally.
Would you mind reviewing this?

@OopsOutOfMemory changed the title from "[SPARK-8010][SQL]Promote numeric types to string type as implicit conversion in HiveTypeCoercion" to "[SPARK-8010][SQL]Promote types to StringType as implicit conversion in non-binary expression of HiveTypeCoercion" on Jun 5, 2015
@OopsOutOfMemory
Contributor Author

@rxin @marmbrus

Contributor

How about ArrayType, StructType, etc.? Does Hive have documentation for this?

Contributor Author

Yes, @cloud-fan.
You can refer to the chart at the bottom of the page at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types

It only covers primitive types.

Contributor

Then I think we should add t2.isInstanceOf[AtomicType] to make sure it's a primitive type.
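
A minimal sketch of what such a guard could look like, using a typed pattern in place of an explicit `isInstanceOf` check. The type hierarchy below is a simplified stand-in for Spark's, and `promoteToString` is a hypothetical name used only for illustration.

```scala
// Toy model only: a simplified stand-in for Spark's type hierarchy.
object AtomicGuardSketch {
  sealed trait DataType
  sealed trait AtomicType extends DataType
  case object IntegerType extends AtomicType
  case object StringType extends AtomicType
  final case class ArrayType(elementType: DataType) extends DataType

  // Hypothetical helper: promote to StringType only when the other side is atomic.
  // The typed pattern `_: AtomicType` plays the role of an isInstanceOf[AtomicType] check.
  def promoteToString(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
    case (StringType, _: AtomicType) | (_: AtomicType, StringType) => Some(StringType)
    case _ => None
  }
}
```

Under this sketch a string paired with an integer promotes to string, while a string paired with an array yields no common type, which matches the intent of restricting the conversion to primitive types.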

@OopsOutOfMemory
Contributor Author

@cloud-fan That makes sense, thanks. Updated.

@OopsOutOfMemory
Contributor Author

ping..

Contributor

why not make this a function instead?

Contributor

and make it private

@rxin
Contributor

rxin commented Jun 8, 2015

@cloud-fan and @yhuai can you both take another look at this?

Contributor

cc @rxin, findTightestCommonTypeOfTwo is a function, and I think we should make them consistent...

Contributor Author

@cloud-fan I'm sorry, I don't quite understand when to use a function and when to use a method. What is the purpose of using a function here?

Contributor

Here we don't care about the parameter names (i.e. t1 and t2). With a function, we can write a pattern-matching function literal (i.e. {case ...}) directly and don't need to put (t1, t2) match in front of it.
However, it's not a big deal; using a method is OK, but we should make it consistent with the existing code.
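
To illustrate the difference, here is a small sketch with the same logic written both ways. The names and the use of plain strings as type names are assumptions made for illustration; this is not code from the PR.

```scala
// Illustration only: type names are plain strings and both helpers are hypothetical.
object FunctionVsMethodSketch {
  // Function value: the pattern-matching literal is the whole body,
  // so no parameter names and no explicit `match` are needed.
  val commonAsFunction: (String, String) => Option[String] = {
    case (a, b) if a == b => Some(a)
    case ("string", _) | (_, "string") => Some("string")
    case _ => None
  }

  // Method: the parameters are named, so the body matches on them explicitly.
  def commonAsMethod(t1: String, t2: String): Option[String] = (t1, t2) match {
    case (a, b) if a == b => Some(a)
    case ("string", _) | (_, "string") => Some("string")
    case _ => None
  }
}
```

Both forms behave the same for callers, so the choice is mostly about matching the style of the surrounding code.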

@OopsOutOfMemory
Contributor Author

ping...
May AmplabJenkins test this please?

@cloud-fan
Contributor

LGTM, but we need a more general rule to handle this kind of string promotion...

@marmbrus
Contributor

ok to test

@SparkQA

SparkQA commented Jun 14, 2015

Test build #34894 has finished for PR 6551 at commit 7a209d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Thanks! Merging to master.

@asfgit asfgit closed this in 98ee351 Jun 17, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
… in non-binary expression of HiveTypeCoercion

Author: OopsOutOfMemory <[email protected]>

Closes apache#6551 from OopsOutOfMemory/pnts and squashes the following commits:

7a209d7 [OopsOutOfMemory] rebase master
6018613 [OopsOutOfMemory] convert function to method
4cd5618 [OopsOutOfMemory] limit the data type to primitive type
df365d2 [OopsOutOfMemory] refine
95cbd58 [OopsOutOfMemory] fix style
403809c [OopsOutOfMemory] promote non-string to string when can not found tighestCommonTypeOfTwo
@OopsOutOfMemory OopsOutOfMemory deleted the pnts branch June 24, 2015 06:13