Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Aug 24, 2016

What changes were proposed in this pull request?

Currently, type-widening does not work between TimestampType and DateType.

This applies to SetOperation, Union, In, CaseWhen, Greatest, Leatest, CreateArray, CreateMap, Coalesce, NullIf, IfNull, Nvl and Nvl2, .

This PR adds the support for widening DateType to TimestampType for them.

For a simple example,

Before

Seq(Tuple2(new Timestamp(0), new Date(0))).toDF("a", "b").selectExpr("greatest(a, b)").show()

shows below:

cannot resolve 'greatest(`a`, `b`)' due to data type mismatch: The expressions should all have the same type, got GREATEST(timestamp, date)

or union as below:

val a = Seq(Tuple1(new Timestamp(0))).toDF()
val b = Seq(Tuple1(new Date(0))).toDF()
a.union(b).show()

shows below:

Union can only be performed on tables with the compatible column types. DateType <> TimestampType at the first column of the second table;

After

Seq(Tuple2(new Timestamp(0), new Date(0))).toDF("a", "b").selectExpr("greatest(a, b)").show()

shows below:

+----------------------------------------------------+
|greatest(CAST(a AS TIMESTAMP), CAST(b AS TIMESTAMP))|
+----------------------------------------------------+
|                                1969-12-31 16:00:...|
+----------------------------------------------------+

or union as below:

val a = Seq(Tuple1(new Timestamp(0))).toDF()
val b = Seq(Tuple1(new Date(0))).toDF()
a.union(b).show()

shows below:

+--------------------+
|                  _1|
+--------------------+
|1969-12-31 16:00:...|
|1969-12-31 00:00:...|
+--------------------+

How was this patch tested?

Unit tests in TypeCoercionSuite.

@SparkQA
Copy link

SparkQA commented Aug 24, 2016

Test build #64345 has finished for PR 14786 at commit d49fbdb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

Cc @cloud-fan and @petermaxlee could you please take a look for this one?

@cloud-fan
Copy link
Contributor

Have you checked with other databases? It looks reasonable to do this but I'd like to know how other databases deal with this case.

@HyukjinKwon
Copy link
Member Author

I have only checked Hive it seems it does not handle this case. I will check others and be back soon.

@HyukjinKwon
Copy link
Member Author

MySQL and PostgreSQL support this

MySQL

  • Greatest/leastest
mysql> SELECT GREATEST(CAST("1990-02-24 12:00:00" AS DATETIME), CAST("1990-02-25" AS DATE));
+-------------------------------------------------------------------------------+
| GREATEST(CAST("1990-02-24 12:00:00" AS DATETIME), CAST("1990-02-25" AS DATE)) |
+-------------------------------------------------------------------------------+
| 1990-02-25 00:00:00                                                           |
+-------------------------------------------------------------------------------+
  • Union
mysql> SELECT CAST("1990-02-24 12:00:00" AS DATETIME) UNION SELECT CAST("1990-02-24" AS DATE);
+-----------------------------------------+
| CAST("1990-02-24 12:00:00" AS DATETIME) |
+-----------------------------------------+
| 1990-02-24 12:00:00                     |
| 1990-02-24 00:00:00                     |
+-----------------------------------------+

PostgreSQL

  • Greatest/leatest
postgres=# SELECT GREATEST(CAST('1990-02-24 12:00:00' AS TIMESTAMP), CAST('1990-02-25' AS DATE));
      greatest
---------------------
 1990-02-25 00:00:00
(1 row)
  • Union
postgres=# SELECT CAST('1990-02-24 12:00:00' AS TIMESTAMP) UNION SELECT CAST('1990-02-24' AS DATE);
      timestamp
---------------------
 1990-02-24 00:00:00
 1990-02-24 12:00:00
(2 rows)

Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
Some(DoubleType)
case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we put it in findTightestCommonTypeOfTwo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sure!

@SparkQA
Copy link

SparkQA commented Aug 25, 2016

Test build #64411 has finished for PR 14786 at commit ab754fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// TimestampType
widenTest(NullType, TimestampType, Some(TimestampType))
widenTest(TimestampType, TimestampType, Some(TimestampType))
widenTest(DateType, TimestampType, Some(TimestampType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only need this test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, fixed.

@SparkQA
Copy link

SparkQA commented Aug 25, 2016

Test build #64419 has finished for PR 14786 at commit d035eb3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in b964a17 Aug 26, 2016
asfgit pushed a commit that referenced this pull request Oct 23, 2016
… for set operations

## What changes were proposed in this pull request?

This PR backports #15072

Please note that the test code is a bit different with the master as #14786 was only merged into master and therefore, it does not support type-widening between `DateType` and `TimestampType`.

So, both types were taken out from the test.

## How was this patch tested?

Unit test in `DataFrameSuite`.

Author: hyukjinkwon <[email protected]>

Closes #15601 from HyukjinKwon/backport-17123.
@HyukjinKwon HyukjinKwon deleted the SPARK-17212 branch January 2, 2018 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants