
Conversation

@httfighter
Contributor

What changes were proposed in this pull request?

Keep the millisecond part of the time field.

How was this patch tested?

Add new test cases and update existing test cases


@srowen srowen left a comment
Member

I think this has problems in many ways. You're treating dates as doubles and hard-coding some locale-specific concerns. But mostly, this is the wrong answer, because the UNIX timestamp is in whole seconds.

@AmplabJenkins

Can one of the admins verify this patch?

@httfighter
Contributor Author

In an RDBMS, the unix_timestamp method can keep the milliseconds. For example, executing the following command:
select unix_timestamp("2017-10-10 10:10:20.111") from test;
returns the result: 1490667020.111
But Spark's native unix_timestamp method loses the milliseconds; we want to keep them.
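
For illustration, a minimal sketch of the truncation in the Spark shell (assuming Spark 2.x, where spark is the SparkSession; the explicit format string is an assumption, not from this PR):

spark.sql("SELECT unix_timestamp('2017-10-10 10:10:20.111', 'yyyy-MM-dd HH:mm:ss.SSS')").show()
// unix_timestamp returns LongType, so the result is in whole seconds and the .111 is dropped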

Seq(TypeCollection(StringType, DateType, TimestampType), StringType)

-  override def dataType: DataType = LongType
+  override def dataType: DataType = DoubleType
@HyukjinKwon
Member

BTW, I think we can't just change this data type directly. This could break backward compatibility.

@ouyangxiaochen

Since RDBMSs keep the milliseconds, we should follow them. This proposal LGTM. CC @gatorsmile

@viirya
Member

viirya commented Sep 29, 2017

We also have FromUnixTime, and the data type of unix time seems to be defined as LongType across those unix time expressions. We shouldn't change just one expression and leave an inconsistency.

As @HyukjinKwon said, we also need to avoid breaking backward compatibility.

By the way, for RDBMS support, I only found that MySQL has direct unix_timestamp support like this. It doesn't sound like a good idea to break backward compatibility just to follow one (or a few) RDBMSs.

@srowen
Member

srowen commented Sep 29, 2017

This would break compatibility with Spark and other engines like Hive. This should be closed.

@ouyangxiaochen

In fact, there are many scenarios that need to be accurate to milliseconds. Should we try to solve this problem together?

@gatorsmile
Member

The workaround is to let users write a UDF to handle these cases.
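
For instance, a minimal sketch of such a UDF in the Scala shell; the name unix_timestamp_millis and the fixed input format are illustrative assumptions, not part of this PR:

spark.udf.register("unix_timestamp_millis", (s: String) => {
  // SimpleDateFormat is not thread-safe, so build one per invocation;
  // it parses in the JVM's default time zone.
  val fmt = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
  fmt.parse(s).getTime / 1000.0  // epoch seconds with the millisecond fraction
})
spark.sql("SELECT unix_timestamp_millis('2017-10-10 10:10:20.111')").show(false)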

@httfighter
Contributor Author

I understand everyone's worries, but I have a few thoughts.
First, the native unix_timestamp itself accepts dates in the "yyyy-MM-dd HH:mm:ss.SSS" format, but the milliseconds are lost from the result when I use it. Obviously, it's a bug: it gives users wrong results.
I think this should be fixed.
Second, unix_timestamp, from_unixtime, and to_unix_timestamp all have the same issue, and these three are the only related methods. I think the unix time data type of the three methods should be defined as DoubleType, not LongType. Returning milliseconds instead would bring more problems.
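
As a point of comparison (a sketch, not part of this PR): casting a TimestampType column to DoubleType in Spark already yields epoch seconds with a fractional part, which is roughly the DoubleType semantics argued for above:

import spark.implicits._
val df = Seq("2017-10-10 10:10:20.111").toDF("s")
df.select($"s".cast("timestamp").cast("double").as("unix_ts")).show(false)
// the resulting double keeps the .111 fraction of a second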

@gatorsmile
Member

Currently, we are following Hive for these built-in functions. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Maybe we can wait and see whether more users have the same request? Then we can decide whether we should introduce new functions or a SQLConf.

@srowen
Member

srowen commented Sep 30, 2017

This itself is certainly not a bug. The type is intentional, and the answer is certainly correct given the type. You are arguing for a new function called something else, but you can also do this with a UDF.

@HyukjinKwon
Member

HyukjinKwon commented Oct 8, 2017

I'd close this for now; optionally, we could raise this case and discuss it on the mailing list if it is important.

@srowen srowen mentioned this pull request Nov 6, 2017
@asfgit asfgit closed this in ed1478c Nov 7, 2017