-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-20763][SQL]The function of month and day return the value which is not we expected.
#17997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
month and day return an error valuemonth and day return the value which is not we expected.
|
Point taken, but do other DBs support dates before 1970? it does highly depend on historical calendars. The right solution isn't a hack in one place here but using Calendar properly, if anything. |
|
@srowen I have tested in mysql, it can support dates before 1970. mysql> select day("1582-04-18"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would cause massive performance regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes,this cause performance worse than before
3f2d8d6 to
a8d6e04
Compare
|
I guess this would also affect date-related functions such as |
|
@ueshin Yes, I will do it ,thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the dates we need to test here should be the boundaries like 1582-10-04 and 1582-10-15.
Btw, let's add 0 for 1-digit date, e.g. 1582-10-04 instead of 1582-10-4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here should be 1581-12-31 and 1582-01-01, for example, and the rest should be in a similar way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using 1582-09-30 and 1582-10-01 for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether this value is correct or not. should be 278?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure too, maybe 278 is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @cloud-fan @gatorsmile
Do you have any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check with other databases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan In mysql ,the rusult is :
mysql> select dayofyear("1982-10-04");
+-------------------------+
| dayofyear("1982-10-04") |
+-------------------------+
| 277 |
+-------------------------+
1 row in set (0.00 sec)
mysql> select dayofyear("1982-10-15");
+--------------------------+
| dayofyear("1982-10-15") |
+--------------------------+
| 288 |
+--------------------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 278 is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Because in history,the period(5.10.1582 ~ 14.10.1582) is not exist.
But maybe 288 is more human readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's follow mysql
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks @cloud-fan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this representation may be confusing for all of the people in the world. Can we use 1582-10-5 or like Oct. 5, 1582?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only about comment, and I think 1582-10-5 or Oct. 5, 1582 is more human readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will do ,thanks @kiszk @cloud-fan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In history,the period(5.10.1582 ~ 14.10.1582) is not exist
->
Since Julian calendar was replaced with the Gregorian calendar, the 10 days after Oct. 4 were skipped.
205d3d0 to
8c99316
Compare
|
ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In history,the period(5.10.1582 ~ 14.10.1582) is not exist
->
Since Julian calendar was replaced with the Gregorian calendar, the 10 days after Oct. 4 were skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile ok,thanks
|
The current impl is still missing the special handling of that miracle period. See the outputs of Oracle: |
|
@gatorsmile I have tested like this: Also,i tested in mysql: mysql> select dayofyear('1582-10-5'); mysql> select dayofyear('1582-10-6'); mysql> select dayofyear('1582-10-14'); mysql> select dayofyear('1582-10-15'); |
|
After checking DB2, it behaves the same as MySQL. LGTM |
… which is not we expected.
## What changes were proposed in this pull request?
spark-sql>select month("1582-09-28");
spark-sql>10
For this case, the expected result is 9, but it is 10.
spark-sql>select day("1582-04-18");
spark-sql>28
For this case, the expected result is 18, but it is 28.
when the date before "1582-10-04", the function of `month` and `day` return the value which is not we expected.
## How was this patch tested?
unit tests
Author: liuxian <[email protected]>
Closes #17997 from 10110346/wip_lx_0516.
(cherry picked from commit ea3b1e3)
Signed-off-by: Xiao Li <[email protected]>
|
Thanks! Merging to master/2.2. |
|
Could you submit a backport PR to 2.1? Thanks! |
|
OK,l Will do it, thanks. @gatorsmile |
…turn the value which is not we expected. What changes were proposed in this pull request? This PR is to backport #17997 to Spark 2.1 when the date before "1582-10-04", the function of month and day return the value which is not we expected. How was this patch tested? unit tests Author: liuxian <[email protected]> Closes #18054 from 10110346/wip-lx-0522.
… which is not we expected.
## What changes were proposed in this pull request?
spark-sql>select month("1582-09-28");
spark-sql>10
For this case, the expected result is 9, but it is 10.
spark-sql>select day("1582-04-18");
spark-sql>28
For this case, the expected result is 18, but it is 28.
when the date before "1582-10-04", the function of `month` and `day` return the value which is not we expected.
## How was this patch tested?
unit tests
Author: liuxian <[email protected]>
Closes apache#17997 from 10110346/wip_lx_0516.
What changes were proposed in this pull request?
spark-sql>select month("1582-09-28");
spark-sql>10
For this case, the expected result is 9, but it is 10.
spark-sql>select day("1582-04-18");
spark-sql>28
For this case, the expected result is 18, but it is 28.
when the date before "1582-10-04", the function of
monthanddayreturn the value which is not we expected.How was this patch tested?
unit tests