[SPARK-27252][SQL] Make current_date() independent from time zones #24185
Conversation
Test build #103837 has finished for PR 24185 at commit
jenkins, retest this, please
Test build #103841 has finished for PR 24185 at commit
Test build #103851 has finished for PR 24185 at commit
```scala
 * There is no code generation since this expression should get constant folded by the optimizer.
 */
@ExpressionDescription(
  usage = "_FUNC_() - Returns the current date at the start of query evaluation.",
```
Is it better to specify the time zone in this usage text?
Updated docs (and comments) of the expression and the current_date() function.
Test build #103872 has finished for PR 24185 at commit
@srowen @cloud-fan WDYT of the changes?
I see your point, in that a date by itself without a time shouldn't be relative to a time zone.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala
@cloud-fan Just in case, are you ok with the changes in general?
According to the SQL standard, timestamp is time zone dependent while date is not. For
The `current_date()` function returns a value of `DateType`, which is time zone independent and actually stores the number of days since the epoch (in the UTC time zone). Spark assumes that in many places. How can it be shifted by a time zone offset?
I would guess you are speaking about the textual representation of the `current_date` result? If so, conversion
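The point above can be illustrated outside of Spark with a plain-Scala sketch using `java.time` (not Spark code; the instant and zones are arbitrary example values): the same instant maps to different day numbers when the day is taken in different time zones, which is why shifting a days-since-epoch value by a zone offset changes its meaning.

```scala
import java.time.{Instant, ZoneId, ZoneOffset}

// Sketch (not Spark code): DateType stores the number of days since the
// epoch. Deriving that count from one instant in different time zones
// can yield different day numbers near midnight.
def daysSinceEpoch(instant: Instant, zone: ZoneId): Long =
  instant.atZone(zone).toLocalDate.toEpochDay

// 2019-03-25 23:30 UTC: still March 25 in UTC, already March 26 in UTC+2.
val instant = Instant.parse("2019-03-25T23:30:00Z")
val utcDays = daysSinceEpoch(instant, ZoneOffset.UTC)
val shiftedDays = daysSinceEpoch(instant, ZoneOffset.ofHours(2))
println(s"UTC: $utcDays, UTC+2: $shiftedDays")
```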
Maybe I'm confused; if that's the case then we just have a bug, sure. After the fix,
Your argument here is compelling: #24181 (comment) So, basically we already convert from absolute time to date w.r.t. UTC elsewhere?
Yeah, I missed that
Ah yes, then +1 for this PR |
@cloud-fan sounds like we're good with this; what do you think about your comment at https://github.com/apache/spark/pull/24185/files#r269162289 ?
Sorry, I forgot about the comment. I am about to change this.
Test build #104059 has finished for PR 24185 at commit
Just to make the consequences of the changes clear: to take the current date in the local time zone (defined by the session time zone setting), use `select date_format(cast(current_date as TIMESTAMP), 'yyyy-MM-dd')`, or, with implicit conversion, `select date_format(current_date, 'yyyy-MM-dd')`.
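The workaround above can be sketched outside Spark with `java.time` (a hypothetical plain-Scala illustration, not Spark's cast/format implementation; `sessionZone` is a stand-in for the session time zone setting):

```scala
import java.time.{Instant, LocalDate, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

// Stand-in for the session time zone; an arbitrary example value.
val sessionZone: ZoneId = ZoneId.of("America/Los_Angeles")

// current_date: the day taken in UTC (days since epoch).
val currentDate: LocalDate = Instant.now().atZone(ZoneOffset.UTC).toLocalDate

// cast(current_date as TIMESTAMP): midnight of that day, here taken in UTC.
val asTimestamp: Instant = currentDate.atStartOfDay(ZoneOffset.UTC).toInstant

// date_format(..., 'yyyy-MM-dd'): render the timestamp in the session zone.
val formatted: String = DateTimeFormatter.ofPattern("yyyy-MM-dd")
  .withZone(sessionZone)
  .format(asTimestamp)
println(formatted)
```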
Test build #104060 has finished for PR 24185 at commit
thanks, merging to master!
@MaxGekk @cloud-fan I'm seeing this failure in … If it's just the test that needs to be adjusted, maybe just a follow-up to patch it, not a revert.
@srowen I am fixing the test, and preparing a PR. |
Here is the PR #24240 which fixes
What changes were proposed in this pull request?

This makes the `CurrentDate` expression and the `current_date` function independent from time zone settings. The new result is the number of days since the epoch in the UTC time zone. Previously, Spark shifted the current date (in the UTC time zone) according to the session time zone, which violates the definition of `DateType` as the number of days since the epoch (an absolute point in time, midnight of Jan 1 1970 in UTC).

The changes make `CurrentDate` consistent with `CurrentTimestamp`, which is independent from time zone too.

How was this patch tested?

The changes were tested by existing test suites like `DateExpressionsSuite`.
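As a rough sketch of the consistency claim above (plain Scala with assumed semantics, not Spark's actual internals): a timestamp is microseconds since the epoch and a date is the corresponding day number, so one is derived from the other by pure integer arithmetic with no time zone input.

```scala
import java.time.Instant

// Assumed-semantics sketch: a DateType value relates to a TimestampType
// value by integer arithmetic alone, with no time zone involved.
val microsPerDay: Long = 24L * 60 * 60 * 1000 * 1000

def dateFromTimestampMicros(micros: Long): Long =
  Math.floorDiv(micros, microsPerDay)

val ts = Instant.parse("2019-04-03T01:00:00Z")
val micros = ts.getEpochSecond * 1000000L + ts.getNano / 1000L
// Day number for 2019-04-03, regardless of any session time zone setting.
println(dateFromTimestampMicros(micros))
```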