Conversation

@MaxGekk
Member

@MaxGekk MaxGekk commented Mar 22, 2019

What changes were proposed in this pull request?

This makes the CurrentDate expression and the current_date function independent of time zone settings. The new result is the number of days since the epoch in the UTC time zone. Previously, Spark shifted the current date (in the UTC time zone) according to the session time zone, which violates the definition of DateType - the number of days since the epoch (an absolute point in time: midnight of Jan 1 1970 in UTC).

This change makes CurrentDate consistent with CurrentTimestamp, which is also independent of the time zone.
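The proposed semantics can be sketched outside of Spark. The following Python snippet (an analogy only, not Spark's implementation) shows the epoch-day number being derived from the current instant interpreted in UTC, regardless of any session time zone:

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def current_date_epoch_days(instant):
    """Days since 1970-01-01 for `instant`, computed in UTC."""
    return (instant.astimezone(timezone.utc).date() - EPOCH.date()).days

# A fixed instant: 2019-03-23 00:30 UTC.
instant = datetime(2019, 3, 23, 0, 30, tzinfo=timezone.utc)
print(current_date_epoch_days(instant))  # 17978, the epoch day of 2019-03-23
```

The key point is that the result depends only on the instant, not on any locale setting.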

How was this patch tested?

The changes were tested by existing test suites like DateExpressionsSuite.

@SparkQA

SparkQA commented Mar 23, 2019

Test build #103837 has finished for PR 24185 at commit 7ff29aa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Mar 23, 2019

jenkins, retest this, please

@SparkQA

SparkQA commented Mar 23, 2019

Test build #103841 has finished for PR 24185 at commit 7ff29aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 23, 2019

Test build #103851 has finished for PR 24185 at commit ae918cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* There is no code generation since this expression should get constant folded by the optimizer.
*/
@ExpressionDescription(
usage = "_FUNC_() - Returns the current date at the start of query evaluation.",
Member

@viirya viirya Mar 24, 2019


Is it better to specify the time zone in this usage text?

Member Author

Updated docs (and comments) of the expression and the current_date() function.

@SparkQA

SparkQA commented Mar 24, 2019

Test build #103872 has finished for PR 24185 at commit 11afacc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Mar 24, 2019

@srowen @cloud-fan WDYT of the changes?

@srowen
Member

srowen commented Mar 24, 2019

I see your point, in that a date by itself, without a time, shouldn't be relative to a time zone. current_date seems like it should reflect the date in the session time zone, though. I think of it as the date portion of the current time, and time is relative to a time zone. Does this match some other DB's behavior?

@MaxGekk
Member Author

MaxGekk commented Mar 24, 2019

@cloud-fan Just in case, are you ok with the changes in general?

@cloud-fan
Contributor

According to the SQL standard, timestamp is time zone dependent while date is not. For current_date, the time zone shifting seems reasonable. E.g., when I run current_date in a database, I'd expect the result to match my local calendar. Can we check other mainstream databases?

@MaxGekk
Member Author

MaxGekk commented Mar 26, 2019

timestamp is timezone dependent while date is not. For current_date, it seems the timezone shifting is reasonable.

current_date() returns a value of DateType, which is time zone independent and actually stores the number of days since the epoch (in the UTC time zone). And Spark assumes that in many places. How can it be shifted by a time zone offset?

when I run current_date in a database, I'd expect the result matches my local calendar.

I would guess you are speaking about the textual representation of the current_date result? If so, the DateType -> StringType conversion should take care of the time zone offset, which is what the date_format() function does.
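The distinction above can be sketched in a few lines of Python (an analogy, not Spark's code): rendering a DateType value is pure calendar arithmetic on the epoch-day number, with no time zone involved.

```python
from datetime import date, timedelta

EPOCH_DATE = date(1970, 1, 1)

def render_date(epoch_days):
    """DateType -> StringType: pure arithmetic, no time zone involved."""
    return (EPOCH_DATE + timedelta(days=epoch_days)).isoformat()

print(render_date(17978))  # '2019-03-23', the same in every session time zone
```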

@srowen
Member

srowen commented Mar 26, 2019

Maybe I'm confused; DateType isn't logically time zone dependent, but it's stored in a way that is (time since the epoch)? OK, this makes more sense. So if I later render this value according to my session time zone, it would already also 'correct' for the time zone?

If that's the case then we just have a bug, sure. After the fix, would current_date() formatted for my current time zone return what I expect today to be - the current date from midnight to midnight in my time zone?

@srowen
Member

srowen commented Mar 26, 2019

Your argument here is compelling: #24181 (comment)

So, basically we already convert from absolute time to date w.r.t. UTC elsewhere?

@MaxGekk
Member Author

MaxGekk commented Mar 26, 2019

what the date_format() function does.
... Your argument here is compelling: #24181 (comment)

Yeah, I missed that date_format() has the same behavior as the other date renderers in Spark. Fortunately, everything was already consistent on this side :-)

@cloud-fan
Contributor

I would guess you speak about textual representation of current_date result

Ah yes, then +1 for this PR

@srowen
Member

srowen commented Mar 28, 2019

@cloud-fan sounds like we're good with this; what do you think about your comment at https://github.com/apache/spark/pull/24185/files#r269162289 ?

@MaxGekk
Member Author

MaxGekk commented Mar 28, 2019

@cloud-fan sounds like we're good with this; what do you think about your comment at https://github.com/apache/spark/pull/24185/files#r269162289 ?

Sorry, I forgot about the comment. I am about to change this.

@SparkQA

SparkQA commented Mar 28, 2019

Test build #104059 has finished for PR 24185 at commit ec1f7e8.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Mar 28, 2019

Just to make the consequences of the changes clear:

  1. current_date() will always produce the current date as a value of DateType in the UTC time zone.
  2. All available conversions of DateType to StringType are time zone independent in Spark at the moment. See the comment: [SPARK-27242][SQL] Make formatting TIMESTAMP/DATE literals independent from the default time zone #24181 (comment). Produced strings are always textual representations of dates in the UTC time zone.

To take the current date in the local time zone (defined by spark.sql.session.timeZone), the DATE column can be cast to a TIMESTAMP column, and the latter can be converted to a STRING column. This works because the TimestampType to StringType conversion is time zone dependent. For example:

select date_format(cast(current_date as TIMESTAMP), 'yyyy-MM-dd')

or with implicit conversion:

select date_format(current_date, 'yyyy-MM-dd')
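The two-step conversion above can be sketched in Python (an analogy, not Spark's implementation; UTC-8 stands in for a hypothetical spark.sql.session.timeZone value): the cast turns the epoch-day number into the instant at midnight UTC of that day, and the format step renders that instant in the session time zone.

```python
from datetime import datetime, timezone, timedelta

SESSION_TZ = timezone(timedelta(hours=-8))  # hypothetical session time zone
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def date_as_local_string(epoch_days):
    # Cast DATE -> TIMESTAMP: midnight UTC of that day, as an instant.
    ts = EPOCH + timedelta(days=epoch_days)
    # Format TIMESTAMP -> STRING: time zone dependent rendering.
    return ts.astimezone(SESSION_TZ).strftime('%Y-%m-%d')

print(date_as_local_string(17978))  # '2019-03-22': midnight UTC is still Mar 22 at UTC-8
```

This illustrates why the round trip yields the calendar date as observed in the session time zone rather than in UTC.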

@SparkQA

SparkQA commented Mar 29, 2019

Test build #104060 has finished for PR 24185 at commit cdb24ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 06abd06 Mar 29, 2019
@srowen
Member

srowen commented Mar 29, 2019

@MaxGekk @cloud-fan I'm seeing this failure in master; looks related?

ComputeCurrentTimeSuite:
- analyzer should replace current_timestamp with literals
- analyzer should replace current_date with literals *** FAILED ***
  17984 was greater than or equal to 17983, but 17984 was not less than or equal to 17983 (ComputeCurrentTimeSuite.scala:64)

If it's just the test that needs to be adjusted, maybe just a follow up to patch it, not a revert.
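The off-by-one in the failure above (17984 vs. 17983) is exactly what happens when the epoch-day number is computed in two different time zones around midnight. A small Python sketch with a hypothetical UTC-7 offset reproduces it:

```python
from datetime import datetime, timezone, timedelta

EPOCH_DATE = datetime(1970, 1, 1, tzinfo=timezone.utc).date()

def days_since_epoch(instant, tz):
    """Epoch-day number of `instant` as observed in time zone `tz`."""
    return (instant.astimezone(tz).date() - EPOCH_DATE).days

# 2019-03-29 02:00 UTC: already Mar 29 in UTC, but still Mar 28 at UTC-7.
instant = datetime(2019, 3, 29, 2, 0, tzinfo=timezone.utc)
print(days_since_epoch(instant, timezone.utc))                   # 17984
print(days_since_epoch(instant, timezone(timedelta(hours=-7))))  # 17983
```

A test that computes its expected value in one zone while the code under test uses the other will fail only near midnight, which fits the flaky comparison in the report.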

@MaxGekk
Member Author

MaxGekk commented Mar 29, 2019

@srowen I am fixing the test, and preparing a PR.

@MaxGekk
Member Author

MaxGekk commented Mar 29, 2019

Here is the PR #24240 which fixes ComputeCurrentTimeSuite.
