[SPARK-29012][SQL] Support special timestamp values #25716
Conversation
Test build #110256 has finished for PR 25716 at commit

This reverts commit ad23507.

Test build #110286 has finished for PR 25716 at commit

Test build #110288 has finished for PR 25716 at commit
-- !query 16 schema
struct<64:string,d1:timestamp>
-- !query 16 output
1969-12-31 16:00:00
This is the epoch in UTC (1970-01-01 00:00:00) displayed in the local time zone.
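For reference, a minimal sketch (plain java.time, not Spark's own formatter) showing how the epoch instant renders in America/Los_Angeles:

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// America/Los_Angeles is UTC-8 at the epoch, so 1970-01-01 00:00:00Z
// prints as 1969-12-31 16:00:00 local time.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  .withZone(ZoneId.of("America/Los_Angeles"))
println(fmt.format(Instant.EPOCH)) // 1969-12-31 16:00:00
```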
To print 1970-01-01 00:00:00 here, better to set a config for pgSQL tests in
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L301 ?
I can set UTC globally but ... if we know the reason for this, should we do that?
And I am afraid it won't be enough. Need to set system time zone as well:
spark/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
Lines 552 to 553 in 7cc0f0e
// Timezone is fixed to America/Los_Angeles for those timezone sensitive tests (timestamp_*)
TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
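A hedged sketch of what setting both knobs could look like in a test (the SQLConf key is real; the local session setup is an assumption, not the PR's actual change):

```scala
import java.util.TimeZone
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

val spark = SparkSession.builder().master("local").getOrCreate()

// Keep the JVM default zone and the session zone in sync; if they differ,
// timestamp formatting and parsing can disagree by the zone offset.
TimeZone.setDefault(TimeZone.getTimeZone("UTC"))
spark.conf.set(SQLConf.SESSION_LOCAL_TIMEZONE.key, "UTC")
```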
NVM. I think it's OK as it is. But better to leave some comments in timestamp.sql about why.
@dongjoon-hyun @cloud-fan @maropu Could you take a look at this when you have time?
Test build #110357 has finished for PR 25716 at commit
 * @return Some of microseconds since the epoch if the conversion completed
 *         successfully, otherwise None.
 */
def convertSpecialTimestamp(input: String, zoneId: ZoneId): Option[SQLTimestamp] = {
What's different from convertSpecialDate? I know the output dataType is different, but is the way to handle these special values different, too?
https://github.com/apache/spark/pull/25708/files#diff-da60f07e1826788aaeb07f295fae4b8aR866
Can we share some code between them?
I have extracted common code there https://github.com/apache/spark/pull/25716/files#diff-da60f07e1826788aaeb07f295fae4b8aR864-R890
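A hedged sketch of what such a shared helper could look like (names and signature are illustrative, not the exact code in DateTimeUtils):

```scala
import java.time.ZoneId
import scala.util.Try

object SpecialValues {
  // Splits e.g. "epoch UTC" into the keyword and a resolved zone id,
  // falling back to the session zone when no zone is given. Both the
  // date and the timestamp converters can dispatch on the keyword.
  private val specialValueRe = """(\p{Alpha}+)\p{Blank}*(.*)""".r

  def extractSpecialValue(input: String, sessionZone: ZoneId): Option[(String, ZoneId)] =
    input match {
      case specialValueRe(word, tz) =>
        val zone = if (tz.isEmpty) Some(sessionZone) else Try(ZoneId.of(tz)).toOption
        zone.map(z => (word.toLowerCase, z))
      case _ => None
    }
}
```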
/**
 * Converts notational shorthands that are converted to ordinary timestamps.
 * @param input - a trimmed string
How about checking that the input is trimmed with an assert?
I will add the assert:
assert(input.trim.length == input.length)
private def convertSpecialTimestamp(bytes: Array[Byte], zoneId: ZoneId): Option[SQLTimestamp] = {
Why did you use Array[Byte] instead of UTF8String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I need a String inside extractSpecialValue, and UTF8String.toString converts UTF8String to String via Array[Byte]. Why should we convert the same string to bytes twice?
Ur, I see.
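A hedged sketch of the two conversion paths being compared (UTF8String.fromString, toString, and getBytes are real APIs):

```scala
import org.apache.spark.unsafe.types.UTF8String

val u: UTF8String = UTF8String.fromString("epoch")
// toString decodes the backing UTF-8 bytes into a java.lang.String,
// so going UTF8String -> String touches the bytes anyway.
val asString: String = u.toString
// Accepting Array[Byte] in convertSpecialTimestamp skips building an
// intermediate UTF8String just to read its bytes back.
val asBytes: Array[Byte] = u.getBytes
```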
var currentSegmentValue = 0
val bytes = s.trim.getBytes
val specialTimestamp = convertSpecialTimestamp(bytes, timeZoneId)
if (specialTimestamp.isDefined) return specialTimestamp
Can we avoid using return here?
Why?
I'm not 100% sure about the bytecode for this though; is there no overhead in using return?
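For comparison, a hedged return-free variant using Option.orElse (the stub helpers stand in for the real DateTimeUtils methods and are assumptions):

```scala
import java.time.ZoneId

// Stubs standing in for the real parsing methods (assumptions).
def convertSpecialTimestamp(bytes: Array[Byte], zoneId: ZoneId): Option[Long] = None
def parseRegularTimestamp(s: String, zoneId: ZoneId): Option[Long] = None

// orElse evaluates its argument lazily, so the regular parsing path only
// runs when no special value matched: same short-circuit, no return.
def stringToTimestamp(s: String, zoneId: ZoneId): Option[Long] =
  convertSpecialTimestamp(s.trim.getBytes, zoneId)
    .orElse(parseRegularTimestamp(s, zoneId))
```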
import java.util.Locale
import java.util.concurrent.TimeUnit.SECONDS

import DateTimeUtils.{convertSpecialTimestamp}
nit: remove the braces: import DateTimeUtils.convertSpecialTimestamp
Instant.now().atZone(zoneId).`with`(LocalTime.MIDNIGHT)
}

private val specialValue = """(EPOCH|NOW|TODAY|TOMORROW|YESTERDAY)\p{Blank}*(.*)""".r
Should the description for the supported special values in this PR be:
- `epoch [zoneId]` - 1970-01-01 00:00:00+00 (Unix system time zero)
- `today [zoneId]` - midnight today
- `yesterday [zoneId]` - midnight yesterday
- `tomorrow [zoneId]` - midnight tomorrow
- `now` - current query start time
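A quick hedged check of how the quoted regex splits a value (assuming the input is uppercased before matching, as the alternation suggests):

```scala
val specialValue = """(EPOCH|NOW|TODAY|TOMORROW|YESTERDAY)\p{Blank}*(.*)""".r

"EPOCH UTC" match {
  case specialValue(word, zone) => println(s"word=$word zone=$zone") // word=EPOCH zone=UTC
  case _ => println("not a special value")
}
```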
| assert(toTimestamp("Epoch", zoneId).get === 0) | ||
| val now = instantToMicros(LocalDateTime.now(zoneId).atZone(zoneId).toInstant) | ||
| toTimestamp("NOW", zoneId).get should be (now +- tolerance) |
Can you check illegal cases, e.g., `now CET`?
I have already added the test here https://github.com/apache/spark/pull/25716/files#diff-c5655e947ce2dd3748e4cf95ebc32e8aR580
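A hedged sketch of such a negative test, assuming the toTimestamp helper from the snippet above returns an Option:

```scala
// "now" does not accept a zone id, so the conversion should fail.
assert(toTimestamp("now CET", zoneId).isEmpty)
```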
Thank you for pinging me, @MaxGekk. I support this approach.
Hi, @gatorsmile, @gengliangwang, @cloud-fan.
Test build #110403 has finished for PR 25716 at commit
Well, this feature doesn't really conflict with the current Spark behavior. I think we can proceed with it.
Test build #110408 has finished for PR 25716 at commit
@maropu @dongjoon-hyun Can we continue with the PR, or are we waiting for @gengliangwang's #25697?
How about holding this PR until this weekend for @gengliangwang's work? I personally think we don't have any reason to rush to merge this.
I have some performance-related concerns regarding using the config. In the current implementation, the decision is pretty cheap - just comparing the first byte. If we use the config, we will need to retrieve it and compare its value with another string, which can bring visible overhead even when PostgreSQL compatibility mode is turned off, see https://github.com/apache/spark/pull/25716/files#diff-da60f07e1826788aaeb07f295fae4b8aR223 Are you absolutely sure about using this config in the PR?
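A hedged sketch of the cheap first-byte gate described above (the real DateTimeUtils code may differ):

```scala
// Every special value starts with a letter, while ordinary timestamps start
// with a digit or a sign, so one byte comparison rejects the common case
// before any regex matching or config lookup runs.
def maybeSpecialValue(bytes: Array[Byte]): Boolean =
  bytes.nonEmpty && Character.isAlphabetic(bytes(0))
```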
@dongjoon-hyun @maropu Can you merge this PR? Checking the flag could be added in #25697 itself, or in a separate follow-up PR after #25697 is merged, I believe.
retest this please
@MaxGekk so do you plan to hide this change behind the configuration?
@HyukjinKwon Yes, I do. I would do that in a separate PR as soon as the flag is available in the master branch.
LGTM, but please @MaxGekk make sure to add a configuration later. Otherwise we might have to revert before 3.0.
Test build #110860 has finished for PR 25716 at commit
retest this please
Test build #110888 has finished for PR 25716 at commit
Merged to master.
This PR #25834 hides the feature under the SQL config |
… SQL migration guide

### What changes were proposed in this pull request?
Updated the SQL migration guide regarding the recently supported special date and timestamp values, see #25716 and #25708. Closes #25834

### Why are the changes needed?
To let users know about the new feature in Spark 3.0.

### Does this PR introduce any user-facing change?
No

Closes #25948 from MaxGekk/special-values-migration-guide.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…only

### What changes were proposed in this pull request?
In the PR, I propose to support the special datetime values introduced by #25708 and #25716 only in typed literals, and not to recognize them when parsing strings to dates/timestamps. The following string values are supported only in typed timestamp literals:
- `epoch [zoneId]` - `1970-01-01 00:00:00+00` (Unix system time zero)
- `today [zoneId]` - midnight today
- `yesterday [zoneId]` - midnight yesterday
- `tomorrow [zoneId]` - midnight tomorrow
- `now` - current query start time

For example:
```sql
spark-sql> SELECT timestamp 'tomorrow';
2019-09-07 00:00:00
```

Similarly, the following special date values are supported only in typed date literals:
- `epoch [zoneId]` - `1970-01-01`
- `today [zoneId]` - the current date in the time zone specified by `spark.sql.session.timeZone`
- `yesterday [zoneId]` - the current date - 1
- `tomorrow [zoneId]` - the current date + 1
- `now` - the date of running the current query; it has the same notion as `today`

For example:
```sql
spark-sql> SELECT date 'tomorrow' - date 'yesterday';
2
```

### Why are the changes needed?
In the current implementation, Spark supports the special date/timestamp values in any input string cast to dates/timestamps, which leads to the following problems:
- If executors have different system time, the result is inconsistent and random. Column values depend on where the conversions were performed.
- The special values play the role of distributed non-deterministic functions, though users might think of the values as constants.

### Does this PR introduce _any_ user-facing change?
Yes, but the probability should be small.

### How was this patch tested?
By running existing test suites:
```
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z interval.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z date.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z timestamp.sql"
$ build/sbt "test:testOnly *DateTimeUtilsSuite"
```

Closes #32714 from MaxGekk/remove-datetime-special-values.

Lead-authored-by: Max Gekk <[email protected]>
Co-authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
What changes were proposed in this pull request?
Supported special string values for the TIMESTAMP type. They are simply notational shorthands that will be converted to ordinary timestamp values when read. The following string values are supported:
- `epoch [zoneId]` - `1970-01-01 00:00:00+00` (Unix system time zero)
- `today [zoneId]` - midnight today
- `yesterday [zoneId]` - midnight yesterday
- `tomorrow [zoneId]` - midnight tomorrow
- `now` - current query start time

For example:
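A minimal hedged illustration through the Scala API (`spark` is an assumed active SparkSession; the printed value depends on the run date):

```scala
// Querying a special timestamp value; 'tomorrow' resolves to midnight of
// the next day in the session time zone at query start.
spark.sql("SELECT timestamp 'tomorrow'").show(truncate = false)
```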
Why are the changes needed?
To maintain feature parity with PostgreSQL, see 8.5.1.4. Special Values
Does this PR introduce any user-facing change?
Previously, the parser failed on the special values with an error. After the changes, the special values are converted to appropriate timestamps.
How was this patch tested?
- Added tests to `TimestampFormatterSuite` to check parsing special values from regular strings.
- Added tests to `DateTimeUtilsSuite` to check parsing those values from `UTF8String`.
- Added new test cases to `timestamp.sql`.