
Conversation

@MaxGekk MaxGekk commented Sep 6, 2019

What changes were proposed in this pull request?

Supported special string values for the TIMESTAMP type. They are simply notational shorthands that are converted to ordinary timestamp values when read. The following string values are supported:

  • epoch [zoneId] - 1970-01-01 00:00:00+00 (Unix system time zero)
  • today [zoneId] - midnight today
  • yesterday [zoneId] - midnight yesterday
  • tomorrow [zoneId] - midnight tomorrow
  • now - current query start time

For example:

spark-sql> SELECT timestamp 'tomorrow';
2019-09-07 00:00:00
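As a rough illustration (plain java.time, not the actual Spark code), each special value resolves to an instant along these lines; `zoneId` stands for the session time zone:

```scala
// A sketch of what each special TIMESTAMP value resolves to, using plain java.time.
// Illustrative only, not the Spark implementation.
import java.time.{Instant, LocalTime, ZoneId, ZonedDateTime}
import java.util.Locale

object SpecialTimestampSketch {
  def resolve(value: String, zoneId: ZoneId): Instant = {
    // Midnight of the current day in the given time zone.
    def midnightToday: ZonedDateTime = Instant.now().atZone(zoneId).`with`(LocalTime.MIDNIGHT)
    value.toLowerCase(Locale.ROOT) match {
      case "epoch"     => Instant.EPOCH                        // 1970-01-01 00:00:00+00
      case "now"       => Instant.now()                        // query start time in Spark
      case "today"     => midnightToday.toInstant
      case "yesterday" => midnightToday.minusDays(1).toInstant
      case "tomorrow"  => midnightToday.plusDays(1).toInstant
      case other       => throw new IllegalArgumentException(s"Not a special value: $other")
    }
  }

  def main(args: Array[String]): Unit = {
    val zone = ZoneId.of("America/Los_Angeles")
    println(resolve("tomorrow", zone)) // midnight tomorrow in the session zone, as an Instant
  }
}
```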

Why are the changes needed?

To maintain feature parity with PostgreSQL, see 8.5.1.4. Special Values

Does this PR introduce any user-facing change?

Previously, the parser failed on the special values with the error:

spark-sql> select timestamp 'today';
Error in query: 
Cannot parse the TIMESTAMP value: today(line 1, pos 7)

After the changes, the special values are converted to the appropriate timestamps:

spark-sql> select timestamp 'today';
2019-09-06 00:00:00

How was this patch tested?

  • Added tests to TimestampFormatterSuite to check parsing special values from regular strings.
  • Tests in DateTimeUtilsSuite check parsing those values from UTF8String.
  • Uncommented tests in timestamp.sql.

@SparkQA

SparkQA commented Sep 6, 2019

Test build #110256 has finished for PR 25716 at commit ad23507.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 7, 2019

Test build #110286 has finished for PR 25716 at commit 59e30e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 7, 2019

Test build #110288 has finished for PR 25716 at commit 9f7ed14.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title [WIP][SPARK-29012][SQL] Support special timestamp values [SPARK-29012][SQL] Support special timestamp values Sep 9, 2019
-- !query 16 schema
struct<64:string,d1:timestamp>
-- !query 16 output
1969-12-31 16:00:00
Member Author:

This is the epoch in UTC (1970-01-01 00:00:00) displayed in the local time zone.


Member Author:

I can set UTC globally, but ... if we know the reason for this, should we do that?

Member Author:

And I am afraid it won't be enough. We need to set the system time zone as well:

// Timezone is fixed to America/Los_Angeles for those timezone sensitive tests (timestamp_*)
TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
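For context, pinning both of the clocks mentioned here could look roughly like this (an illustrative sketch; the spark.conf.set line assumes a running SparkSession):

```scala
// Illustrative sketch: pin both time zones the timestamp_* tests are sensitive to.
import java.util.TimeZone

object PinTimeZonesSketch {
  def main(args: Array[String]): Unit = {
    // The JVM default time zone, used by java.util/java.time defaults.
    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
    // Spark's session time zone would be set separately, e.g. with a SparkSession:
    // spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    println(TimeZone.getDefault.getID) // America/Los_Angeles
  }
}
```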

Member:

NVM. I think it's ok as it is. But it would be better to leave some comments in timestamp.sql about why.

MaxGekk commented Sep 9, 2019

@dongjoon-hyun @cloud-fan @maropu Could you take a look at this when you have time?

@SparkQA

SparkQA commented Sep 9, 2019

Test build #110357 has finished for PR 25716 at commit a268e62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* @return Some of microseconds since the epoch if the conversion completed
* successfully, otherwise None.
*/
def convertSpecialTimestamp(input: String, zoneId: ZoneId): Option[SQLTimestamp] = {
Member:

What's different from convertSpecialDate? I know the output dataType is different, but is the way these special values are handled different, too?
https://github.com/apache/spark/pull/25708/files#diff-da60f07e1826788aaeb07f295fae4b8aR866
Can we share some code between them?



/**
* Converts notational shorthands to ordinary timestamps.
* @param input - a trimmed string
Member:

How about checking that the input is trimmed with an assert?

Member Author:

I will add the assert:

assert(input.trim.length == input.length)

}
}

private def convertSpecialTimestamp(bytes: Array[Byte], zoneId: ZoneId): Option[SQLTimestamp] = {
Member:

Why did you use Array[Byte] instead of UTF8String?

Member Author:

Because I need a String inside extractSpecialValue, and converting a UTF8String to a String goes through Array[Byte] anyway. Why should we convert the same string to bytes twice?

Member:

Ur, I see.

var currentSegmentValue = 0
val bytes = s.trim.getBytes
val specialTimestamp = convertSpecialTimestamp(bytes, timeZoneId)
if (specialTimestamp.isDefined) return specialTimestamp
Member:

Can we avoid using return here?

Member Author:

Why?

Member:

I'm not 100% sure about the bytecode for this, though; is there no overhead to using return?
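For what it's worth, a return-free shape of the snippet above could look like this (a sketch with hypothetical stand-in helpers, not the actual DateTimeUtils code):

```scala
// Sketch: replace the early `return` with Option.orElse.
// convertSpecialTimestamp / parseRegularTimestamp are hypothetical stand-ins.
import java.time.{Instant, ZoneId}

object ReturnFreeSketch {
  // Stand-in: recognizes only "epoch"; the real code handles all special values.
  def convertSpecialTimestamp(s: String, zoneId: ZoneId): Option[Long] =
    if (s.equalsIgnoreCase("epoch")) Some(0L) else None

  // Stand-in for the regular segment-by-segment timestamp parser.
  def parseRegularTimestamp(s: String, zoneId: ZoneId): Option[Long] =
    scala.util.Try(Instant.parse(s).toEpochMilli * 1000L).toOption

  // The special-value branch is tried first; the regular parser runs only on None.
  def stringToTimestamp(s: String, zoneId: ZoneId): Option[Long] = {
    val trimmed = s.trim
    convertSpecialTimestamp(trimmed, zoneId).orElse(parseRegularTimestamp(trimmed, zoneId))
  }

  def main(args: Array[String]): Unit = {
    val utc = ZoneId.of("UTC")
    println(stringToTimestamp("epoch", utc))                // Some(0)
    println(stringToTimestamp("1970-01-01T00:00:01Z", utc)) // Some(1000000)
  }
}
```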

import java.util.Locale
import java.util.concurrent.TimeUnit.SECONDS

import DateTimeUtils.{convertSpecialTimestamp}
Member:

nit: remove {

Instant.now().atZone(zoneId).`with`(LocalTime.MIDNIGHT)
}

private val specialValue = """(EPOCH|NOW|TODAY|TOMORROW|YESTERDAY)\p{Blank}*(.*)""".r
Member:

The description of the supported special values in this PR should be:

epoch [zoneId] - 1970-01-01 00:00:00+00 (Unix system time zero)
today [zoneId] - midnight today
yesterday [zoneId] - midnight yesterday
tomorrow [zoneId] - midnight tomorrow
now - current query start time

?
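As an aside, a minimal sketch of how the specialValue regex above can be exercised (illustrative; the real DateTimeUtils upper-cases the input and handles the zone id itself):

```scala
// Sketch: split a candidate string into the special-value keyword and the zone to use.
// NOTE: upper-casing region zone ids (e.g. Europe/Berlin) is a simplification here.
import java.time.ZoneId
import java.util.Locale

object SpecialValueRegexSketch {
  private val specialValue = """(EPOCH|NOW|TODAY|TOMORROW|YESTERDAY)\p{Blank}*(.*)""".r

  // Returns the keyword and the zone: the trailing group when present, otherwise the session zone.
  def extract(input: String, sessionZone: ZoneId): Option[(String, ZoneId)] =
    input.trim.toUpperCase(Locale.ROOT) match {
      case specialValue(keyword, "")      => Some((keyword, sessionZone))
      case specialValue(keyword, zoneStr) => Some((keyword, ZoneId.of(zoneStr)))
      case _                              => None
    }

  def main(args: Array[String]): Unit = {
    val session = ZoneId.of("America/Los_Angeles")
    println(extract("epoch", session))      // Some((EPOCH,America/Los_Angeles))
    println(extract("today UTC", session))  // Some((TODAY,UTC))
    println(extract("2019-09-06", session)) // None
  }
}
```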


assert(toTimestamp("Epoch", zoneId).get === 0)
val now = instantToMicros(LocalDateTime.now(zoneId).atZone(zoneId).toInstant)
toTimestamp("NOW", zoneId).get should be (now +- tolerance)
Member:

Can you check illegal cases, e.g., now CET?


@dongjoon-hyun

Thank you for pinging me, @MaxGekk. I support this approach.
One thing I'm concerned about is the coordination between this PR and the following ones.

Hi, @gatorsmile , @gengliangwang , @cloud-fan .

  • Can we proceed with the feature implementation first in a non-critical PR like this?
  • Or, do we need to wait and choose one of them to follow?

maropu commented Sep 10, 2019

The current code looks ok to me, but this might be blocked by #25697 for the same reason as #25708 ...

@SparkQA

SparkQA commented Sep 10, 2019

Test build #110403 has finished for PR 25716 at commit b17d642.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang

Well, this feature doesn't really conflict with the current Spark behavior, so I think we can proceed with it.
I will continue the work of #25697 this week.

@SparkQA

SparkQA commented Sep 10, 2019

Test build #110408 has finished for PR 25716 at commit a4fae09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk commented Sep 12, 2019

@maropu @dongjoon-hyun Can we continue with the PR, or are we waiting for @gengliangwang's #25697?

maropu commented Sep 12, 2019

How about holding this PR until this weekend for @gengliangwang's work? I personally think we don't have any reason to rush to merge this.

MaxGekk commented Sep 12, 2019

I have some performance-related concerns about using the config. In the current implementation, the decision is pretty cheap - just comparing the first byte. If we use the config, we will need to retrieve it and compare its value with another string, which can add visible overhead even if the PostgreSQL compatibility mode is turned off, see https://github.com/apache/spark/pull/25716/files#diff-da60f07e1826788aaeb07f295fae4b8aR223

Are you absolutely sure about using this config in the PR?
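For context, the cheap "comparing the first byte" check referred to here can be sketched as follows (illustrative, not the exact Spark code):

```scala
// Sketch of a first-byte fast path: special values (epoch, now, today, tomorrow,
// yesterday) start with a letter, while regular timestamp strings typically start
// with a digit, so one byte is usually enough to skip the special-value branch.
object FirstByteCheckSketch {
  def maybeSpecialValue(bytes: Array[Byte]): Boolean =
    bytes.nonEmpty && Character.isAlphabetic(bytes(0))

  def main(args: Array[String]): Unit = {
    println(maybeSpecialValue("epoch".getBytes))               // true
    println(maybeSpecialValue("2019-09-06 00:00:00".getBytes)) // false
  }
}
```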

MaxGekk commented Sep 17, 2019

@dongjoon-hyun @maropu Can you merge this PR? Checking the flag could be added in #25697 itself, or in a separate follow-up PR after #25697 is merged, I believe.

@HyukjinKwon

retest this please

@HyukjinKwon

@MaxGekk so do you plan to hide this change behind the configuration spark.sql.dialect later?

MaxGekk commented Sep 18, 2019

so do you plan to hide this change behind the configuration spark.sql.dialect later?

@HyukjinKwon Yes, I do. I would do that in a separate PR as soon as the flag is available in the master branch.

@HyukjinKwon HyukjinKwon left a comment

LGTM, but @MaxGekk please make sure to add a configuration later. Otherwise we might have to revert before 3.0.

@SparkQA

SparkQA commented Sep 18, 2019

Test build #110860 has finished for PR 25716 at commit a4fae09.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon

retest this please

@SparkQA

SparkQA commented Sep 18, 2019

Test build #110888 has finished for PR 25716 at commit a4fae09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon

Merged to master.

MaxGekk commented Sep 22, 2019

PR #25834 hides the feature under the SQL config spark.sql.dialect = "PostgreSQL".

dongjoon-hyun pushed a commit that referenced this pull request Sep 27, 2019
… SQL migration guide

### What changes were proposed in this pull request?

Updated the SQL migration guide regarding the recently supported special date and timestamp values; see #25716 and #25708.

Closes #25834

### Why are the changes needed?
To let users know about the new feature in Spark 3.0.

### Does this PR introduce any user-facing change?
No

Closes #25948 from MaxGekk/special-values-migration-guide.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@MaxGekk MaxGekk deleted the timestamp-special-values branch October 5, 2019 19:17
MaxGekk added a commit that referenced this pull request Jun 1, 2021
…only

### What changes were proposed in this pull request?
In the PR, I propose to support the special datetime values introduced by #25708 and #25716 only in typed literals, and to no longer recognize them when parsing strings to dates/timestamps. The following string values are supported only in typed timestamp literals:
- `epoch [zoneId]` - `1970-01-01 00:00:00+00 (Unix system time zero)`
- `today [zoneId]` - midnight today.
- `yesterday [zoneId]` - midnight yesterday
- `tomorrow [zoneId]` - midnight tomorrow
- `now` - current query start time.

For example:
```sql
spark-sql> SELECT timestamp 'tomorrow';
2019-09-07 00:00:00
```

Similarly, the following special date values are supported only in typed date literals:
- `epoch [zoneId]` - `1970-01-01`
- `today [zoneId]` - the current date in the time zone specified by `spark.sql.session.timeZone`.
- `yesterday [zoneId]` - the current date -1
- `tomorrow [zoneId]` - the current date + 1
- `now` - the date of running the current query. It has the same notion as `today`.

For example:
```sql
spark-sql> SELECT date 'tomorrow' - date 'yesterday';
2
```

### Why are the changes needed?
In the current implementation, Spark recognizes the special date/timestamp values in any input string cast to dates/timestamps, which leads to the following problems:
- If executors have different system times, the result is inconsistent and effectively random; column values depend on where the conversions were performed.
- The special values play the role of distributed non-deterministic functions, though users might think of the values as constants.

### Does this PR introduce _any_ user-facing change?
Yes, but the probability of impact should be small.

### How was this patch tested?
By running existing test suites:
```
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z interval.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z date.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z timestamp.sql"
$ build/sbt "test:testOnly *DateTimeUtilsSuite"
```

Closes #32714 from MaxGekk/remove-datetime-special-values.

Lead-authored-by: Max Gekk <[email protected]>
Co-authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>