Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Feb 2, 2020

What changes were proposed in this pull request?

In the PR, I propose to change the description of the to_timestamp() function, and change the pattern in the example.

Why are the changes needed?

To inform users about valid patterns for to_timestamp function.

Does this PR introduce any user-facing change?

No

How was this patch tested?

N/A

@MaxGekk
Copy link
Member Author

MaxGekk commented Feb 2, 2020

@dongjoon-hyun Please, take a look at this.

@SparkQA
Copy link

SparkQA commented Feb 2, 2020

Test build #117752 has finished for PR 27438 at commit f6a7060.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

It's too late for RC2, @MaxGekk . Let's consider this PR after RC2 vote.

@dongjoon-hyun
Copy link
Member

In general, this seems to be a follow-up of SPARK-23792 (at 2.4.0). Since it's too old JIRA, we had better use another JIRA or use [MINOR] like you. I support your choice. I believe that one thing we need is that yyyy-MM-dd HH:mm:ss.SSSS -> yyyy-MM-dd HH:mm:ss change. I recommend to focus on the above change only.

@dongjoon-hyun
Copy link
Member

@MaxGekk . Every committer's suggestion is his/her own criteria for his/her acceptance. I am also able to merge only what I can agree. And, as you know, the other committers also have different opinion and they will merge this if they agree with AS-IS status more. It always does. In addition, I don't complain about the other committer's decision when I understand it's on the edge. Since this PR is yours, it's up to you always~ @MaxGekk .

@dongjoon-hyun
Copy link
Member

Anyway, thank you always for you active contribution. Apache Spark community really needs that. 😄

@MaxGekk MaxGekk changed the title [MINOR][SQL][DOCS][2.4] Fix the descriptions of to_timestamp and ParseToTimestamp [MINOR][SQL][DOCS][2.4] Fix the timestamp pattern in the example for to_timestamp Feb 3, 2020
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Since @dongjoon-hyun is preparing the release, I will leave it to him though.

@SparkQA
Copy link

SparkQA commented Feb 3, 2020

Test build #117767 has finished for PR 27438 at commit c7f75a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* See [[java.text.SimpleDateFormat]] for valid date and time format patterns
*
* @param s A date, timestamp or string. If a string, the data must be in a format that can be
* cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd HH:mm:ss.SSSS`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's wrong with ".SSSS"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The result is incorrect, look at the PR which fixes that [SPARK-29904][SQL][2.4] Parse timestamps in microsecond precision by JSON/CSV datasources #26507, and here is an example:
    val sdf = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSS")
    val res = sdf.parse("1970-01-01 00:00:00.1234")
    println(sdf.format(res))

1970-01-01 00:00:01.0234
  1. And the result of to_timestamp is truncated to seconds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for 1), is it fixed in 2.4 or not?

for 2), have we documented it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPARK-29904 is merged at 2.4.5. In the worst case, we can revert @MaxGekk 's #26507 because of this.

Copy link
Member Author

@MaxGekk MaxGekk Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for 1), is it fixed in 2.4 or not?

@cloud-fan It is fixed in JSON and CSV datasources only but not for to_timestamp() and other functions.

for 2), have we documented it?

I have tried to document it here in the PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the worst case, we can revert @MaxGekk 's #26507 because of this.

@dongjoon-hyun Why? What's the worst case? I do think we should apply the fix in other places but not reverting it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan I tried to document the restriction of to_timestmp() in the PR but @dongjoon-hyun convinced me to do not do that here, see #27438 (comment) " I believe that one thing we need is that yyyy-MM-dd HH:mm:ss.SSSS -> yyyy-MM-dd HH:mm:ss change. I recommend to focus on the above change only."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxGekk . To be clear, I want to make it sure we don't have any regression at 2.4.5 as a release manager. You raised this issue and we are investigating the relevant PRs. That's all. We are considering the scope and effect. So far, we didn't make any decision.

/**
* Converts time string with the given pattern to timestamp.
*
* See [[java.text.SimpleDateFormat]] for valid date and time format patterns
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, I would like to say here that currently supported pattern for seconds fractions is SSS only. The second s fractions can be parsed but will be ignored while casting to TimestampType.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (AS-IS). Thanks, @MaxGekk and @HyukjinKwon .
Let's postpone this PR during RC2 vote period.

@cloud-fan
Copy link
Contributor

maybe we can just fix the issue in to_timestamp after RC2 passes?

@MaxGekk
Copy link
Member Author

MaxGekk commented Feb 4, 2020

maybe we can just fix the issue in to_timestamp after RC2 passes?

@cloud-fan I have tried to port all functions on TimestampParser by rewriting DateTimeUtils.newDateFormat:

  def newDateFormat(formatString: String, timeZone: TimeZone): TimestampParser = {
    new TimestampParser(FastDateFormat.getInstance(formatString, timeZone))
  }

from

  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat = {
    val sdf = new SimpleDateFormat(formatString, Locale.US)
    sdf.setTimeZone(timeZone)
    // Enable strict parsing, if the input date/format is invalid, it will throw an exception.
    // e.g. to parse invalid date '2016-13-12', or '2016-01-12' with  invalid format 'yyyy-aa-dd',
    // an exception will be throwed.
    sdf.setLenient(false)
    sdf
  }

but I got a few test failures due to strict mode set via sdf.setLenient(false). I haven't found a way to set similar mode for FastDateFormat.

@cloud-fan
Copy link
Contributor

OK then this doc change LGTM

@dongjoon-hyun
Copy link
Member

Thank you for the investigation, @MaxGekk . And, thank you for the conclusion, @cloud-fan .

@MaxGekk
Copy link
Member Author

MaxGekk commented Feb 6, 2020

@dongjoon-hyun Kindly remind you about this minor fix.

@dongjoon-hyun
Copy link
Member

Sure, @MaxGekk . Sorry for making you wait. Merged to branch-2.4.

dongjoon-hyun pushed a commit that referenced this pull request Feb 6, 2020
…`to_timestamp`

### What changes were proposed in this pull request?
In the PR, I propose to change the description of the `to_timestamp()` function, and change the pattern in the example.

### Why are the changes needed?
To inform users about valid patterns for `to_timestamp` function.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #27438 from MaxGekk/to_timestamp-z-2.4.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@MaxGekk MaxGekk deleted the to_timestamp-z-2.4 branch June 5, 2020 19:43
scunniff pushed a commit to scunniff/nomad-spark that referenced this pull request Nov 10, 2020
…`to_timestamp`

### What changes were proposed in this pull request?
In the PR, I propose to change the description of the `to_timestamp()` function, and change the pattern in the example.

### Why are the changes needed?
To inform users about valid patterns for `to_timestamp` function.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes apache#27438 from MaxGekk/to_timestamp-z-2.4.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants