-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-31868][SQL] Restore the parsing behaviour week-based-year for 2.4 #28674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @cloud-fan @MaxGekk @maropu @dongjoon-hyun @HyukjinKwon thanks |
|
Test build #123295 has finished for PR 28674 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
Outdated
Show resolved
Hide resolved
|
Can we define the 2.4 behavior first? The behavior looks very weird to me. If it doesn't make sense, we can narrow down the scope or even forbid it. |
I found it very difficult to follow the behavior of 2.4. It is also hard to fully define it.
I agree that we can narrow down the scope of our new formatter First of all, I suggest that we fail the 'Y' field to work with other non-week-based date fields y/M/L/d, etc as they are silently omitted. |
|
this makes sense to me that week-based date fields should not co-exist with non-week-based date fields. We should also clearly define the default values if year or month or day is not specified. |
|
Test build #123368 has finished for PR 28674 at commit
|
|
Test build #123366 has finished for PR 28674 at commit
|
| val weekBased = mayWeekBased(accessor, weekFields) | ||
| if (weekBased && mayNonWeekBased(accessor)) { | ||
| throw new DateTimeException( | ||
| s"Can not mix week-based and non-week-based date fields together for parsing dates") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this happen when we create the formatter?
|
Test build #123376 has finished for PR 28674 at commit
|
This PR disables week-based date filed for parsing closes #28674 1. It's an un-fixable behavior change to fill the gap between SimpleDateFormat and DateTimeFormater and backward-compatibility for different JDKs.A lot of effort has been made to prove it at #28674 2. The existing behavior itself in 2.4 is confusing, e.g. ```sql spark-sql> select to_timestamp('1', 'w'); 1969-12-28 00:00:00 spark-sql> select to_timestamp('1', 'u'); 1970-01-05 00:00:00 ``` The 'u' here seems not to go to the Monday of the first week in week-based form or the first day of the year in non-week-based form but go to the Monday of the second week in week-based form. And, e.g. ```sql spark-sql> select to_timestamp('2020 2020', 'YYYY yyyy'); 2020-01-01 00:00:00 spark-sql> select to_timestamp('2020 2020', 'yyyy YYYY'); 2019-12-29 00:00:00 spark-sql> select to_timestamp('2020 2020 1', 'YYYY yyyy w'); NULL spark-sql> select to_timestamp('2020 2020 1', 'yyyy YYYY w'); 2019-12-29 00:00:00 ``` I think we don't need to introduce all the weird behavior from Java 3. The current test coverage for week-based date fields is almost 0%, which indicates that we've never imagined using it. 4. Avoiding JDK bugs https://issues.apache.org/jira/browse/SPARK-31880 Yes, the 'Y/W/w/u/F/E' pattern cannot be used datetime parsing functions. more tests added Closes #28706 from yaooqinn/SPARK-31892. Authored-by: Kent Yao <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit afe95bd) Signed-off-by: Wenchen Fan <[email protected]>
|
Merged to master and branch-3.0. |
|
@HyukjinKwon this PR is closed by afe95bd It's not merged... |
|
Okay, understood now .. let's be diligent on updating related JIRAs btw in particular the blockers. So SPARK-31868 is won't fix? |
|
yea, let me close it. |
What changes were proposed in this pull request?
In 3.0, the new approach in datetime parser can not extract the week-based-year field, so it goes
idempotently to 1970.
In Legacy mode a.k.a version 2.4, the result will be the last Sunday of the past year(It is weird too!)
In this PR, I propose to restore the behavior in 2.4 before we have final conclusion and do not block releasing 3.0. Since in 3.0 it is much weirder.
FYI, Postgres may set a good example here, https://www.postgresql.org/docs/9.0/functions-formatting.html
Why are the changes needed?
bugfix and behavior change restoration
Does this PR introduce any user-facing change?
No, the behavior will be restored.
How was this patch tested?
added new tests