-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries #36753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can one of the admins verify this patch? |
|
There are some Python errors in the build, which seem to me like there is a problem with pandas or python versions in the build pipeline. Anyway, I really doubt that I broke those tests |
Looking at GAs for commits in https://github.com/apache/spark/commits/branch-3.2, they have been broken for some time already. cc @HyukjinKwon @ueshin @itholic |
|
Let me fix it |
|
Let's wait for #36759 and then re-trigger the build via an empty commit: |
|
@olaky Could you re-trigger builds, please. |
63cc981 to
ae808ea
Compare
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, folks. Please hold on this.
This patch seems to break Scala 2.13 in master/branch-3.3 (including RC4).
|
Here is the PR to fix Scala 2.13 issue. |
|
When updating this PR, let's also pull in my changes from #36765 . When merging this, we should probably pick it all the way back to 3.0 (since it looks like a correctness issue that would also impact those versions). |
…` in `ComputeCurrentTimeSuite` ### What changes were proposed in this pull request? Unfortunately, apache#36654 causes seven Scala 2.13 test failures in master/3.3 and Apache Spark 3.3 RC4. This PR aims to fix Scala 2.13 ClassCastException in the test code. ### Why are the changes needed? ``` $ dev/change-scala-version.sh 2.13 $ build/sbt "catalyst/testOnly *.ComputeCurrentTimeSuite" -Pscala-2.13 ... [info] ComputeCurrentTimeSuite: [info] - analyzer should replace current_timestamp with literals *** FAILED *** (1 second, 189 milliseconds) [info] java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be cast to scala.collection.immutable.Seq [info] at org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTimeSuite.literals(ComputeCurrentTimeSuite.scala:146) [info] at org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTimeSuite.$anonfun$new$1(ComputeCurrentTimeSuite.scala:47) ... [info] *** 7 TESTS FAILED *** [error] Failed tests: [error] org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTimeSuite [error] (catalyst / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful [error] Total time: 189 s (03:09), completed Jun 3, 2022 10:29:39 AM ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs and manually tests with Scala 2.13. ``` $ dev/change-scala-version.sh 2.13 $ build/sbt "catalyst/testOnly *.ComputeCurrentTimeSuite" -Pscala-2.13 ... [info] ComputeCurrentTimeSuite: [info] - analyzer should replace current_timestamp with literals (545 milliseconds) [info] - analyzer should replace current_date with literals (11 milliseconds) [info] - SPARK-33469: Add current_timezone function (3 milliseconds) [info] - analyzer should replace localtimestamp with literals (4 milliseconds) [info] - analyzer should use equal timestamps across subqueries (182 milliseconds) [info] - analyzer should use consistent timestamps for different timezones (13 milliseconds) [info] - analyzer should use consistent timestamps for different timestamp functions (2 milliseconds) [info] Run completed in 1 second, 579 milliseconds. [info] Total number of tests run: 7 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 7, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. [success] Total time: 12 s, completed Jun 3, 2022, 10:54:03 AM ``` Closes apache#36762 from dongjoon-hyun/SPARK-39259. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
JoshRosen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for backport pending tests.
For committers: whoever merges this should also try to merge to 3.1 and 3.0.
|
I guess the failure is not related to PR's changes: Last commits in https://github.com/apache/spark/commits/branch-3.2 fail on the same. |
|
+1, LGTM. Merging to 3.2 and trying to merge to 3.1/3.0. |
### What changes were proposed in this pull request? Apply the optimizer rule ComputeCurrentTime consistently across subqueries. This is a backport of #36654. ### Why are the changes needed? At the moment timestamp functions like now() can return different values within a query if subqueries are involved ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A new unit test was added Closes #36753 from olaky/SPARK-39259-spark_3_2. Lead-authored-by: Ole Sasse <[email protected]> Co-authored-by: Josh Rosen <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Max Gekk <[email protected]>
|
@olaky The changes cause conflicts in branch-3.1. Could you PRs w/ backports to 3.1 and 3.0, please. |
|
Merging is blocked because of a test failure that also surfaces in #36386 |
|
@MaxGekk since you closed this, should I still work on propagating this to 3.1 and 3.0? And how should we deal with the test failures happening on branch-3.2? |
|
@MaxGekk 3.1 already does not have any transform with subqueries function, so I would have to backport this as well. I personally feel that this could be a risky endeavour not worth doing to get this fix, what do you think? |
Could you list required PRs, please. Is it possible to extract only needed functions from them? |
|
And for 3.0 we have: #36822 (looks to me like there is an infrastructure problem with building it) |
Thank you for making a backporting PR to |
### What changes were proposed in this pull request? Apply the optimizer rule ComputeCurrentTime consistently across subqueries. This is a backport of apache#36654. ### Why are the changes needed? At the moment timestamp functions like now() can return different values within a query if subqueries are involved ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A new unit test was added Closes apache#36753 from olaky/SPARK-39259-spark_3_2. Lead-authored-by: Ole Sasse <[email protected]> Co-authored-by: Josh Rosen <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Max Gekk <[email protected]>

What changes were proposed in this pull request?
Apply the optimizer rule ComputeCurrentTime consistently across subqueries.
This is a backport of #36654.
Why are the changes needed?
At the moment timestamp functions like now() can return different values within a query if subqueries are involved
Does this PR introduce any user-facing change?
No
How was this patch tested?
A new unit test was added