[SPARK-1442][SQL][WIP] Initial window function implementation (refactored from #2953) #3703
|
Test build #24462 has started for PR 3703 at commit
|
|
Comments from the review on Reviewable.io Note that instead of whitelisting window function test cases in |
|
Comments from the review on Reviewable.io
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala, line 30 [r1] (raw file):
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala, line 874 [r1] (raw file):
SELECT
p_mfgr, p_name, p_size,
SUM(p_size) OVER w1 AS s1,
SUM(p_size) OVER w2 AS s2,
SUM(p_size) OVER (w3 ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS s3
FROM
part
WINDOW
w1 AS (DISTRIBUTE BY p_mfgr SORT BY p_size RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING),
w2 AS w3,
w3 AS (DISTRIBUTE BY p_mfgr SORT BY p_size RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
This map is cleaned and refilled in
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala, line 1060 [r1] (raw file):
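For readers less familiar with the frame clauses in the query above, the difference between a ROWS frame and a RANGE frame can be sketched in plain Python. This only illustrates standard SQL frame semantics and is not the code in this PR; the function names and the sample data are made up for the example, and only numeric offsets on a single ascending sort key are handled.

```python
from collections import defaultdict


def _partition(rows):
    """Group (partition_key, sort_value) pairs and sort the values in each group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: sorted(values) for key, values in groups.items()}


def sum_over_rows_frame(rows, preceding, following):
    """SUM(value) OVER (... ROWS BETWEEN preceding PRECEDING AND following FOLLOWING).

    The frame is defined by physical row positions inside the sorted partition.
    """
    out = []
    for key, values in sorted(_partition(rows).items()):
        for i, value in enumerate(values):
            lo = max(0, i - preceding)
            hi = min(len(values), i + following + 1)
            out.append((key, value, sum(values[lo:hi])))
    return out


def sum_over_range_frame(rows, preceding, following):
    """SUM(value) OVER (... RANGE BETWEEN preceding PRECEDING AND following FOLLOWING).

    The frame is defined by the sort key: every row whose value lies in
    [value - preceding, value + following] belongs to the frame.
    """
    out = []
    for key, values in sorted(_partition(rows).items()):
        for value in values:
            frame = [v for v in values if value - preceding <= v <= value + following]
            out.append((key, value, sum(frame)))
    return out


# Hypothetical (p_mfgr, p_size) pairs, not taken from the real `part` table.
sample = [("A", 2), ("A", 2), ("A", 6), ("A", 7), ("B", 10)]
rows_result = sum_over_rows_frame(sample, 2, 2)
range_result = sum_over_range_frame(sample, 2, 2)
```

With this sample, the two frames disagree whenever neighbouring rows are far apart in value: the ROWS frame always takes up to two physical neighbours on each side, while the RANGE frame only takes rows whose `p_size` is within 2 of the current row.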
|
|
Test build #24462 has finished for PR 3703 at commit
|
|
Test PASSed. |
|
Should we close this issue in favor of #3703? |
|
I'm confused. Why was this PR abruptly closed? Was there another active PR for window functions? |
|
I think that might have been a mistake and #2953 was supposed to be closed. Since we are unable to close PRs, there is (long story) a process that eventually closes PRs that have a comment like "mind closing this pr". That's why it got auto-closed then. That said I don't otherwise know whether this was going to proceed anyway. I don't see other PRs for this JIRA. |
This WIP PR is refactored from PR #2953. Please refer to the original PR description for features implemented and not implemented in this PR.
The original PR was huge, and commenting on each issue could be very time-consuming. After offline discussions with @guowei2, I decided to work on a refactoring branch to fix most minor issues first and then start the discussion based on this refactored version.
Major issues left in this PR are:
- [...] var, which breaks query plan immutability.
- COUNT, SUM, AVG etc. are translated into Hive aggregation functions rather than Spark SQL builtin implementations.
- [...] (execution.WindowFunction) can be further simplified.
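As an illustration of the immutability concern, named window definitions such as `w2 AS w3` can be resolved without any shared mutable state. The sketch below is in Python rather than Scala and is not how this PR or Catalyst does it; `WindowSpec`, `resolve`, and the string-valued frames are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace
from typing import Mapping, Optional, Tuple, Union


@dataclass(frozen=True)
class WindowSpec:
    """Immutable description of a window: partitioning, ordering, optional frame."""
    partition_by: Tuple[str, ...]
    order_by: Tuple[str, ...]
    frame: Optional[str] = None


# A WINDOW clause entry is either a full spec or the name of another window.
Definition = Union[WindowSpec, str]


def resolve(name: str, definitions: Mapping[str, Definition],
            seen: Tuple[str, ...] = ()) -> WindowSpec:
    """Follow name-to-name references until a concrete WindowSpec is found."""
    if name in seen:
        raise ValueError("cyclic window reference: " + " -> ".join(seen + (name,)))
    if name not in definitions:
        raise KeyError("undefined window: " + name)
    target = definitions[name]
    if isinstance(target, WindowSpec):
        return target
    return resolve(target, definitions, seen + (name,))


# Definitions mirroring the WINDOW clause of the query quoted in the review.
definitions = {
    "w1": WindowSpec(("p_mfgr",), ("p_size",),
                     "RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING"),
    "w2": "w3",
    "w3": WindowSpec(("p_mfgr",), ("p_size",),
                     "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"),
}

w2 = resolve("w2", definitions)
# OVER (w3 ROWS BETWEEN ...) reuses w3's partitioning and ordering but swaps
# the frame; `replace` returns a new object and leaves the original untouched.
s3 = replace(resolve("w3", definitions),
             frame="ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING")
```

Because every spec is a frozen value and the definitions are passed in as an argument, nothing has to be cleaned and refilled between statements, and resolving one query cannot affect another.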