[SPARK-1442][SQL][WIP] Initial window function implementation (refactored from #2953) #3703

liancheng · 2014-12-15T17:54:25Z

This WIP PR is refactored from PR #2953. Please refer to the original PR description for features implemented and not implemented in this PR.

The original PR was a huge one, commenting on each issue could be very time consuming. After offline discussions with @guowei2, I decided to work on a refactoring branch to fix most minor issues first and then start discussion based on this refactored version.

Major issues left in this PR are:

Window spec is added to aggregation functions with a var, which breaks query plan immutability.
When used with window specs, common aggregation functions like COUNT, SUM, AVG etc are not translated into Hive aggregation functions rather than Spark SQL builtin implementations.
Execution code (execution.WindowFunction) can be further simplified.

…l, false)

SparkQA · 2014-12-15T17:57:34Z

Test build #24462 has started for PR 3703 at commit 922a8b9.

This patch merges cleanly.

liancheng · 2014-12-15T17:58:06Z

Comments from the review on Reviewable.io

Note that instead of whitelisting window function test cases in HiveCompatibilitySuite, a new HiveWindowFunctionSuite was added. This is because the current Spark SQL HiveQl parser doesn't handle comments, and window function test input files come with Hive contains comment lines.

liancheng · 2014-12-15T18:35:55Z

Comments from the review on Reviewable.io

^{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala, line 30 [r1] (raw file):}
This can be problematic. Ideally every aggregation function that can be used with window should have a windowSpec: Option[WindowSpec] field which defaults to None, and a withWindowSpec method that returns a new instance of the aggregation function object itself with a window spec.

^{sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala, line 874 [r1] (raw file):}
The thread-local windowDefs map is used to store window definitions (w1, w2 and w3) in queries like this:

SELECT
    p_mfgr, p_name, p_size,
    SUM(p_size) OVER w1 AS s1,
    SUM(p_size) OVER w2 AS s2,
    SUM(p_size) OVER (w3 ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)  AS s3
FROM
    part
WINDOW
    w1 AS (DISTRIBUTE BY p_mfgr SORT BY p_size RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING),
    w2 AS w3,
    w3 AS (DISTRIBUTE BY p_mfgr SORT BY p_size RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

This map is cleaned and refilled in collectWindowDefs below, so it doesn't grow indefinitely.

^{sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala, line 1060 [r1] (raw file):}
All builtin aggregation functions need a similar case clause to handle their windowed version. Otherwise they all fallback to Hive UDAF implementations.

COUNT is picked here because its Hive version GenericUDAFCount implements GenericUDAFResolver2 rather than AbstractGenericUDAFResolver, and is not handled by HiveFunctionRegistry.lookupFunction.

SparkQA · 2014-12-15T19:11:27Z

Test build #24462 has finished for PR 3703 at commit 922a8b9.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])
- case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])
- case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)
- abstract class AggregateExpression extends Expression with Serializable
- case class WindowFunction(
- case class WindowFunction(

AmplabJenkins · 2014-12-15T19:11:30Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24462/
Test PASSed.

marmbrus · 2014-12-17T23:42:48Z

Should we close this issue in favor of #3703 ?

nchammas · 2015-01-13T16:07:42Z

Should we close this issue in favor of #3703 ?

@marmbrus This is #3703. Did you mean another PR?

mallman · 2015-03-19T17:44:06Z

I'm confused. Why was this PR abruptly closed? Was there another active PR for window functions?

srowen · 2015-03-19T19:14:51Z

I think that might have been a mistake and #2953 was supposed to be closed. Since we are unable to close PRs, there is (long story) a process that eventually closes PRs that have a comment like "mind closing this pr". That's why it got auto-closed then.

That said I don't otherwise know whether this was going to proceed anyway. I don't see other PRs for this JIRA.

guowei2 and others added 14 commits December 15, 2014 16:28

window function

9897413

window function

7d7a703

window function

1999e07

window function

76bfd4b

window function

88c5789

fix problems after rebase

828199a

change test suite and golden files

03bd77d

add constant objectinspector support for udafs, such as last_value(co…

d06baeb

…l, false)

fix window function to support multi-different window partitions

173016c

fix DecimalType bug after rebase

ab21933

fix bug about attribute reference

66ef7a6

WIP: refactoring window functions support

dc87d8d

Removed trailing spaces from query string in HiveWindowFunctionSuite

2da6175

Fixes COUNT with window spec

922a8b9

asfgit closed this in ca12608 Dec 17, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-1442][SQL][WIP] Initial window function implementation (refactored from #2953) #3703

[SPARK-1442][SQL][WIP] Initial window function implementation (refactored from #2953) #3703

Uh oh!

liancheng commented Dec 15, 2014

Uh oh!

SparkQA commented Dec 15, 2014

Uh oh!

liancheng commented Dec 15, 2014

Uh oh!

liancheng commented Dec 15, 2014

Uh oh!

SparkQA commented Dec 15, 2014

Uh oh!

AmplabJenkins commented Dec 15, 2014

Uh oh!

marmbrus commented Dec 17, 2014

Uh oh!

nchammas commented Jan 13, 2015

Uh oh!

mallman commented Mar 19, 2015

Uh oh!

srowen commented Mar 19, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

[SPARK-1442][SQL][WIP] Initial window function implementation (refactored from #2953) #3703

[SPARK-1442][SQL][WIP] Initial window function implementation (refactored from #2953) #3703

Uh oh!

Conversation

liancheng commented Dec 15, 2014

Uh oh!

SparkQA commented Dec 15, 2014

Uh oh!

liancheng commented Dec 15, 2014

Uh oh!

liancheng commented Dec 15, 2014

Uh oh!

SparkQA commented Dec 15, 2014

Uh oh!

AmplabJenkins commented Dec 15, 2014

Uh oh!

marmbrus commented Dec 17, 2014

Uh oh!

nchammas commented Jan 13, 2015

Uh oh!

mallman commented Mar 19, 2015

Uh oh!

srowen commented Mar 19, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants