Skip to content

Conversation

@scwf
Copy link

@scwf scwf commented May 2, 2015

@yhuai @guowei2 This PR fix two issues
1 HiveQl line number 796, it should be withDistinct, otherwise leads to test failure

2 Fix the case of sql in my added unit tests:

select area, sum(product), sum(sum(product)) over (partition by area) from windowData group by month, area

This sql want to aggregate product for (month, area) and (area)

select area, sum(product), sum(product) / sum(sum(product)) over (partition by area) from windowData group by month, area

this sql want to aggregate product for (month, area) and want to know the ratio over the product of all area.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of having regularExpressionWithAlias, a better way is to extract all expressions from an WindowExpression and push them to the underlying Projects and Aggregates. Then, a WindowExpression will only have AttributeReference inside.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct me if i misunderstanding,
1

a better way is to extract all expressions from an WindowExpression and push them to the underlying Projects and Aggregates

Do you mean push down all expressions of WindowExpression including self?
val allexpressions = windowExpressions.flatMap(_.collect {
case e: Expression => e
})

then allexpressions here including some expression we do not want to push down,such as
HiveWindowFunction
WindowSpecDefinition
CAST ...

2 i think we can not push down all expressions, see this example:
select sum(key) as s1, sum(sum(key)) over (...) from src group by value
we can not push down key to underlying aggregate because key is not in group by expressions, this will lead to analysis error. Here we just want to replace sum(sum(key)) with sum(s1)

3 IMO, we should only consider Alias in regularExpressions, user may also use them in windowexpression such as my examples in description above, so i substitute them.
And here this case actually is not a pushing down expression case, the underlying Aggregates already has the output attribute, and in windowexpression we want to use the attribute(otherwise analysis error), so do a substitution here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scwf I think it is not a basic solution。
It is still error when select sum(key) -1 , sum(sum(key)) over (...) from src group by value

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok in hive?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is a case i did not considered, i am trying to fix this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And @guowei2 can you merge this firstly?, let's see what jenkins say

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expression 'key' is neither present in the group by, nor is it an aggregate function
The key issue is that sum(key) -1 is a unit compute expression in aggregate. so it doesn't work just to substitute sum(key) to attribute

@yhuai
Copy link
Collaborator

yhuai commented May 4, 2015

I have merged it.

@yhuai yhuai closed this May 4, 2015
@scwf
Copy link
Author

scwf commented May 4, 2015

@yhuai , i have not fixed all the issues, this PR has test failure

@yhuai
Copy link
Collaborator

yhuai commented May 4, 2015

What test failures? Tests you added and we already have or tests you have not added? There is no failure in the current pr. If you have any case that has not been added in your mind, please comment on the pr. Thanks!

@yhuai
Copy link
Collaborator

yhuai commented May 4, 2015

btw, after I merged your pr, I made some changes on the analysis part.

@scwf
Copy link
Author

scwf commented May 4, 2015

YES, i have noticed that you have fixed it. it is in analyzer changes leads to window.q failure

guowei2 pushed a commit that referenced this pull request May 5, 2015
SQL
```
select key from (select key from src limit 100) t2 limit 10
```
Optimized Logical Plan before modifying
```
== Optimized Logical Plan ==
Limit 10
Limit 100
Project key#3
MetastoreRelation default, src, None
```
Optimized Logical Plan after modifying
```
== Optimized Logical Plan ==
Limit 10
 Project [key#1]
  MetastoreRelation default, src, None
```

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#5770 from DoingDone9/limitOptimizer and squashes the following commits:

c68eaa7 [Zhongshuai Pei] Update CombiningLimitsSuite.scala
97e18cf [Zhongshuai Pei] Update Optimizer.scala
19ab875 [Zhongshuai Pei] Update CombiningLimitsSuite.scala
7db4566 [Zhongshuai Pei] Update CombiningLimitsSuite.scala
e2a491d [Zhongshuai Pei] Update Optimizer.scala
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request #5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
guowei2 pushed a commit that referenced this pull request May 5, 2015
SQL
```
select key from (select key,value from t1 limit 100) t2 limit 10
```
Optimized Logical Plan before modifying
```
== Optimized Logical Plan ==
Limit 10
  Project key#228
    Limit 100
      MetastoreRelation default, t1, None
```
Optimized Logical Plan after modifying
```
== Optimized Logical Plan ==
Limit 10
  Limit 100
    Project key#228
      MetastoreRelation default, t1, None
```
After this, we can combine limits

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#5797 from DoingDone9/ProjectLimit and squashes the following commits:

70d0fca [Zhongshuai Pei] Update FilterPushdownSuite.scala
dc83ae9 [Zhongshuai Pei] Update FilterPushdownSuite.scala
485c61c [Zhongshuai Pei] Update Optimizer.scala
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request #5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
guowei2 pushed a commit that referenced this pull request May 12, 2015
…" into true or false directly

SQL
```
select key from src where 3 in (4, 5);
```
Before
```
== Optimized Logical Plan ==
Project [key#12]
 Filter 3 INSET (5,4)
  MetastoreRelation default, src, None
```

After
```
== Optimized Logical Plan ==
LocalRelation [key#228], []
```

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#5972 from DoingDone9/InToFalse and squashes the following commits:

4c722a2 [Zhongshuai Pei] Update predicates.scala
abe2bbb [Zhongshuai Pei] Update Optimizer.scala
fa461a5 [Zhongshuai Pei] Update Optimizer.scala
e34c28a [Zhongshuai Pei] Update predicates.scala
24739bd [Zhongshuai Pei] Update ConstantFoldingSuite.scala
f4dbf50 [Zhongshuai Pei] Update ConstantFoldingSuite.scala
35ceb7a [Zhongshuai Pei] Update Optimizer.scala
36c194e [Zhongshuai Pei] Update Optimizer.scala
2e8f6ca [Zhongshuai Pei] Update Optimizer.scala
14952e2 [Zhongshuai Pei] Merge pull request apache#13 from apache/master
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request #5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
guowei2 pushed a commit that referenced this pull request Jun 16, 2015
This PR move codegen implementation of expressions into Expression class itself, make it easy to manage.

It introduces two APIs in Expression:
```
def gen(ctx: CodeGenContext): GeneratedExpressionCode
def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code
```

gen(ctx) will call genSource(ctx, ev) to generate Java source code for the current expression. A expression needs to override genSource().

Here are the types:
```
type Term String
type Code String

/**
 * Java source for evaluating an [[Expression]] given a [[Row]] of input.
 */
case class GeneratedExpressionCode(var code: Code,
                               nullTerm: Term,
                               primitiveTerm: Term,
                               objectTerm: Term)
/**
 * A context for codegen, which is used to bookkeeping the expressions those are not supported
 * by codegen, then they are evaluated directly. The unsupported expression is appended at the
 * end of `references`, the position of it is kept in the code, used to access and evaluate it.
 */
class CodeGenContext {
  /**
   * Holding all the expressions those do not support codegen, will be evaluated directly.
   */
  val references: Seq[Expression] = new mutable.ArrayBuffer[Expression]()
}
```

This is basically apache#6660, but fixed style violation and compilation failure.

Author: Davies Liu <[email protected]>
Author: Reynold Xin <[email protected]>

Closes apache#6690 from rxin/codegen and squashes the following commits:

e1368c2 [Reynold Xin] Fixed tests.
73db80e [Reynold Xin] Fixed compilation failure.
19d6435 [Reynold Xin] Fixed style violation.
9adaeaf [Davies Liu] address comments
f42c732 [Davies Liu] improve coverage and tests
bad6828 [Davies Liu] address comments
e03edaa [Davies Liu] consts fold
86fac2c [Davies Liu] fix style
02262c9 [Davies Liu] address comments
b5d3617 [Davies Liu] Merge pull request #5 from rxin/codegen
48c454f [Reynold Xin] Some code gen update.
2344bc0 [Davies Liu] fix test
12ff88a [Davies Liu] fix build
c5fb514 [Davies Liu] rename
8c6d82d [Davies Liu] update docs
b145047 [Davies Liu] fix style
e57959d [Davies Liu] add type alias
3ff25f8 [Davies Liu] refactor
593d617 [Davies Liu] pushing codegen into Expression
guowei2 pushed a commit that referenced this pull request Jun 16, 2015
…into a single batch.

SQL
```
select * from tableA join tableB on (a > 3 and b = d) or (a > 3 and b = e)
```
Plan before modify
```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((a#293 > 3) && ((b#294 = d#296) || (b#294 = e#297))))
  MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```
Plan after modify
```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((b#294 = d#296) || (b#294 = e#297)))
  Filter (a#293 > 3)
   MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```

CombineLimits ==> Limit(If(LessThan(ne, le), ne, le), grandChild) and LessThan is in BooleanSimplification ,  so CombineLimits  must before BooleanSimplification and BooleanSimplification must before PushPredicateThroughJoin.

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#6351 from DoingDone9/master and squashes the following commits:

20de7be [Zhongshuai Pei] Update Optimizer.scala
7bc7d28 [Zhongshuai Pei] Merge pull request apache#17 from apache/master
0ba5f42 [Zhongshuai Pei] Update Optimizer.scala
f8b9314 [Zhongshuai Pei] Update FilterPushdownSuite.scala
c529d9f [Zhongshuai Pei] Update FilterPushdownSuite.scala
ae3af6d [Zhongshuai Pei] Update FilterPushdownSuite.scala
a04ffae [Zhongshuai Pei] Update Optimizer.scala
11beb61 [Zhongshuai Pei] Update FilterPushdownSuite.scala
f2ee5fe [Zhongshuai Pei] Update Optimizer.scala
be6b1d5 [Zhongshuai Pei] Update Optimizer.scala
b01e622 [Zhongshuai Pei] Merge pull request apache#15 from apache/master
8df716a [Zhongshuai Pei] Update FilterPushdownSuite.scala
d98bc35 [Zhongshuai Pei] Update FilterPushdownSuite.scala
fa65718 [Zhongshuai Pei] Update Optimizer.scala
ab8e9a6 [Zhongshuai Pei] Merge pull request apache#14 from apache/master
14952e2 [Zhongshuai Pei] Merge pull request apache#13 from apache/master
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request #5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
guowei2 pushed a commit that referenced this pull request Jul 6, 2015
…" into true or false directly

SQL
```
select key from src where 3 in (4, 5);
```
Before
```
== Optimized Logical Plan ==
Project [key#12]
 Filter 3 INSET (5,4)
  MetastoreRelation default, src, None
```

After
```
== Optimized Logical Plan ==
LocalRelation [key#228], []
```

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#5972 from DoingDone9/InToFalse and squashes the following commits:

4c722a2 [Zhongshuai Pei] Update predicates.scala
abe2bbb [Zhongshuai Pei] Update Optimizer.scala
fa461a5 [Zhongshuai Pei] Update Optimizer.scala
e34c28a [Zhongshuai Pei] Update predicates.scala
24739bd [Zhongshuai Pei] Update ConstantFoldingSuite.scala
f4dbf50 [Zhongshuai Pei] Update ConstantFoldingSuite.scala
35ceb7a [Zhongshuai Pei] Update Optimizer.scala
36c194e [Zhongshuai Pei] Update Optimizer.scala
2e8f6ca [Zhongshuai Pei] Update Optimizer.scala
14952e2 [Zhongshuai Pei] Merge pull request apache#13 from apache/master
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request #5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master

(cherry picked from commit 4b5e1fe)
Signed-off-by: Michael Armbrust <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants