-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-17849] [SQL] Fix NPE problem when using grouping sets #15416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @davies Would you help reviewing this? |
|
also cc @hvanhovell |
| } | ||
| } | ||
|
|
||
| test("SPARK-17849: grouping set throws NPE") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can move this into SQLQueryTestSuite, by creating a new grouping_set.q file??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rxin done
|
Test build #3309 has finished for PR 15416 at commit
|
| } | ||
|
|
||
| val nonNullBitmask = x.bitmasks.reduce(_ & _) | ||
| val nonNullBitmask = ~ x.bitmasks.reduce(_ | _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit manipulation magic is hard to follow. This is should be documented better. Could you add a line or two to explain how the bitmasks are structured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvanhovell @rxin comments are added
| // The left most bit in the bitmasks corresponds to the last expression in groupByAliases | ||
| // with 0 indicating this expression is in the grouping set. The following line of code | ||
| // calculates the bit mask representing the expressions that exist in all the grouping sets. | ||
| val nonNullBitmask = ~ x.bitmasks.reduce(_ | _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remove the '~' here, and use (nonNullBitmask & (1 << (attrLength - idx - 1))) == 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean ((nonNullBitmask >> (attrLength - idx - 1)) & 1) == 1? We can only test on 0 if we left shift 1, right? @davies
|
@yangw1234 Thanks for working on this, could you also double check that all the places that use bitmasks are correct? |
|
@davies Other places all seem to be correct. |
| // The rightmost bit in the bitmasks corresponds to the last expression in groupByAliases with 0 | ||
| // indicating this expression is in the grouping set. The following line of code calculates the | ||
| // bitmask representing the expressions that exist in all the grouping sets (also indicated by 0). | ||
| val nonNullBitmask = x.bitmasks.reduce(_ | _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call this nullBitmask now? (1 means it's nullable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done @davies
|
Test build #3337 has finished for PR 15416 at commit
|
|
@yangw1234 can you fix the scala styles? |
|
scala style fixed. |
|
retest this please |
|
LGTM, pending tests |
|
Test build #66970 has finished for PR 15416 at commit
|
|
retest this please |
|
I'll merge after a successfull test run |
|
Test build #68203 has finished for PR 15416 at commit
|
## What changes were proposed in this pull request?
Prior this pr, the following code would cause an NPE:
`case class point(a:String, b:String, c:String, d: Int)`
`val data = Seq(
point("1","2","3", 1),
point("4","5","6", 1),
point("7","8","9", 1)
)`
`sc.parallelize(data).toDF().registerTempTable("table")`
`spark.sql("select a, b, c, count(d) from table group by a, b, c GROUPING SETS ((a)) ").show()`
The reason is that when the grouping_id() behavior was changed in #10677, some code (which should be changed) was left out.
Take the above code for example, prior #10677, the bit mask for set "(a)" was `001`, while after #10677 the bit mask was changed to `011`. However, the `nonNullBitmask` was not changed accordingly.
This pr will fix this problem.
## How was this patch tested?
add integration tests
Author: wangyang <[email protected]>
Closes #15416 from yangw1234/groupingid.
(cherry picked from commit fb0d608)
Signed-off-by: Herman van Hovell <[email protected]>
## What changes were proposed in this pull request?
Prior this pr, the following code would cause an NPE:
`case class point(a:String, b:String, c:String, d: Int)`
`val data = Seq(
point("1","2","3", 1),
point("4","5","6", 1),
point("7","8","9", 1)
)`
`sc.parallelize(data).toDF().registerTempTable("table")`
`spark.sql("select a, b, c, count(d) from table group by a, b, c GROUPING SETS ((a)) ").show()`
The reason is that when the grouping_id() behavior was changed in #10677, some code (which should be changed) was left out.
Take the above code for example, prior #10677, the bit mask for set "(a)" was `001`, while after #10677 the bit mask was changed to `011`. However, the `nonNullBitmask` was not changed accordingly.
This pr will fix this problem.
## How was this patch tested?
add integration tests
Author: wangyang <[email protected]>
Closes #15416 from yangw1234/groupingid.
(cherry picked from commit fb0d608)
Signed-off-by: Herman van Hovell <[email protected]>
|
LGTM - Merging to master/2.1/2.0. Thanks! |
## What changes were proposed in this pull request?
Prior this pr, the following code would cause an NPE:
`case class point(a:String, b:String, c:String, d: Int)`
`val data = Seq(
point("1","2","3", 1),
point("4","5","6", 1),
point("7","8","9", 1)
)`
`sc.parallelize(data).toDF().registerTempTable("table")`
`spark.sql("select a, b, c, count(d) from table group by a, b, c GROUPING SETS ((a)) ").show()`
The reason is that when the grouping_id() behavior was changed in apache#10677, some code (which should be changed) was left out.
Take the above code for example, prior apache#10677, the bit mask for set "(a)" was `001`, while after apache#10677 the bit mask was changed to `011`. However, the `nonNullBitmask` was not changed accordingly.
This pr will fix this problem.
## How was this patch tested?
add integration tests
Author: wangyang <[email protected]>
Closes apache#15416 from yangw1234/groupingid.
What changes were proposed in this pull request?
Prior this pr, the following code would cause an NPE:
case class point(a:String, b:String, c:String, d: Int)val data = Seq( point("1","2","3", 1), point("4","5","6", 1), point("7","8","9", 1) )sc.parallelize(data).toDF().registerTempTable("table")spark.sql("select a, b, c, count(d) from table group by a, b, c GROUPING SETS ((a)) ").show()The reason is that when the grouping_id() behavior was changed in #10677, some code (which should be changed) was left out.
Take the above code for example, prior #10677, the bit mask for set "(a)" was
001, while after #10677 the bit mask was changed to011. However, thenonNullBitmaskwas not changed accordingly.This pr will fix this problem.
How was this patch tested?
add integration tests