
Conversation

@yangw1234

What changes were proposed in this pull request?

Prior to this PR, the following code would cause an NPE:
case class point(a:String, b:String, c:String, d: Int)

val data = Seq( point("1","2","3", 1), point("4","5","6", 1), point("7","8","9", 1) )
sc.parallelize(data).toDF().registerTempTable("table")
spark.sql("select a, b, c, count(d) from table group by a, b, c GROUPING SETS ((a)) ").show()

The reason is that when the grouping_id() behavior was changed in #10677, some code that should have been updated was left out.

Take the above code as an example: prior to #10677, the bitmask for the set "(a)" was `001`, while after #10677 it became `011`. However, the computation of `nonNullBitmask` was not updated accordingly.

This PR fixes the problem.
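To make the bitmask change concrete, here is a minimal, self-contained sketch (illustration only, not the actual Analyzer code; `attrNames` and the printed loop are stand-ins) of how the post-#10677 mask for the query above is read:

```scala
// Illustration only -- not the Analyzer code. After #10677 the leftmost bit corresponds
// to the first GROUP BY expression (a) and a 0 bit means "this expression is part of
// the grouping set", so GROUPING SETS ((a)) over a, b, c yields the single mask 011.
val bitmasks   = Seq(Integer.parseInt("011", 2))   // one grouping set: (a)
val attrNames  = Seq("a", "b", "c")
val attrLength = attrNames.length

// A 1 bit in the OR of all masks means the column is absent from at least one grouping
// set, so the expanded rows may contain null for it.
val nullBitmask = bitmasks.reduce(_ | _)           // 011

attrNames.zipWithIndex.foreach { case (name, idx) =>
  val nullable = ((nullBitmask >> (attrLength - idx - 1)) & 1) == 1
  println(s"$name nullable = $nullable")           // a -> false, b -> true, c -> true
}
```

Before the fix, the mask was still interpreted with the pre-#10677 convention, so a grouped column that actually receives null in the expanded rows could stay marked non-nullable, which is what triggered the NPE.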

How was this patch tested?

Added integration tests.

@yangw1234
Author

cc @davies Would you help review this?

@yangw1234
Author

also cc @hvanhovell

@yangw1234 yangw1234 changed the title [SPARK-17849] Fix NPE problem when using grouping sets [SPARK-17849] [SQL] Fix NPE problem when using grouping sets Oct 10, 2016

test("SPARK-17849: grouping set throws NPE") {
Contributor

maybe we can move this into SQLQueryTestSuite, by creating a new grouping_set.q file??

Author

@rxin done

@SparkQA

SparkQA commented Oct 10, 2016

Test build #3309 has finished for PR 15416 at commit 42f7a63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Before: `val nonNullBitmask = x.bitmasks.reduce(_ & _)`
After: `val nonNullBitmask = ~ x.bitmasks.reduce(_ | _)`
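For context on why the reduction operator changes here, a small sketch (the two-set example and its masks are hypothetical, assuming the post-#10677 convention that a 0 bit means the expression is in the grouping set):

```scala
// Hypothetical masks for GROUP BY a, b, c GROUPING SETS ((a), (a, b)):
// (a)    -> 011  (b and c absent)
// (a, b) -> 001  (only c absent)
val bitmasks = Seq(Integer.parseInt("011", 2), Integer.parseInt("001", 2))

// Old reduction: under the new convention an AND keeps a 1 only where *every* mask has
// a 1, which no longer identifies the always-present (non-null) columns.
val oldMask = bitmasks.reduce(_ & _)        // 001

// New reduction: ~OR keeps a 1 only where every mask has a 0, i.e. exactly the columns
// that appear in every grouping set and therefore never become null.
val newMask = ~bitmasks.reduce(_ | _)       // low three bits: 100 -> only `a` is non-null

println((oldMask & 7).toBinaryString)       // 1
println((newMask & 7).toBinaryString)       // 100
```

Reading the AND result as "non-null columns" was only valid under the old bit convention; once 0 means "present", the complement of the OR is what identifies the always-present columns.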
Contributor

@hvanhovell hvanhovell Oct 10, 2016

Bit manipulation magic is hard to follow. This should be documented better. Could you add a line or two to explain how the bitmasks are structured?

Contributor

+1

Author

Ok, I'll do it.

Author

@hvanhovell @rxin comments are added

// The left most bit in the bitmasks corresponds to the last expression in groupByAliases
// with 0 indicating this expression is in the grouping set. The following line of code
// calculates the bit mask representing the expressions that exist in all the grouping sets.
val nonNullBitmask = ~ x.bitmasks.reduce(_ | _)
Contributor

Could you remove the '~' here, and use (nonNullBitmask & (1 << (attrLength - idx - 1))) == 1?

Author

Do you mean ((nonNullBitmask >> (attrLength - idx - 1)) & 1) == 1? We can only compare against 0 if we left-shift the 1, right? @davies
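A tiny standalone check of the point above (example values only):

```scala
// (mask & (1 << k)) evaluates to either 0 or (1 << k), so for k > 0 it can never
// equal 1; that form can only be compared against 0. Shifting the mask right and
// masking with 1 works for any bit position.
val mask = Integer.parseInt("011", 2)   // example bitmask from this PR
val attrLength = 3
val idx = 1                             // the bit for column b
val k = attrLength - idx - 1            // k = 1

println((mask & (1 << k)) == 1)         // false, even though the bit is set
println((mask & (1 << k)) != 0)         // true: compare against 0 instead
println(((mask >> k) & 1) == 1)         // true: shift first, then mask with 1
```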

@davies
Contributor

davies commented Oct 11, 2016

@yangw1234 Thanks for working on this, could you also double check that all the places that use bitmasks are correct?

@yangw1234
Author

@davies Other places all seem to be correct.

// The rightmost bit in the bitmasks corresponds to the last expression in groupByAliases with 0
// indicating this expression is in the grouping set. The following line of code calculates the
// bitmask representing the expressions that exist in all the grouping sets (also indicated by 0).
val nonNullBitmask = x.bitmasks.reduce(_ | _)
Contributor

Should we call this nullBitmask now? (1 means it's nullable)

Author

done @davies
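To round off the renaming discussion, a rough sketch (stand-in values only, not the merged code) of how the renamed form reads, where a set bit directly means the attribute is nullable:

```scala
// Stand-ins for the analyzer's values: GROUP BY a, b, c GROUPING SETS ((a), (a, b)).
val bitmasks       = Seq(Integer.parseInt("011", 2), Integer.parseInt("001", 2))
val groupByAliases = Seq("a", "b", "c")
val attrLength     = groupByAliases.length

// After the rename the OR reduction is read directly: a 1 bit means the expression is
// missing from at least one grouping set, so the corresponding attribute must be nullable.
val nullBitmask = bitmasks.reduce(_ | _)   // 011

val nullability = groupByAliases.zipWithIndex.map { case (name, idx) =>
  name -> (((nullBitmask >> (attrLength - idx - 1)) & 1) == 1)
}
println(nullability)                       // List((a,false), (b,true), (c,true))
```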

@SparkQA

SparkQA commented Oct 13, 2016

Test build #3337 has finished for PR 15416 at commit 69f6e4f.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Oct 13, 2016

@yangw1234 can you fix the scala styles?

@yangw1234
Author

Scala style fixed. I didn't notice the failure earlier; sorry for the delay. @rxin

@hvanhovell
Contributor

retest this please

@davies
Contributor

davies commented Oct 14, 2016

LGTM, pending tests

@SparkQA

SparkQA commented Oct 14, 2016

Test build #66970 has finished for PR 15416 at commit 0ad7aba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yangw1234
Author

@rxin @davies Will this patch be merged into 2.0.2? We kind of need it to upgrade our production environment. Thanks.

@hvanhovell
Contributor

retest this please

@hvanhovell
Contributor

I'll merge after a successful test run.

@SparkQA

SparkQA commented Nov 5, 2016

Test build #68203 has finished for PR 15416 at commit 0ad7aba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Nov 5, 2016
Author: wangyang <[email protected]>

Closes #15416 from yangw1234/groupingid.

(cherry picked from commit fb0d608)
Signed-off-by: Herman van Hovell <[email protected]>
asfgit pushed a commit that referenced this pull request Nov 5, 2016
@hvanhovell
Contributor

LGTM - Merging to master/2.1/2.0. Thanks!

@asfgit asfgit closed this in fb0d608 Nov 5, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017