SPARK-8336 Fix NullPointerException with functions.rand() #6793

tedyu · 2015-06-12T22:58:52Z

This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()'

Tested using spark-shell and verified that the following works:
sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show()

JoshRosen · 2015-06-12T23:50:56Z

Can you file a JIRA for this?

JoshRosen · 2015-06-13T02:02:47Z

Jenkins, retest this please.

squito · 2015-06-13T02:07:27Z

Can you please add a test case to prevent regression?

tedyu · 2015-06-13T02:17:48Z

Mind telling me which suite the new test should be added to?

Thanks

tedyu · 2015-06-13T02:30:06Z

At first glance, none of the test suites under sql/catalyst/src/test/scala/org/apache/spark/sql seems appropriate for the new test.

rxin · 2015-06-13T03:10:53Z

We should create a RandomSuite.scala in expressions, and add tests for that. Take a look at other suites in that package.

tedyu · 2015-06-13T04:16:03Z

I looked at UnsafeFixedWidthAggregationMapSuite.scala in expressions package.

Is RandomSuite.scala going to test Rand and Randn only?

A bit more of a hint would be appreciated.

rxin · 2015-06-13T06:09:31Z

Ok you managed to pick one suite that isn't a good example. Take a look at https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala

Basically, use the checkEvaluation function.

tedyu · 2015-06-13T16:06:52Z

I am trying to figure out how checkEvaluation should be used for the new test.

protected def checkEvaluation(
expression: Expression, expected: Any, inputRow: Row = EmptyRow): Unit = {

w.r.t. Rand(), the expected value is not deterministic.

tedyu · 2015-06-13T20:44:35Z

Looking at ArithmeticExpressionSuite.scala, it has some checks in the following form:
checkDoubleEvaluation(c1 - c2, (-0.9 +- 0.001), row)

This seems to be a better fit for checking the return value from Rand().
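The idea of comparing a seeded random value against an expectation within a tolerance can be sketched in plain Scala. This is only an illustration under stated assumptions: `SeededRandomCheck`, `approxEqual`, and `firstDouble` are hypothetical names, and `scala.util.Random` stands in for whatever generator Rand() actually uses, so the values it produces will not match Spark's.

```scala
import scala.util.Random

object SeededRandomCheck {
  // Hypothetical helper: true when `actual` lies within `tolerance` of `expected`.
  def approxEqual(actual: Double, expected: Double, tolerance: Double): Boolean =
    math.abs(actual - expected) <= tolerance

  // Stand-in for a seeded random expression; NOT Spark's Rand implementation.
  def firstDouble(seed: Long): Double = new Random(seed).nextDouble()

  def main(args: Array[String]): Unit = {
    val a = firstDouble(30L)
    val b = firstDouble(30L)
    // The same seed reproduces the same first value, so a fixed expectation is possible.
    assert(a == b)
    assert(approxEqual(a, b, 0.001))
    // nextDouble() returns values in [0.0, 1.0).
    assert(a >= 0.0 && a < 1.0)
    println(s"seed 30 -> $a")
  }
}
```

With a fixed seed the expected value can be recorded once and then checked with a small tolerance, which is the same shape as the `+-` checks quoted above.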

JoshRosen · 2015-06-13T20:45:22Z

Don't we have some way of setting the RNG seed for testing?

SparkQA · 2015-06-13T22:53:15Z

Test build #34839 has finished for PR 6793 at commit 750f92c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2015-06-13T22:54:32Z

...yst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala

can we create a new test case, instead of adding it to the existing one?

I've been meaning to take the existing one apart for a while.

Also, we should have a case where we explicitly set TaskContext.

tedyu · 2015-06-14

Looking at the tests under sql, I don't see how TaskContext is explicitly set.

Creating a new test is fine. The new test would contain a method containing one line.
Just want to make sure this is fine.
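For readers following the TaskContext discussion, here is a minimal sketch of a null-safe lookup of a per-task context. The thread does not state what the actual patch changed, so this is an assumption-laden illustration rather than the fix itself: `NullSafeContext`, `FakeTaskContext`, and `partitionIdOrZero` are hypothetical names, and the thread-local holder only mimics a context that is absent unless a task sets it.

```scala
object NullSafeContext {
  // Hypothetical stand-in for a per-task context; not Spark's TaskContext class.
  final case class FakeTaskContext(partitionId: Int)

  // Thread-local holder that stays empty (null) unless a task explicitly sets it.
  private val current = new ThreadLocal[FakeTaskContext]

  def set(ctx: FakeTaskContext): Unit = current.set(ctx)
  def clear(): Unit = current.remove()

  // Option(...) turns a null context into None, so the lookup falls back to 0
  // instead of throwing a NullPointerException.
  def partitionIdOrZero(): Int =
    Option(current.get()).map(_.partitionId).getOrElse(0)

  def main(args: Array[String]): Unit = {
    assert(partitionIdOrZero() == 0) // no context set
    set(FakeTaskContext(3))
    assert(partitionIdOrZero() == 3) // context set explicitly
    clear()
    assert(partitionIdOrZero() == 0) // back to the fallback
  }
}
```

A test that covers both branches would exercise the lookup once without a context and once after setting one explicitly, as in `main` above.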

I am in Beijing now.
Besides the difficulty of accessing Gmail, GitHub is quite slow as well.

SparkQA · 2015-06-15T23:44:04Z

Test build #34961 has finished for PR 6793 at commit 62fd97b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2015-06-16T00:00:00Z

Thanks. Merging in master & branch-1.4.

This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()'

Tested using spark-shell and verified that the following works:
sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show()

Author: tedyu <[email protected]>

Closes #6793 from tedyu/master and squashes the following commits:

62fd97b [tedyu] Create RandomSuite
750f92c [tedyu] Add test for Rand() with seed
a1d66c5 [tedyu] Fix NullPointerException with functions.rand()

(cherry picked from commit 1a62d61)
Signed-off-by: Reynold Xin <[email protected]>

punya · 2015-06-16T15:40:18Z

@rxin it looks like the branch-1.4 cherry-pick of this commit broke a unit test, because it relies on ExpressionEvalHelper (which is absent on 1.4 afaik). I found this out because I was trying to backport an unrelated docs fix to branch-1.4. Any idea why the automated tests didn't catch this?

punya · 2015-06-16T15:56:03Z

Here's the relevant bit of the log (from testing #6842):

[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RandomSuite.scala:27: not found: type ExpressionEvalHelper
[error] class RandomSuite extends SparkFunSuite with ExpressionEvalHelper {
[error]                                              ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RandomSuite.scala:30: not found: value create_row
[error]     val row = create_row(1.1, 2.0, 3.1, null)
[error]               ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RandomSuite.scala:31: not found: value checkDoubleEvaluation
[error]     checkDoubleEvaluation(Rand(30), (0.7363714192755834 +- 0.001), row)
[error]     ^

rxin · 2015-06-16T16:47:59Z

Jenkins only runs against master. Do you mind submitting a fix against branch-1.4 for this? I will merge it.

rxin this is the fix you requested for the break introduced by backporting #6793

Author: Punya Biswal <[email protected]>

Closes #6850 from punya/feature/fix-backport-break and squashes the following commits:

fdc3693 [Punya Biswal] Fix break introduced by backport


Fix NullPointerException with functions.rand()

a1d66c5

tedyu changed the title ~~Fix NullPointerException with functions.rand()~~ SPARK-8336 Fix NullPointerException with functions.rand() Jun 12, 2015

Add test for Rand() with seed

750f92c

rxin reviewed Jun 13, 2015

Create RandomSuite

62fd97b

asfgit closed this in 1a62d61 Jun 16, 2015

punya mentioned this pull request Jun 17, 2015

Fix break introduced by backport #6850

Closed

6 participants