Skip to content

Conversation

@tedyu
Copy link
Contributor

@tedyu tedyu commented Jun 12, 2015

This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()'

Tested using spark-shell and verified that the following works:
sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show()

@JoshRosen
Copy link
Contributor

Can you file a JIRA for this?

@tedyu tedyu changed the title Fix NullPointerException with functions.rand() SPARK-8336 Fix NullPointerException with functions.rand() Jun 12, 2015
@JoshRosen
Copy link
Contributor

Jenkins, retest this please.

@squito
Copy link
Contributor

squito commented Jun 13, 2015

can you please add a test case to prevent regression

@tedyu
Copy link
Contributor Author

tedyu commented Jun 13, 2015

Mind telling me which suite the new test should be added to ?

Thanks

@tedyu
Copy link
Contributor Author

tedyu commented Jun 13, 2015

At first glance, none of the test suites under sql/catalyst/src/test//scala/org/apache/spark/sql seems proper for the new test.

@rxin
Copy link
Contributor

rxin commented Jun 13, 2015

We should create a RandomSuite.scala in expressions, and add tests for that. Take a look at other suites in that package.

@tedyu
Copy link
Contributor Author

tedyu commented Jun 13, 2015

I looked at UnsafeFixedWidthAggregationMapSuite.scala in expressions package.

Is RandomSuite.scala going to test Rand and Randn only ?

A bit more hint is appreciated.

@rxin
Copy link
Contributor

rxin commented Jun 13, 2015

Ok you managed to pick one suite that isn't a good example. Take a look at https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala

basically use checkEvaluation function.

@tedyu
Copy link
Contributor Author

tedyu commented Jun 13, 2015

I am trying to figure out how checkEvaluation should be used for the new test.

protected def checkEvaluation(
expression: Expression, expected: Any, inputRow: Row = EmptyRow): Unit = {

w.r.t. Rand(), the expected value is not deterministic.

@tedyu
Copy link
Contributor Author

tedyu commented Jun 13, 2015

Looking at ArithmeticExpressionSuite.scala, it has some checks in the following form:
checkDoubleEvaluation(c1 - c2, (-0.9 +- 0.001), row)

This seems to be better fit for checking the return value from Rand()

@JoshRosen
Copy link
Contributor

Don't we have some way of setting the RNG seed for testing?

@SparkQA
Copy link

SparkQA commented Jun 13, 2015

Test build #34839 has finished for PR 6793 at commit 750f92c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we create a new test case, instead of adding it to the existing one?

I've been meaning to take the existing one apart for a while.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we should have a case where we explicitly set taskcontext

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the tests under sql, I don't see how TaskContext is explicitly set.

Creating a new test is fine. The new test would contain a method containing one line.
Just want to make sure this is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in Beijing now.
Except for difficulty of accessing gmail, github is quite slow as well.

@SparkQA
Copy link

SparkQA commented Jun 15, 2015

Test build #34961 has finished for PR 6793 at commit 62fd97b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jun 16, 2015

Thanks. Merging in master & branch-1.4.

asfgit pushed a commit that referenced this pull request Jun 16, 2015
This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()'

Tested using spark-shell and verified that the following works:
sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show()

Author: tedyu <[email protected]>

Closes #6793 from tedyu/master and squashes the following commits:

62fd97b [tedyu] Create RandomSuite
750f92c [tedyu] Add test for Rand() with seed
a1d66c5 [tedyu] Fix NullPointerException with functions.rand()

(cherry picked from commit 1a62d61)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 1a62d61 Jun 16, 2015
@punya
Copy link
Contributor

punya commented Jun 16, 2015

@rxin it looks like the branch-1.4 cherry-pick of this commit broke a unit test, because it relies on ExpressionEvalHelper (which is absent on 1.4 afaik). I found this out because I was trying to backport an unrelated docs fix to branch-1.4. Any idea why the automated tests didn't catch this?

@punya
Copy link
Contributor

punya commented Jun 16, 2015

Here's the relevant bit of the log (from testing #6842):

[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RandomSuite.scala:27: not found: type ExpressionEvalHelper
[error] class RandomSuite extends SparkFunSuite with ExpressionEvalHelper {
[error]                                              ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RandomSuite.scala:30: not found: value create_row
[error]     val row = create_row(1.1, 2.0, 3.1, null)
[error]               ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RandomSuite.scala:31: not found: value checkDoubleEvaluation
[error]     checkDoubleEvaluation(Rand(30), (0.7363714192755834 +- 0.001), row)
[error]     ^

@rxin
Copy link
Contributor

rxin commented Jun 16, 2015

Jenkins only run against master. Do you mind submitting a fix against branch-1.4 for this? I will merge it.

asfgit pushed a commit that referenced this pull request Jun 17, 2015
rxin this is the fix you requested for the break introduced by backporting #6793

Author: Punya Biswal <[email protected]>

Closes #6850 from punya/feature/fix-backport-break and squashes the following commits:

fdc3693 [Punya Biswal] Fix break introduced by backport
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()'

Tested using spark-shell and verified that the following works:
sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show()

Author: tedyu <[email protected]>

Closes apache#6793 from tedyu/master and squashes the following commits:

62fd97b [tedyu] Create RandomSuite
750f92c [tedyu] Add test for Rand() with seed
a1d66c5 [tedyu] Fix NullPointerException with functions.rand()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants