@MechCoder
Contributor

The densities in KernelDensity are scaled down by (number of parallel processes × number of points). They should be scaled down by the number of samples only. This results in broken tests in KernelDensitySuite, which were not tested properly.
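
To illustrate the intended normalization, here is a minimal plain-Scala sketch. It does not use Spark and is not the MLlib implementation; `KdeNormalization`, `gaussianPdf`, and `estimate` are hypothetical names chosen for this example. The kernel sum at each evaluation point is divided by the number of samples only, independent of how many evaluation points or partitions there are:

```scala
object KdeNormalization {
  // Density at x of a normal kernel centred at `mean` with standard deviation `bandwidth`.
  def gaussianPdf(x: Double, mean: Double, bandwidth: Double): Double = {
    val z = (x - mean) / bandwidth
    math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.Pi))
  }

  // Kernel density estimate at each evaluation point:
  // sum the kernels over all samples, then divide by the sample count only.
  def estimate(samples: Seq[Double], bandwidth: Double, points: Seq[Double]): Seq[Double] = {
    val n = samples.length
    points.map(p => samples.map(s => gaussianPdf(p, s, bandwidth)).sum / n)
  }
}
```

With a single sample at 5.0 and bandwidth 3.0, the estimate at 5.0 under this normalization equals the density of a normal distribution with mean 5.0 and standard deviation 3.0, no matter how many evaluation points are requested.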

MechCoder changed the title from "[SPARK-7844] Fix broken tests in KernelDensity" to "[SPARK-7844] [MLlib] Fix broken tests in KernelDensity" on May 24, 2015
@MechCoder
Contributor Author

Note: In master

import org.apache.commons.math3.distribution.NormalDistribution
import org.apache.spark.mllib.stat.KernelDensity

val rdd = sc.parallelize(Array(5.0))
val evaluationPoints = Array(5.0, 6.0)
val densities = new KernelDensity().setSample(rdd).setBandwidth(3.0).estimate(evaluationPoints)
val normal = new NormalDistribution(5.0, 3.0)

densities(0) - normal.density(5.0)
res1: Double = -0.06649038006690546

val rdd = sc.parallelize(Array(5.0, 10.0))
val evaluationPoints = Array(5.0, 6.0)
val densities = new KernelDensity().setSample(rdd).setBandwidth(3.0).estimate(evaluationPoints)
val normal1 = new NormalDistribution(5.0, 3.0)
val normal2 = new NormalDistribution(10.0, 3.0)


densities(0) - (normal1.density(5.0) + normal2.density(5.0)) / 2
res2: Double = -0.04153495159951512

Hence the tests pass

cc @jkbradley @mengxr

@SparkQA

SparkQA commented May 24, 2015

Test build #33426 has finished for PR 6383 at commit a92fe50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Please aggregate densities and count in a single pass.
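
A single-pass aggregation of this kind can be sketched as follows. This is a plain-Scala illustration, not the code in this patch: partitions are simulated with nested collections so the example runs without Spark, and `SinglePassKde`, `seqOp`, `combOp`, and `zero` are hypothetical names that mirror the shape of the arguments to `RDD.aggregate`. The accumulator carries both the per-point density sums and the sample count, so the data is traversed only once:

```scala
object SinglePassKde {
  // Accumulator: (per-evaluation-point kernel sums, number of samples seen).
  type Acc = (Array[Double], Long)

  def zero(numPoints: Int): Acc = (new Array[Double](numPoints), 0L)

  def gaussianPdf(x: Double, mean: Double, bandwidth: Double): Double = {
    val z = (x - mean) / bandwidth
    math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.Pi))
  }

  // Fold one sample into the accumulator: update every density sum and bump the count.
  def seqOp(points: Array[Double], bandwidth: Double)(acc: Acc, sample: Double): Acc = {
    val (sums, count) = acc
    var i = 0
    while (i < points.length) {
      sums(i) += gaussianPdf(points(i), sample, bandwidth)
      i += 1
    }
    (sums, count + 1)
  }

  // Merge two partial accumulators (for example, from two partitions).
  def combOp(a: Acc, b: Acc): Acc = {
    val (sumsA, countA) = a
    val (sumsB, countB) = b
    var i = 0
    while (i < sumsA.length) {
      sumsA(i) += sumsB(i)
      i += 1
    }
    (sumsA, countA + countB)
  }

  // Simulated partitions: each inner Seq plays the role of one partition.
  def estimate(partitions: Seq[Seq[Double]], bandwidth: Double, points: Array[Double]): Array[Double] = {
    val partials = partitions.map(_.foldLeft(zero(points.length))(seqOp(points, bandwidth)))
    val (sums, count) = partials.foldLeft(zero(points.length))(combOp)
    // Normalize by the total number of samples only.
    sums.map(_ / count)
  }
}
```

Because the count is merged alongside the sums, the result does not depend on how the samples are split across partitions.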

@MechCoder
Contributor Author

@mengxr fixed!

@MechCoder
Contributor Author

Also, we probably want to generalize this to n dimensions (some code can be borrowed from GaussianMixture) and add a user guide. Is anyone actively working on these?

@SparkQA

SparkQA commented May 26, 2015

Test build #33501 has finished for PR 6383 at commit 9b8ed50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

minor: Math is deprecated. Use math instead.

@mengxr
Contributor

mengxr commented May 26, 2015

LGTM. Also ping @sryza for verification.

@SparkQA

SparkQA commented May 26, 2015

Test build #33530 has finished for PR 6383 at commit ab81302.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit closed this in 6166473 on May 26, 2015
asfgit pushed a commit that referenced this pull request May 26, 2015
The densities in KernelDensity are scaled down by
(number of parallel processes X number of points). It should be just no.of samples. This results in broken tests in KernelDensitySuite which haven't been tested properly.

Author: MechCoder <[email protected]>

Closes #6383 from MechCoder/spark-7844 and squashes the following commits:

ab81302 [MechCoder] Math->math
9b8ed50 [MechCoder] Make one pass to update count
a92fe50 [MechCoder] [SPARK-7844] Fix broken tests in KernelDensity

(cherry picked from commit 6166473)
Signed-off-by: Xiangrui Meng <[email protected]>
@mengxr
Contributor

mengxr commented May 26, 2015

Merged into master and branch-1.4. Please update #6387 with a doctest. Thanks!

MechCoder deleted the spark-7844 branch on May 27, 2015, 05:10
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015