@tgaloppo
Contributor

@tgaloppo tgaloppo commented Jan 7, 2015

Moving MultivariateGaussian from private[mllib] to public. The class uses Breeze vectors internally, so this involves creating a public interface using MLlib vectors and matrices.

This initial commit provides public construction, accessors for mean/covariance, density and log-density.

Other potential methods include entropy and sample generation.

MultivariateGaussian.scala - Made class public and exposed public methods leveraging MLlib vectors and matrices. Added logpdf method providing log-density calculation.

MultivariateGaussianSuite.scala - Tests are now performed through the public methods.
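
For reference, the density and log-density mentioned above follow the textbook definition of a k-dimensional Gaussian with mean μ and covariance Σ. The formula below is that standard definition, assuming Σ is nonsingular; the thread does not show how the class itself evaluates it or how it treats singular covariances.

```latex
\log p(x) = -\tfrac{1}{2}\left[\, k \log(2\pi) + \log\lvert\Sigma\rvert
            + (x-\mu)^{\top}\Sigma^{-1}(x-\mu) \,\right],
\qquad
p(x) = \exp\bigl(\log p(x)\bigr)
```
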
@SparkQA

SparkQA commented Jan 7, 2015

Test build #25135 has started for PR 3923 at commit 0943dc4.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25135 has finished for PR 3923 at commit 0943dc4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25135/
Test PASSed.

Member

I'd add a check here to make sure that the dimensions of mu, sigma match up.

@jkbradley
Member

@tgaloppo It's looking good. Mostly small comments, except for the first 2. Your call on naming: mu, sigma vs. mean, covariance.

Modified calculations to avoid log(pow(x,y)) calculations
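
As an illustration of why this kind of change matters, here is a minimal sketch, not the code from the commit. It compares two ways of computing the log of the Gaussian normalization constant, using the identity log(pow(x, y)) = y * log(x). The object name `LogNormSketch` and both method names are hypothetical, and the log-determinant is passed in directly so that the example needs no linear algebra library.

```scala
// Hypothetical sketch: two ways to compute log((2*pi)^(-k/2) * det(Sigma)^(-1/2)).
object LogNormSketch {
  // Naive form: exponentiate first, then take the log.
  // For large k the pow() call underflows to 0.0 and the log becomes -Infinity.
  def naive(k: Int, logDetSigma: Double): Double =
    math.log(math.pow(2.0 * math.Pi, -k / 2.0) * math.pow(math.exp(logDetSigma), -0.5))

  // Rewritten form: apply log(pow(x, y)) = y * log(x) and stay in log space,
  // so no intermediate value has to fit in the range of a Double.
  def stable(k: Int, logDetSigma: Double): Double =
    -0.5 * (k * math.log(2.0 * math.Pi) + logDetSigma)
}
```

For small dimensions the two forms agree to rounding error. With k = 1000 and a log-determinant of 0, the naive form underflows while the rewritten form stays finite.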
@SparkQA

SparkQA commented Jan 8, 2015

Test build #25187 has started for PR 3923 at commit 8c35381.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25190 has started for PR 3923 at commit 91a5fae.

  • This patch merges cleanly.

@tgaloppo
Contributor Author

tgaloppo commented Jan 8, 2015

@jkbradley Thanks! I have made the requested changes. Are there any other public methods that you think would be useful to add at this time?

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25187 has finished for PR 3923 at commit 8c35381.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25187/
Test PASSed.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25190 has finished for PR 3923 at commit 91a5fae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25190/
Test PASSed.

Member

This is fine, but you can also use require for checking input arguments (since that throws IllegalArgumentException and is a bit shorter).
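
A minimal sketch of what such a `require`-based check could look like, combining it with the earlier suggestion to verify that the dimensions of mu and sigma match. This is not the PR's code: `GaussianParams` is a hypothetical stand-in class, and plain arrays replace the MLlib vector and matrix types so that the example runs without Spark.

```scala
// Hypothetical stand-in for the class under review; plain arrays replace
// the MLlib Vector and Matrix types so the sketch has no dependencies.
final class GaussianParams(val mu: Array[Double], val sigma: Array[Array[Double]]) {
  // require throws IllegalArgumentException when its condition is false.
  require(sigma.forall(_.length == sigma.length), "Covariance matrix must be square")
  require(mu.length == sigma.length, "Mean vector length must match covariance matrix size")
}
```

Constructing `new GaussianParams(Array(0.0), Array(Array(1.0, 0.0), Array(0.0, 1.0)))` fails with an IllegalArgumentException, because a mean of length 1 does not match a 2x2 covariance.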

@jkbradley
Member

@tgaloppo Two small comments, but they are just superficial. After those, I think it will be ready. I don't see a need to add other methods for now. Thanks!

@tgaloppo
Contributor Author

tgaloppo commented Jan 8, 2015

@jkbradley Changes made. Thanks!

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25261 has started for PR 3923 at commit 9fa3bb7.

  • This patch merges cleanly.

@jkbradley
Member

@tgaloppo Thanks!

LGTM pending Jenkins tests

CC: @mengxr

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25261 has finished for PR 3923 at commit 9fa3bb7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25261/
Test PASSed.

Contributor

Remove space after { and before }.

Moved MultivariateGaussian (and test suite) from stat.impl to stat.distribution (required updates in GaussianMixture{EM,Model}.scala)
Marked MultivariateGaussian as @DeveloperApi
Fixed style error
@SparkQA

SparkQA commented Jan 9, 2015

Test build #25339 has started for PR 3923 at commit e30a100.

  • This patch does not merge cleanly.

@SparkQA

SparkQA commented Jan 9, 2015

Test build #25339 has finished for PR 3923 at commit e30a100.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25339/
Test PASSed.

Contributor

Sort imports alphabetically.

@mengxr
Contributor

mengxr commented Jan 10, 2015

@tgaloppo Besides inline comments, please resolve conflicts with the master branch. The patch does not merge cleanly.

Conflicts:
	mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureEM.scala
@SparkQA

SparkQA commented Jan 10, 2015

Test build #25360 has started for PR 3923 at commit b4121b4.

  • This patch merges cleanly.

@tgaloppo
Contributor Author

I have made the requested changes and resolved the merge conflicts.

Question: MultivariateGaussian now keeps a private Breeze version of the mean vector rather than converting the MLlib version to a Breeze vector on each call to {log}pdf(). Is this worthwhile? Or is the .toBreeze.toDenseVector sequence cheap enough that this is unnecessary?

CC: @mengxr @jkbradley

@SparkQA

SparkQA commented Jan 10, 2015

Test build #25360 has finished for PR 3923 at commit b4121b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25360/
Test PASSed.

Member

Organize imports (scala/java come first)

@jkbradley
Member

Thanks for the updates! LGTM except for organizing the imports in my 1 comment.

As far as I understand, the conversions to and from Breeze should be efficient since they mainly involve copying references, not the underlying data (as long as the representations remain the same: dense to dense, or sparse to sparse).

@SparkQA

SparkQA commented Jan 11, 2015

Test build #25367 has started for PR 3923 at commit 2b15587.

  • This patch merges cleanly.

@tgaloppo
Contributor Author

Thanks, @jkbradley
I have made the style correction.

@SparkQA

SparkQA commented Jan 11, 2015

Test build #25367 has finished for PR 3923 at commit 2b15587.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25367/
Test PASSed.

@mengxr
Contributor

mengxr commented Jan 12, 2015

Merged into master. Thanks!

5 participants