Conversation

@srowen (Member) commented Nov 16, 2015

Add `computePrincipalComponentsAndVariance` to also compute PCA's explained variance.

CC @mengxr
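
For context, here is a minimal standalone sketch of the quantity being added (not the patch itself, which works on a distributed `RowMatrix`): the principal components are the eigenvectors of the data's covariance matrix, and the eigenvalue attached to each component is the variance explained along that direction. The covariance matrix below is made up for illustration.

```scala
import breeze.linalg.{eigSym, DenseMatrix => BDM}
import breeze.linalg.eigSym.EigSym

object ExplainedVarianceSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical 3x3 covariance matrix of some centered data set.
    val cov = BDM(
      (4.0, 2.0, 0.0),
      (2.0, 3.0, 0.0),
      (0.0, 0.0, 1.0))

    // Eigenvectors of the covariance matrix are the principal components;
    // each eigenvalue is the variance of the data along that component.
    val EigSym(variances, components) = eigSym(cov)

    println(s"principal components (columns):\n$components")
    println(s"variance explained by each component: $variances")
  }
}
```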

@SparkQA commented Nov 16, 2015

Test build #2065 has finished for PR 9736 at commit e0b26c4.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 16, 2015

Test build #45998 has started for PR 9736 at commit 253235a.

@shaneknapp (Contributor) commented:

i'm gonna kill this job -- it's hanging on amp-jenkins-worker-07 and eating up machine resources. i'll kick it back off after i clean up.

@shaneknapp (Contributor) commented:

jenkins, test this please

@SparkQA commented Nov 16, 2015

Test build #46014 has finished for PR 9736 at commit 253235a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 17, 2015

Test build #2072 has finished for PR 9736 at commit 253235a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Review comment (Contributor):

It is not very clear from the doc whether we return the absolute variance explained or the proportions. How about returning a vector of proportions of variance explained by each principal component, and changing the method name to computePrincipalComponentsAndExplainedVariance?

@srowen (Member, Author) replied:

Definitely, sounds good.
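
For reference, usage after the rename would look roughly like the following. This is a sketch against the mllib `RowMatrix` API, assuming the new method returns the principal components matrix paired with a `Vector` of explained-variance proportions; the input rows and `k = 2` are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object PcaExplainedVarianceUsage {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pca-sketch").setMaster("local[2]"))

    // A tiny made-up data set; each row is one observation.
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(2.0, 4.0, 5.0),
      Vectors.dense(4.0, 1.0, 7.0),
      Vectors.dense(3.0, 3.0, 2.0)))

    val mat = new RowMatrix(rows)

    // Assuming the method returns (principal components, explained-variance proportions):
    val (pc, explainedVariance) = mat.computePrincipalComponentsAndExplainedVariance(2)

    // explainedVariance(i) is the proportion of the total variance captured by the
    // i-th principal component; over all components the proportions sum to 1.
    println(pc)
    println(explainedVariance)

    sc.stop()
  }
}
```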

@SparkQA commented Nov 18, 2015

Test build #46211 has finished for PR 9736 at commit e290c11.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 18, 2015

Test build #46214 has finished for PR 9736 at commit c012276.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 20, 2015

Test build #46419 has finished for PR 9736 at commit 0291a30.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 20, 2015

Test build #46420 has finished for PR 9736 at commit 011e674.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 24, 2015

Test build #46596 has finished for PR 9736 at commit a75bdd0.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 24, 2015

Test build #2102 has finished for PR 9736 at commit a75bdd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member, Author) commented Nov 25, 2015

@mengxr what do you think about this one? it's adding some weight to the API; that's my main concern. Having the explained variance is nice though.

@srowen (Member, Author) commented Dec 1, 2015

@mengxr I'm going to push this for 1.7, not 1.6 -- thoughts?

@SparkQA commented Dec 6, 2015

Test build #2175 has finished for PR 9736 at commit a75bdd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member, Author) commented Dec 10, 2015

Merged to master

@srowen srowen closed this Dec 10, 2015
@srowen srowen deleted the SPARK-11530 branch December 10, 2015 14:06
asfgit pushed a commit that referenced this pull request Dec 10, 2015
Add `computePrincipalComponentsAndVariance` to also compute PCA's explained variance.

CC mengxr

Author: Sean Owen <[email protected]>

Closes #9736 from srowen/SPARK-11530.
@jkbradley (Member) commented:

@srowen I didn't see this PR before, but there will need to be a follow-up in order to make the model save/load backwards compatible. We can check the saved metadata for the Spark version, and then expect the explainedVariance val to be there only if the version is > 1.6.

I just created https://issues.apache.org/jira/browse/SPARK-12349 for this.

@srowen (Member, Author) commented Dec 16, 2015

@jkbradley ah, even if it's an experimental API? OK, seems simple enough. It's no problem to ignore the extra new column in old versions. If the new column isn't present, it's a little funny to figure out what the explained variance should be. Probably empty or null is the best that can be done -- or is it best to fail anyway?

@jkbradley (Member) commented:

@srowen I agree we didn't really promise to have backwards compatibility yet, but it'd be great if we could maintain it.

Good question about API. Rather than being fancy about throwing errors, I'd say just set it to be an empty Vector and note in the .load Scala docs that it will be empty for models saved using Spark 1.6.
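
A minimal sketch of that approach, using hypothetical type and field names (the real logic belongs in the model's save/load code, which reads the persisted metadata): parse the Spark version recorded in the metadata, and fall back to an empty Vector when the model predates the new column.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object PcaLoadCompatSketch {
  // Hypothetical stand-ins for the persisted metadata and data row.
  case class SavedMetadata(sparkVersion: String)           // e.g. "1.6.0"
  case class SavedData(explainedVariance: Option[Vector])  // absent in old saves

  def loadExplainedVariance(meta: SavedMetadata, data: SavedData): Vector = {
    // Only versions newer than 1.6 persist the explainedVariance column.
    val Array(major, minor) = meta.sparkVersion.split("\\.").take(2).map(_.toInt)
    val hasColumn = major > 1 || (major == 1 && minor > 6)
    if (hasColumn) {
      data.explainedVariance.getOrElse(Vectors.dense(Array.empty[Double]))
    } else {
      // Saved by Spark 1.6: no explainedVariance column, so return an empty
      // Vector and note the behavior in the .load Scaladoc, as suggested above.
      Vectors.dense(Array.empty[Double])
    }
  }
}
```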

@srowen (Member, Author) commented Dec 17, 2015

@jkbradley great, have a look at #10327
