-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-11530] [MLLIB] Return eigenvalues with PCA model #9736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #2065 has finished for PR 9736 at commit
|
|
Test build #45998 has started for PR 9736 at commit |
|
i'm gonna kill this job -- it's hanging on amp-jenkins-worker-07 and eating up machine resources. i'll kick it back off after i clean up. |
|
jenkins, test this please |
|
Test build #46014 has finished for PR 9736 at commit
|
|
Test build #2072 has finished for PR 9736 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not very clear from the doc whether we return the absolute variance explained or the proportions. How about a vector of proportions of variance explained by each principal component and change the method name to computePrincipalComponentsAndExplainedVariance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, sounds good.
|
Test build #46211 has finished for PR 9736 at commit
|
|
Test build #46214 has finished for PR 9736 at commit
|
|
Test build #46419 has finished for PR 9736 at commit
|
|
Test build #46420 has finished for PR 9736 at commit
|
|
Test build #46596 has finished for PR 9736 at commit
|
|
Test build #2102 has finished for PR 9736 at commit
|
|
@mengxr what do you think about this one? it's adding some weight to the API; that's my main concern. Having the explained variance is nice though. |
|
@mengxr I'm going to push this for 1.7, not 1.6 -- thoughts? |
|
Test build #2175 has finished for PR 9736 at commit
|
|
Merged to master |
Add `computePrincipalComponentsAndVariance` to also compute PCA's explained variance. CC mengxr Author: Sean Owen <[email protected]> Closes #9736 from srowen/SPARK-11530.
|
@srowen I didn't see this PR before, but there will need to be a follow-up in order to make the model save/load backwards compatible. We can check the saved metadata for the Spark version, and then expect the explainedVariance val to be there only if the version is > 1.6. I just created [https://issues.apache.org/jira/browse/SPARK-12349] for this. |
|
@jkbradley ah, even if it's an experimental API? OK, seems simple enough. It's no problem to ignore the extra new column in old versions. If the new column isn't present, it's a little funny to figure out what the explained variance should be. Probably empty or null is the best that can be done -- or is it best to fail anyway? |
|
@srowen I agree we didn't really promise to have backwards compatibility yet, but it'd be great if we could maintain it. Good question about API. Rather than being fancy about throwing errors, I'd say just set it to be an empty Vector and note in the .load Scala docs that it will be empty for models saved using Spark 1.6. |
|
@jkbradley great, have a look at #10327 |
Add
computePrincipalComponentsAndVarianceto also compute PCA's explained variance.CC @mengxr