[SPARK-7368][MLlib] Add QR decomposition for RowMatrix #5909
Test build #31874 has finished for PR 5909 at commit
rowMatrix -> [[RowMatrix]]
@hhbyyh I made one pass. I hope we can get this one into 1.5. Do you have time to address my comments before Thursday? Otherwise I can send you an update directly. Thanks!
@mengxr Thanks a lot for the review. I'll start working on your comments. Meanwhile, if there's anything you'd like to change or add yourself, please feel free to send it.
@hhbyyh Thanks! I will wait for your changes first :)
Test build #38834 has finished for PR 5909 at commit
space before {
remove :
Last batch of comments :)
Test build #38936 has finished for PR 5909 at commit
Test build #38951 has finished for PR 5909 at commit
cc @mengxr I've pushed a new update. Thanks for the careful review.
LGTM. Merged into master. Thanks! Sorry for the long delay on code review!
jira: https://issues.apache.org/jira/browse/SPARK-7368
Add QR decomposition for RowMatrix.
I'm not sure what the community's roadmap for distributed matrices is, or whether this would be a desirable feature, so I'm sending a prototype for discussion. If it's acceptable, I'll continue polishing the code and provide unit tests and performance statistics.
The implementation follows the paper (https://www.cs.purdue.edu/homes/dgleich/publications/Benson%202013%20-%20direct-tsqr.pdf):
Austin R. Benson, David F. Gleich, James Demmel. "Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures", 2013 IEEE International Conference on Big Data. The algorithm it describes is numerically stable and scales well.
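The core idea of tall-and-skinny QR (TSQR) can be sketched in a few lines. This is a minimal, hypothetical plain-Scala illustration, not the code in this PR: it uses no Spark or Breeze, and the names `TsqrSketch`, `localR`, and `tsqrR` are made up for this example. Each row block (standing in for a partition) is factored locally as A_i = Q_i R_i; the small n × n factors R_i are stacked and factored once more, giving [R_1; R_2; ...] = Q2 R. Then A = diag(Q_1, Q_2, ...) Q2 R, and the product of the first two factors has orthonormal columns, so R is the R factor of the whole matrix. The sketch assumes full column rank and at least n rows per block. It uses modified Gram-Schmidt for the local QR only for brevity; a production implementation would use a Householder-based QR (such as LAPACK's) for stability. It also computes only R, whereas the paper's Direct TSQR additionally reconstructs Q.

```scala
// Hypothetical TSQR sketch in plain Scala (no Spark, no Breeze).
// Assumptions: the matrix has full column rank, every row block has
// at least n rows, and matrices are row-major Array[Array[Double]].
object TsqrSketch {
  type Mat = Array[Array[Double]]

  // R factor of one block via modified Gram-Schmidt; diag(R) is positive.
  def localR(a: Mat): Mat = {
    val m = a.length
    val n = a(0).length
    // q(j) holds column j of the block and is orthogonalized in place.
    val q = Array.tabulate(n, m)((j, i) => a(i)(j))
    val r = Array.ofDim[Double](n, n)
    for (j <- 0 until n) {
      // Normalize column j (already orthogonal to columns 0..j-1).
      r(j)(j) = math.sqrt(q(j).map(x => x * x).sum)
      for (i <- 0 until m) q(j)(i) /= r(j)(j)
      // Remove the q(j) component from every later column.
      for (k <- j + 1 until n) {
        r(j)(k) = (0 until m).map(i => q(j)(i) * q(k)(i)).sum
        for (i <- 0 until m) q(k)(i) -= r(j)(k) * q(j)(i)
      }
    }
    r
  }

  // Stage 1: local R per block (the "map" over partitions).
  // Stage 2: stack the small R factors and factor once more (the "reduce").
  def tsqrR(blocks: Seq[Mat]): Mat = {
    val stacked: Mat = blocks.map(localR).reduce(_ ++ _)
    localR(stacked)
  }
}
```

Because R is unique once its diagonal is fixed to be positive (for a full-rank A it is the Cholesky factor of AᵀA), `TsqrSketch.tsqrR(a.grouped(4).toList)` should agree with `TsqrSketch.localR(a)` up to rounding, while each block only ever touches its own rows plus the small stacked R matrix.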
So far I have tried it on a 400,000 × 500 RowMatrix (16 partitions) on a 4-worker cluster: it brings the computation time down from 8.8 minutes (using breeze.linalg.qr.reduced) to 2.6 minutes. I think there is still some room for performance improvement.
Testing and suggestions are welcome.