Skip to content

Conversation

@hhbyyh
Copy link
Contributor

@hhbyyh hhbyyh commented May 5, 2015

jira: https://issues.apache.org/jira/browse/SPARK-7368
Add QR decomposition for RowMatrix.

I'm not sure what's the blueprint about the distributed Matrix from community and whether this will be a desirable feature , so I sent a prototype for discussion. I'll go on polish the code and provide ut and performance statistics if it's acceptable.

The implementation refers to the [paper: https://www.cs.purdue.edu/homes/dgleich/publications/Benson%202013%20-%20direct-tsqr.pdf]
Austin R. Benson, David F. Gleich, James Demmel. "Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures", 2013 IEEE International Conference on Big Data, which is a stable algorithm with good scalability.

Currently I tried it on a 400000 * 500 rowMatrix (16 partitions) and it can bring down the computation time from 8.8 mins (using breeze.linalg.qr.reduced) to 2.6 mins on a 4 worker cluster. I think there will still be some room for performance improvement.

Any trial and suggestion is welcome.

@SparkQA
Copy link

SparkQA commented May 5, 2015

Test build #31874 has finished for PR 5909 at commit 39b0b22.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rowMatrix -> [[RowMatrix]]

@mengxr
Copy link
Contributor

mengxr commented Jul 28, 2015

@hhbyyh I made one pass:

  1. We need unit tests.
  2. There are couple ways to compute Q. For the first PR, I think we should use the simplest approach: Q = A * inv(R). It is not numerically stable if A is ill-conditioned. But it is simple and works for most cases. Preconditioning could come in a follow-up PR. So the final block of the PR could be simplified.

I hope we can get this one in 1.5. Do you have time to address my comments (before Thursday)? Or I can send you an update directly. Thanks!

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Jul 29, 2015

@mengxr Thanks a lot for the review. I'll start working according to the comments. Meanwhile, if there's anything you want to compose or update, please feel free to send it.

@mengxr
Copy link
Contributor

mengxr commented Jul 29, 2015

@hhbyyh Thanks! I will wait for your changes first:)

@SparkQA
Copy link

SparkQA commented Jul 29, 2015

Test build #38834 has finished for PR 5909 at commit 3fbdb61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class QRDecomposition[UType, VType](Q: UType, R: VType)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space before {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove :

@mengxr
Copy link
Contributor

mengxr commented Jul 30, 2015

Last batch of comments:)

@SparkQA
Copy link

SparkQA commented Jul 30, 2015

Test build #38936 has finished for PR 5909 at commit 0fb1012.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class QRDecomposition[UType, VType](Q: UType, R: VType)

@SparkQA
Copy link

SparkQA commented Jul 30, 2015

Test build #38951 has finished for PR 5909 at commit cec797b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class QRDecomposition[UType, VType](Q: UType, R: VType)

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Jul 30, 2015

cc @mengxr new update. Thanks for the careful review.

@mengxr
Copy link
Contributor

mengxr commented Jul 30, 2015

LGTM. Merged into master. Thanks! Sorry for the long delay on code review!

@asfgit asfgit closed this in d31c618 Jul 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants