[SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of RowMatrix #14051
Test build #61737 has finished for PR 14051 at commit
Test build #61739 has finished for PR 14051 at commit
```diff
 val col = numCols().toInt
 // split rows horizontally into smaller matrices, and compute QR for each of them
-val blockQRs = rows.glom().map { partRows =>
+val blockQRs = rows.retag(classOf[Vector]).glom().map { partRows =>
```
Where does the exception actually occur? I'm a little surprised if this is the only place it's needed.
Since it's a known Java type-erasure issue (https://issues.apache.org/jira/browse/SPARK-2737), I am not sure whether to fix it or not. If we leave it as is, Java users need to be aware of it and retag the JavaRDD themselves. Otherwise, we fix the RowMatrix constructors, either by retagging the rows or by adding a new JavaRDD constructor. However, this may not be an isolated case.
I agree with fixing it; I just wonder exactly where the exception arises (not the nature of the problem; I get that), to verify this is the right place to retag. It seemed a little surprising, but I assume you're right.
From my log, I can see that it arises in `glom()`. Like `collect()`, `glom()` applies a similar operation, `(iter: Iterator[T]) => iter.toArray`, to each partition, so I think this is the best place to call `retag`.
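To make the placement argument concrete, here is a minimal Spark-free Java sketch under stated assumptions: `TaggedRows` is a hypothetical stand-in for an RDD, its `Class<?>` field stands in for the RDD's `ClassTag`, `fromJava` models `JavaSparkContext.parallelize` (which only has an erased tag to offer), and `retag`/`glom` merely mimic the Spark methods of the same names. The runtime array type is fixed at the moment `glom()` builds each block, so the tag has to be corrected before that call.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Spark-free model: a partitioned collection that carries a
 * runtime element class the way an RDD carries a ClassTag.
 */
class TaggedRows<T> {
    private final List<List<T>> partitions;
    private final Class<?> elementClass; // stands in for the RDD's ClassTag

    private TaggedRows(List<List<T>> partitions, Class<?> elementClass) {
        this.partitions = partitions;
        this.elementClass = elementClass;
    }

    /** Models parallelize() called from Java: only an erased (Object) tag is known. */
    static <T> TaggedRows<T> fromJava(List<List<T>> partitions) {
        return new TaggedRows<>(partitions, Object.class);
    }

    /** Models retag: same elements, corrected runtime class; nothing is materialized yet. */
    TaggedRows<T> retag(Class<T> cls) {
        return new TaggedRows<>(partitions, cls);
    }

    /** Models glom(): one array per partition, whose runtime type comes from the tag. */
    @SuppressWarnings("unchecked")
    List<T[]> glom() {
        List<T[]> blocks = new ArrayList<>();
        for (List<T> part : partitions) {
            // With an Object tag this is an Object[], whatever T is at compile time.
            T[] block = (T[]) Array.newInstance(elementClass, part.size());
            for (int i = 0; i < block.length; i++) {
                block[i] = part.get(i);
            }
            blocks.add(block);
        }
        return blocks;
    }
}
```

For example, with `String` standing in for `Vector`, `rows.retag(String.class).glom().get(0)` yields a real `String[]`, while `String[] block = rows.glom().get(0);` throws `ClassCastException`, because that block is an `Object[]`. In this model the tag only matters when `glom()` materializes the arrays, which mirrors the patch calling `retag(classOf[Vector])` immediately before `glom()`.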
I also tried the other RowMatrix methods, and they all work fine:

```Java
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Matrices;
import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.distributed.RowMatrix;

// jsc (a JavaSparkContext) and the row vectors v1, v2, v3 come from the surrounding test.
JavaRDD<Vector> rows = jsc.parallelize(Arrays.asList(v1, v2, v3), 1);
Matrix dm = Matrices.dense(3, 2, new double[] {1.0, 3.0, 5.0, 2.0, 4.0, 6.0});
RowMatrix mat = new RowMatrix(rows.rdd());
mat.computeGramianMatrix();
mat.columnSimilarities();
mat.columnSimilarities(0.5);
mat.computeColumnSummaryStatistics();
mat.computeCovariance();
mat.computePrincipalComponents(1);
mat.computeSVD(1, false, 1e-9);
mat.toBreeze();
mat.rows();
mat.numCols();
mat.numRows();
mat.multiply(dm);
```

## What changes were proposed in this pull request?

The following Java code fails because of type erasure:

```Java
JavaRDD<Vector> rows = jsc.parallelize(...);
RowMatrix mat = new RowMatrix(rows.rdd());
QRDecomposition<RowMatrix, Matrix> result = mat.tallSkinnyQR(true);
```

We should use `retag` to restore the type and prevent the following exception:

```Java
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;
```

## How was this patch tested?

Java unit test

Author: Xusen Yin <[email protected]>

Closes #14051 from yinxusen/SPARK-16372.

(cherry picked from commit 4c6f00d)
Signed-off-by: Sean Owen <[email protected]>
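The `[L<class>;` strings in that exception are JVM binary names for array types, so the message says an `Object[]` reached code that expects a `Vector[]`. A small Spark-free Java sketch (the class name is illustrative and `String` stands in for `Vector`) decodes the names and reproduces the same kind of failure:

```java
/**
 * Hypothetical sketch: decodes the JVM array names seen in the
 * ClassCastException and reproduces the same kind of failure,
 * with String standing in for Vector.
 */
class ArrayCastDemo {

    /** Casts an Object[] (what an erased element type yields) to String[]; returns the exception message. */
    static String castErasedArray() {
        Object[] erased = new Object[] {"row"};
        try {
            String[] typed = (String[]) erased; // compiles, but fails at run time
            return "unexpected success: " + typed.length;
        } catch (ClassCastException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // "[L<class>;" is the JVM's binary name for an array of <class>.
        System.out.println(Object[].class.getName()); // [Ljava.lang.Object;
        System.out.println(String[].class.getName()); // [Ljava.lang.String;
        System.out.println(castErasedArray());
    }
}
```

The exact wording of the message may vary by JDK version (newer JDKs prefix each name with `class` and append module information), but both array names appear in it.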
Merged to master/2.0/1.6. I think it's a reasonably important bug fix.

This one broke branch 1.6. I just reverted it. Please resubmit a backport for branch 1.6.

@zsxwing crumbs, thanks for that. It looks fairly certain that it's related, though I still can't quite figure out how it would cause this failure:

Well, maybe it's safest to just leave this out of 1.6 in any event.
The same commit message appears on a later cherry-pick (`Closes apache#14051 from yinxusen/SPARK-16372.`), with the additional trailer `(cherry picked from commit 45dda92)`.