Contributor

@yinxusen yinxusen commented Jul 5, 2016

What changes were proposed in this pull request?

The following Java code fails because of type erasure:

JavaRDD<Vector> rows = jsc.parallelize(...);
RowMatrix mat = new RowMatrix(rows.rdd());
QRDecomposition<RowMatrix, Matrix> result = mat.tallSkinnyQR(true);

We should use retag to restore the type to prevent the following exception:

java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;
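
For readers less familiar with this failure mode, here is a minimal plain-Java sketch of the same kind of exception, with no Spark involved. The class `ErasureDemo` and its methods are hypothetical names made up for illustration; they are not part of Spark or of this patch. The sketch assumes only that an array built without knowledge of its element class is an `Object[]` at runtime, and that supplying the class (the role `retag` plays here) gives the array the right runtime type.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

// Illustrative only: the class and method names here are hypothetical.
class ErasureDemo {
    // The element type is erased, so the array is really an Object[].
    @SuppressWarnings("unchecked")
    static <T> T[] toArrayErased(List<T> items) {
        Object[] out = new Object[items.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = items.get(i);
        }
        return (T[]) out; // unchecked cast: nothing is verified at this point
    }

    // Passing the element class lets the array get the right runtime type.
    @SuppressWarnings("unchecked")
    static <T> T[] toArrayTagged(List<T> items, Class<T> tag) {
        T[] out = (T[]) Array.newInstance(tag, items.size());
        return items.toArray(out);
    }

    // Returns true if using the erased array as a String[] throws.
    static boolean erasedCallFails() {
        try {
            // The compiler inserts a cast to String[] here, which fails.
            String[] bad = toArrayErased(Arrays.asList("a", "b"));
            return bad.length < 0; // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("erased call fails: " + erasedCallFails());
        String[] ok = toArrayTagged(Arrays.asList("a", "b"), String.class);
        System.out.println("tagged array type: " + ok.getClass().getSimpleName());
    }
}
```

The exception surfaces at the caller, where the compiler-inserted cast runs, not inside the generic method that built the array.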

How was this patch tested?

Java unit test

SparkQA commented Jul 5, 2016

Test build #61737 has finished for PR 14051 at commit 82b4edd.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 5, 2016

Test build #61739 has finished for PR 14051 at commit 0acd1e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yinxusen yinxusen changed the title from "[SPARK-16372][MLlib] RowMatrix constructor should use retag for Java compatibility" to "[SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of RowMatrix" Jul 5, 2016
  val col = numCols().toInt
  // split rows horizontally into smaller matrices, and compute QR for each of them
- val blockQRs = rows.glom().map { partRows =>
+ val blockQRs = rows.retag(classOf[Vector]).glom().map { partRows =>
Member

Where does the exception actually occur? I guess I'm surprised if this is the only place this is needed.

Contributor Author

Since it's a known Java type-erasure issue (https://issues.apache.org/jira/browse/SPARK-2737), I am not sure whether to fix it or not. If we leave it as is, Java users should be aware of it and retag the JavaRDD themselves. Otherwise, we can fix the constructors, either by retagging the rows or by adding a new JavaRDD constructor. However, this may not be the only instance.

Member

I agree with fixing it, just wonder exactly where the exception arises (not the nature of the problem; I get that) to verify this is the right place to retag. It seemed a little surprising but I assume you're right.

Contributor Author

@yinxusen yinxusen Jul 5, 2016

From my log, I can see that it arises in glom(). Like collect(), glom() performs (iter: Iterator[T]) => iter.toArray, so I think this is the best place to call retag.
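
As a rough model of why the array-building step is where things go wrong, the following plain-Java sketch uses a hypothetical `TaggedRows` class (not Spark code) that carries a runtime element tag. It assumes the Java entry point can only supply `Object` as a placeholder tag, so its `glom()` builds an `Object[]`; calling the sketch's own `retag` first attaches the real element class. A `double[]` stands in for a row vector.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one partition of rows that carries an element tag.
class TaggedRows<T> {
    private final List<T> rows;
    private final Class<T> tag;

    private TaggedRows(List<T> rows, Class<T> tag) {
        this.rows = rows;
        this.tag = tag;
    }

    // Models construction from Java: the real element class is unknown,
    // so Object is used as a placeholder tag (an assumption of this sketch).
    @SuppressWarnings("unchecked")
    static <T> TaggedRows<T> fromJava(List<T> rows) {
        return new TaggedRows<>(new ArrayList<>(rows), (Class<T>) (Class<?>) Object.class);
    }

    // Models retag: same rows, with the correct element class attached.
    TaggedRows<T> retag(Class<T> cls) {
        return new TaggedRows<>(rows, cls);
    }

    // Models glom on one partition: the array's runtime type comes from the tag.
    @SuppressWarnings("unchecked")
    T[] glom() {
        T[] out = (T[]) Array.newInstance(tag, rows.size());
        return rows.toArray(out);
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(new double[] {1.0, 2.0}, new double[] {3.0, 4.0});
        TaggedRows<double[]> erased = TaggedRows.fromJava(data);
        try {
            double[][] bad = erased.glom(); // the Object[] fails the cast to double[][]
            System.out.println("unexpected success: " + bad.length);
        } catch (ClassCastException e) {
            System.out.println("glom without retag failed");
        }
        double[][] good = erased.retag(double[].class).glom();
        System.out.println("rows after retag: " + good.length);
    }
}
```

In this model, retagging immediately before `glom()` is enough, because `glom()` is the only step that materializes a typed array from the tag.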

Contributor Author

I also tried the other interfaces of RowMatrix, and they all work:

JavaRDD<Vector> rows = jsc.parallelize(Arrays.asList(v1, v2, v3), 1);
Matrix dm = Matrices.dense(3, 2, new double[] {1.0, 3.0, 5.0, 2.0, 4.0, 6.0});
RowMatrix mat = new RowMatrix(rows.rdd());

mat.computeGramianMatrix();
mat.columnSimilarities();
mat.columnSimilarities(0.5);
mat.computeColumnSummaryStatistics();
mat.computeCovariance();
mat.computePrincipalComponents(1);
mat.computeSVD(1, false, 1e-9);
mat.toBreeze();
mat.rows();
mat.numCols();
mat.numRows();
mat.multiply(dm);

asfgit pushed a commit that referenced this pull request Jul 7, 2016
## What changes were proposed in this pull request?

The following Java code fails because of type erasure:

```Java
JavaRDD<Vector> rows = jsc.parallelize(...);
RowMatrix mat = new RowMatrix(rows.rdd());
QRDecomposition<RowMatrix, Matrix> result = mat.tallSkinnyQR(true);
```

We should use retag to restore the type to prevent the following exception:

```Java
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;
```

## How was this patch tested?

Java unit test

Author: Xusen Yin <[email protected]>

Closes #14051 from yinxusen/SPARK-16372.

(cherry picked from commit 4c6f00d)
Signed-off-by: Sean Owen <[email protected]>
asfgit pushed a commit that referenced this pull request Jul 7, 2016
Member

srowen commented Jul 7, 2016

Merged to master/2.0/1.6. I think it's a reasonably important bug fix.

@asfgit asfgit closed this in 4c6f00d Jul 7, 2016
Member

zsxwing commented Jul 7, 2016

This one broke branch 1.6. I just reverted it. Please resubmit a backport for branch 1.6.

Member

srowen commented Jul 7, 2016

@zsxwing crumbs, thanks for that. It seems reasonably certain that it's related, though I still can't quite figure out how it would cause this failure:

[error] /home/jenkins/workspace/spark-branch-1.6-compile-maven-scala-2.11/mllib/src/test/java/org/apache/spark/mllib/linalg/distributed/JavaRowMatrixSuite.java:24: error: cannot find symbol
[error] import org.apache.spark.SharedSparkSession;
[error]                        ^
[error]   symbol:   class SharedSparkSession
[error]   location: package org.apache.spark
[error] /home/jenkins/workspace/spark-branch-1.6-compile-maven-scala-2.11/mllib/src/test/java/org/apache/spark/mllib/linalg/distributed/JavaRowMatrixSuite.java:31: error: cannot find symbol
[error] public class JavaRowMatrixSuite extends SharedSparkSession {
[error]                                         ^
[error]   symbol: class SharedSparkSession
[error] /home/jenkins/workspace/spark-branch-1.6-compile-maven-scala-2.11/mllib/src/test/java/org/apache/spark/mllib/linalg/distributed/JavaRowMatrixSuite.java:39: error: cannot find symbol
[error]     JavaRDD<Vector> rows = jsc.parallelize(Arrays.asList(v1, v2, v3), 1);
[error]                            ^

Well, maybe safest to just leave this out of 1.6 in any event

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jul 8, 2016
## What changes were proposed in this pull request?

The following Java code fails because of type erasure:

```Java
JavaRDD<Vector> rows = jsc.parallelize(...);
RowMatrix mat = new RowMatrix(rows.rdd());
QRDecomposition<RowMatrix, Matrix> result = mat.tallSkinnyQR(true);
```

We should use retag to restore the type to prevent the following exception:

```Java
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;
```

## How was this patch tested?

Java unit test

Author: Xusen Yin <[email protected]>

Closes apache#14051 from yinxusen/SPARK-16372.

(cherry picked from commit 4c6f00d)
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 45dda92)