
Commit 06dda1d

ghoto authored and srowen committed
[SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash when converting from Breeze sparse matrix
## What changes were proposed in this pull request?

When an arithmetic operation combines two Breeze SparseMatrices, the result matrix may contain extra provisional zero entries in its rowIndices and data arrays. Those arrays then become inconsistent with colPtrs, but Breeze tolerates the inconsistency by keeping a counter (activeSize) of the valid entries. When such a matrix is converted to a Spark SparseMatrix, Spark relies solely on rowIndices, data, and colPtrs, which may be incorrect because of Breeze's internal optimizations. Therefore, rowIndices and data need to be sliced down to the number of active entries.

Matrices.fromBreeze is called, at least, by BlockMatrix when performing distributed block operations, so valid operations could raise exceptions. See http://stackoverflow.com/questions/33528555/error-thrown-when-using-blockmatrix-add

## How was this patch tested?

Added a test to MatricesSuite that verifies the conversion is valid and that the code does not crash; the same code previously crashed on Spark.

Bugfix for https://issues.apache.org/jira/browse/SPARK-20687

Author: Ignacio Bermudez <[email protected]>
Author: Ignacio Bermudez Corrales <[email protected]>

Closes #17940 from ghoto/bug-fix/SPARK-20687.
1 parent a2b3b67 commit 06dda1d
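
To make the failure mode concrete, here is a minimal sketch using Breeze's CSCMatrix directly (the names b1, b2, diff, and trimmed are illustrative, and whether a given operation actually leaves stale entries depends on Breeze's internals). It mirrors the guard this patch adds to Matrices.fromBreeze: if the backing arrays are longer than activeSize, compact a copy before reading colPtrs, rowIndices, and data.

```scala
import breeze.linalg.CSCMatrix

// Two 2x3 sparse matrices; mat2 holds tiny values where mat1 holds zeros.
val b1 = new CSCMatrix.Builder[Double](rows = 2, cols = 3)
b1.add(0, 0, 2.0)
b1.add(1, 0, 2.0)
val mat1 = b1.result()

val b2 = new CSCMatrix.Builder[Double](rows = 2, cols = 3)
b2.add(0, 0, 2.0); b2.add(0, 1, 1e-15); b2.add(0, 2, 1e-15)
b2.add(1, 0, 2.0); b2.add(1, 1, 1e-15); b2.add(1, 2, 1e-15)
val mat2 = b2.result()

// Subtraction may leave stale entries past activeSize in the backing
// rowIndices/data arrays; Breeze itself only ever reads the valid prefix.
val diff = mat1 - mat2
println(s"activeSize=${diff.activeSize}, rowIndices.length=${diff.rowIndices.length}")

// What the patched Matrices.fromBreeze effectively does: compact a copy so
// that colPtrs, rowIndices, and data are mutually consistent before use.
val trimmed = if (diff.rowIndices.length > diff.activeSize) {
  val compacted = diff.copy
  compacted.compact() // drops the stale tail of rowIndices and data
  compacted
} else {
  diff
}
assert(trimmed.rowIndices.length == trimmed.activeSize)
```

Compacting a copy rather than the original, as the patch does, keeps the conversion free of side effects on the caller's Breeze matrix.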

File tree

2 files changed: 30 additions, 1 deletion


mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala

Lines changed: 10 additions & 1 deletion
@@ -992,7 +992,16 @@ object Matrices {
         new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
       case sm: BSM[Double] =>
         // There is no isTranspose flag for sparse matrices in Breeze
-        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
+        val nsm = if (sm.rowIndices.length > sm.activeSize) {
+          // This sparse matrix has trailing zeros.
+          // Remove them by compacting the matrix.
+          val csm = sm.copy
+          csm.compact()
+          csm
+        } else {
+          sm
+        }
+        new SparseMatrix(nsm.rows, nsm.cols, nsm.colPtrs, nsm.rowIndices, nsm.data)
       case _ =>
         throw new UnsupportedOperationException(
           s"Do not support conversion from type ${breeze.getClass.getName}.")

mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala

Lines changed: 20 additions & 0 deletions
@@ -513,6 +513,26 @@ class MatricesSuite extends SparkFunSuite {
     Matrices.fromBreeze(sum)
   }
 
+  test("Test FromBreeze when Breeze.CSCMatrix.rowIndices has trailing zeros. - SPARK-20687") {
+    // (2, 0, 0)
+    // (2, 0, 0)
+    val mat1Brz = Matrices.sparse(2, 3, Array(0, 2, 2, 2), Array(0, 1), Array(2, 2)).asBreeze
+    // (2, 1E-15, 1E-15)
+    // (2, 1E-15, 1E-15)
+    val mat2Brz = Matrices.sparse(2, 3,
+      Array(0, 2, 4, 6),
+      Array(0, 0, 0, 1, 1, 1),
+      Array(2, 1E-15, 1E-15, 2, 1E-15, 1E-15)).asBreeze
+    val t1Brz = mat1Brz - mat2Brz
+    val t2Brz = mat2Brz - mat1Brz
+    // The following operations raise exceptions on un-patched Matrices.fromBreeze
+    val t1 = Matrices.fromBreeze(t1Brz)
+    val t2 = Matrices.fromBreeze(t2Brz)
+    // t1 == t1Brz && t2 == t2Brz
+    assert((t1.asBreeze - t1Brz).iterator.map((x) => math.abs(x._2)).sum < 1E-15)
+    assert((t2.asBreeze - t2Brz).iterator.map((x) => math.abs(x._2)).sum < 1E-15)
+  }
+
   test("row/col iterator") {
     val dm = new DenseMatrix(3, 2, Array(0, 1, 2, 3, 4, 0))
     val sm = dm.toSparse
