Commit ca8243f
[MINOR][ML] Minor correction in the powerIterationSuite
## What changes were proposed in this pull request?
Currently the power iteration clustering test in spark ml, maps the results to the labels 0 and 1 for assertion. Since the clustering outputs need not be the same as the mapped labels, it may cause failure in the test case. Even if it correctly maps, theoretically we cannot guarantee which set belongs to which cluster label. KMeans can assign label 0 to either of the set.
PowerIterationClusteringSuite in the MLLib checks the clustering results without mapping to the particular cluster label, as shown below.
`` val predictions = Array.fill(2)(mutable.Set.empty[Long])
model.assignments.collect().foreach { a =>
predictions(a.cluster) += a.id
}
assert(predictions.toSet == Set((0 until n1).toSet, (n1 until n).toSet))
``
## How was this patch tested?
Existing tests
Author: Shahid <[email protected]>
Closes #21689 from shahidki31/picTestSuiteMinorCorrection.1 parent 1a2655a commit ca8243f
File tree
1 file changed
+20
-10
lines changed- mllib/src/test/scala/org/apache/spark/ml/clustering
1 file changed
+20
-10
lines changedLines changed: 20 additions & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| 20 | + | |
| 21 | + | |
20 | 22 | | |
21 | 23 | | |
22 | 24 | | |
| |||
76 | 78 | | |
77 | 79 | | |
78 | 80 | | |
79 | | - | |
80 | | - | |
81 | | - | |
82 | | - | |
83 | | - | |
84 | | - | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
85 | 90 | | |
86 | 91 | | |
87 | 92 | | |
88 | 93 | | |
89 | 94 | | |
90 | 95 | | |
91 | 96 | | |
92 | | - | |
93 | | - | |
94 | | - | |
95 | | - | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
96 | 106 | | |
97 | 107 | | |
98 | 108 | | |
| |||
0 commit comments