Commit 20d8ef8

[SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example
Fixed WSSSE computeCost in Python mllib KMeans user guide example by using new computeCost method API in Python.

Author: Joseph K. Bradley <[email protected]>

Closes #10707 from jkbradley/kmeans-doc-fix.
1 parent 021dafc commit 20d8ef8

1 file changed: +1 -5 lines changed

docs/mllib-clustering.md

Lines changed: 1 addition & 5 deletions
@@ -152,11 +152,7 @@ clusters = KMeans.train(parsedData, 2, maxIterations=10,
         runs=10, initializationMode="random")
 
 # Evaluate clustering by computing Within Set Sum of Squared Errors
-def error(point):
-    center = clusters.centers[clusters.predict(point)]
-    return sqrt(sum([x**2 for x in (point - center)]))
-
-WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
+WSSSE = clusters.computeCost(parsedData)
 print("Within Set Sum of Squared Error = " + str(WSSSE))
 
 # Save and load model
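
The commit message does not say what was wrong with the removed helper, but one reading is that `error(point)` takes a square root per point, so summing it gives a sum of Euclidean distances rather than a sum of squared errors. `computeCost` is documented as returning the sum of squared distances of points to their nearest center. The sketch below illustrates that difference in plain Python without Spark; the function names and the toy data are hypothetical and are not part of the patch.

```python
import math

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_center(point, centers):
    """Center with the smallest squared distance to the point."""
    return min(centers, key=lambda c: squared_distance(point, c))

def sum_of_distances(points, centers):
    # Mirrors the removed helper: square root per point, then sum.
    return sum(math.sqrt(squared_distance(p, nearest_center(p, centers)))
               for p in points)

def sum_of_squared_distances(points, centers):
    # Within Set Sum of Squared Errors: no square root.
    return sum(squared_distance(p, nearest_center(p, centers))
               for p in points)

# Toy data (hypothetical): two tight pairs, one center per pair.
points = [(0.0, 0.0), (0.0, 2.0), (9.0, 9.0), (9.0, 13.0)]
centers = [(0.0, 1.0), (9.0, 11.0)]

print("Sum of distances =", sum_of_distances(points, centers))
print("Sum of squared distances =", sum_of_squared_distances(points, centers))
```

On this data the two quantities differ (6.0 versus 10.0), which is why the printed label "Within Set Sum of Squared Error" only matches the second one.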
