Commit 9e9589f

update doc to explain logic for non-double type
1 parent c0722f1 commit 9e9589f

1 file changed: +6 -0 lines changed

mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala

Lines changed: 6 additions & 0 deletions
@@ -87,6 +87,12 @@ private[feature] trait ImputerParams extends Params with HasInputCols with HasOu
  * numeric type. Currently Imputer does not support categorical features
  * (SPARK-15041) and possibly creates incorrect values for a categorical feature.
  *
+ * Note that the input columns are converted to Double data type internally to compute
+ * the mean/median value and impute the missing values, which are then casted back to
+ * the original data type in the output. So the output column always has the same data
+ * type as the input. As an example, if the input column is IntegerType (1, 2, 4, null),
+ * the output will be IntegerType (1, 2, 4, 2) after mean imputation.
+ *
  * Note that the mean/median value is computed after filtering out missing values.
  * All Null values in the input columns are treated as missing, and so are also imputed. For
  * computing median, DataFrameStatFunctions.approxQuantile is used with a relative error of 0.001.
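
The IntegerType example in the added Scaladoc can be traced by hand: the mean of 1, 2 and 4 is 7/3 ≈ 2.33, and casting that back to an integer gives 2. The following is a minimal plain-Scala sketch of that sequence, not Spark's actual implementation. `MeanImputeSketch` and `imputeMean` are hypothetical names introduced here for illustration, `None` stands in for a null cell, and the sketch assumes the cast back to the integer type truncates the fractional part.

```scala
// Hypothetical helper, not Spark's implementation: it mirrors the documented
// convert-to-Double, impute, cast-back sequence for a single integer column.
object MeanImputeSketch {
  def imputeMean(column: Seq[Option[Int]]): Seq[Int] = {
    // Missing values (None stands in for null) are filtered out before the mean is computed.
    val present = column.flatten.map(_.toDouble)
    require(present.nonEmpty, "cannot impute a column with no observed values")
    val mean = present.sum / present.size
    // Assumption: casting back to the original type truncates the fractional part.
    val surrogate = mean.toInt
    // Observed values pass through unchanged; missing ones receive the surrogate.
    column.map(_.getOrElse(surrogate))
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(Some(1), Some(2), Some(4), None)
    println(imputeMean(input).mkString(", ")) // prints "1, 2, 4, 2"
  }
}
```

Because the surrogate is cast back to the column's type, an integer column can lose the fractional part of its mean, which is why the documented output is 2 rather than 2.33.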