update doc to explain logic for non-double type
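
The documented behaviour can be sketched in plain Scala, without Spark. This is an illustration only, not the Imputer implementation: `ImputeCastSketch` and `imputeMeanInt` are hypothetical names, and the sketch assumes the cast back to the integer type drops the fractional part the way Scala's `toInt` does.

```scala
// Plain-Scala sketch of "compute in Double, cast back to the input type".
// `imputeMeanInt` is a hypothetical helper, not part of Spark's Imputer API.
object ImputeCastSketch {
  def imputeMeanInt(values: Seq[Option[Int]]): Seq[Int] = {
    // Drop missing values and widen the rest to Double before averaging.
    val present = values.flatten.map(_.toDouble)
    require(present.nonEmpty, "need at least one non-missing value")
    val mean = present.sum / present.size
    // Assumption: casting back to Int drops the fractional part (2.33 -> 2).
    val surrogate = mean.toInt
    // Fill every missing entry with the cast-back surrogate.
    values.map(_.getOrElse(surrogate))
  }

  def main(args: Array[String]): Unit = {
    // Same data as the doc example: (1, 2, 4, null)
    println(imputeMeanInt(Seq(Some(1), Some(2), Some(4), None))) // prints List(1, 2, 4, 2)
  }
}
```

The mean of the non-missing values is 7/3, so the imputed entry loses its fractional part when it is cast back to the integer type.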

actuaryzhang · actuaryzhang · commit 9e9589f83be8 · 2019-07-31T10:52:08.000-07:00
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
@@ -87,6 +87,12 @@ private[feature] trait ImputerParams extends Params with HasInputCols with HasOu
  * numeric type. Currently Imputer does not support categorical features
  * (SPARK-15041) and possibly creates incorrect values for a categorical feature.
  *
+ * Note that the input columns are converted to Double internally to compute the mean/median
+ * and impute the missing values; the result is then cast back to the original data type, so
+ * the output column always has the same data type as the input. For example, if the input
+ * column is IntegerType (1, 2, 4, null), the output after mean imputation is IntegerType
+ * (1, 2, 4, 2): the mean 7/3 is cast back to the integer 2.
+ *
  * Note that the mean/median value is computed after filtering out missing values.
  * All Null values in the input columns are treated as missing, and so are also imputed. For
  * computing median, DataFrameStatFunctions.approxQuantile is used with a relative error of 0.001.