Commit 2f98d31

VinceShieh committed
revert changes in feature.py
Signed-off-by: VinceShieh <[email protected]>
1 parent 5274d4a commit 2f98d31

1 file changed, 0 additions and 7 deletions

python/pyspark/ml/feature.py

Lines changed: 0 additions & 7 deletions
@@ -1155,13 +1155,6 @@ class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadab

     `QuantileDiscretizer` takes a column with continuous features and outputs a column with binned
     categorical features. The number of bins can be set using the :py:attr:`numBuckets` parameter.
-    It is possible that the number of buckets used will be less than this value, for example, if
-    there are too few distinct values of the input to create enough distinct quantiles. Note also
-    that QuantileDiscretizer will raise an error when it finds NaN value in the dataset, but user
-    can also choose to either keep or remove NaN values within the dataset by setting
-    handleInvalid. If user chooses to keep NaN values, they will be handled specially and placed
-    into their own bucket, for example, if 4 buckets are used, then non-NaN data will be put into
-    buckets[0-3], but NaNs will be counted in a special bucket[4].
     The bin ranges are chosen using an approximate algorithm (see the documentation for
     :py:meth:`~.DataFrameStatFunctions.approxQuantile` for a detailed description).
     The precision of the approximation can be controlled with the
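
For reference, the behaviour described in the removed docstring lines can be illustrated with a minimal pure-Python sketch. This is not Spark's implementation: the helpers `quantile_splits` and `bucketize` and the `handle_invalid` argument are hypothetical names chosen for illustration, and exact nearest-rank quantiles stand in for the approximate algorithm Spark uses.

```python
import bisect
import math


def quantile_splits(values, num_buckets):
    """Return sorted, de-duplicated interior split points (hypothetical helper)."""
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        return []
    splits = []
    for i in range(1, num_buckets):
        # Exact nearest-rank quantile; an assumption standing in for approxQuantile.
        idx = min(len(data) - 1, int(i * len(data) / num_buckets))
        splits.append(data[idx])
    # Duplicate quantiles collapse, so fewer buckets than requested may result.
    return sorted(set(splits))


def bucketize(values, num_buckets, handle_invalid="error"):
    """Assign each value a bucket index; NaN handling mimics the docstring's description."""
    splits = quantile_splits(values, num_buckets)
    nan_bucket = len(splits) + 1  # one past the last regular bucket
    out = []
    for v in values:
        if math.isnan(v):
            if handle_invalid == "error":
                raise ValueError("NaN found; use handle_invalid='skip' or 'keep'")
            if handle_invalid == "skip":
                continue  # drop the NaN row
            out.append(nan_bucket)  # 'keep': NaNs get their own bucket
        else:
            # A value equal to a split point falls into the upper bucket.
            out.append(bisect.bisect_right(splits, v))
    return out
```

With eight distinct values plus one NaN and `num_buckets=4`, the `'keep'` mode places the non-NaN values in buckets 0 to 3 and the NaN in bucket 4. With input such as `[1.0, 1.0, 1.0, 2.0]`, the duplicate quantiles collapse and only three buckets remain.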
