From 628edbe66e7e861a0db18c7908641a37542495d7 Mon Sep 17 00:00:00 2001 From: Zheng RuiFeng Date: Sat, 5 Nov 2016 13:58:40 +0800 Subject: [PATCH 1/3] create pr --- docs/ml-classification-regression.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md index bb2e404330cc..0566e9d9ecae 100644 --- a/docs/ml-classification-regression.md +++ b/docs/ml-classification-regression.md @@ -765,7 +765,7 @@ is treated as piecewise linear function. The rules for prediction therefore are: predictions of the two closest features. In case there are multiple values with the same feature then the same rules as in previous point are used. -### Examples +**Example**
From e2de1b7050f5be2f49213ba67457e7e179b73a64 Mon Sep 17 00:00:00 2001 From: Zheng RuiFeng Date: Sat, 5 Nov 2016 14:20:21 +0800 Subject: [PATCH 2/3] create pr --- docs/ml-classification-regression.md | 30 ++++++++++++++-------------- docs/ml-clustering.md | 8 +++++--- docs/ml-features.md | 30 ++++++++++++++++++++++++++++ docs/ml-tuning.md | 4 ++-- 4 files changed, 52 insertions(+), 20 deletions(-) diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md index 0566e9d9ecae..b10793d83ec6 100644 --- a/docs/ml-classification-regression.md +++ b/docs/ml-classification-regression.md @@ -46,7 +46,7 @@ parameter to select between these two algorithms, or leave it unset and Spark wi For more background and more details about the implementation of binomial logistic regression, refer to the documentation of [logistic regression in `spark.mllib`](mllib-linear-methods.html#logistic-regression). -**Example** +**Examples** The following example shows how to train binomial and multinomial logistic regression models for binary classification with elastic net regularization. `elasticNetParam` corresponds to @@ -137,7 +137,7 @@ We minimize the weighted negative log-likelihood, using a multinomial response m For a detailed derivation please see [here](https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_log-linear_model). -**Example** +**Examples** The following example shows how to train a multiclass logistic regression model with elastic net regularization. @@ -164,7 +164,7 @@ model with elastic net regularization. Decision trees are a popular family of classification and regression methods. More information about the `spark.ml` implementation can be found further in the [section on decision trees](#decision-trees). -**Example** +**Examples** The following examples load a dataset in LibSVM format, split it into training and test sets, train on the first dataset, and then evaluate on the held-out test set. We use two feature transformers to prepare the data; these help index categories for the label and categorical features, adding metadata to the `DataFrame` which the Decision Tree algorithm can recognize. @@ -201,7 +201,7 @@ More details on parameters can be found in the [Python API documentation](api/py Random forests are a popular family of classification and regression methods. More information about the `spark.ml` implementation can be found further in the [section on random forests](#random-forests). -**Example** +**Examples** The following examples load a dataset in LibSVM format, split it into training and test sets, train on the first dataset, and then evaluate on the held-out test set. We use two feature transformers to prepare the data; these help index categories for the label and categorical features, adding metadata to the `DataFrame` which the tree-based algorithms can recognize. @@ -234,7 +234,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.classificat Gradient-boosted trees (GBTs) are a popular classification and regression method using ensembles of decision trees. More information about the `spark.ml` implementation can be found further in the [section on GBTs](#gradient-boosted-trees-gbts). -**Example** +**Examples** The following examples load a dataset in LibSVM format, split it into training and test sets, train on the first dataset, and then evaluate on the held-out test set. 
We use two feature transformers to prepare the data; these help index categories for the label and categorical features, adding metadata to the `DataFrame` which the tree-based algorithms can recognize. @@ -284,7 +284,7 @@ The number of nodes `$N$` in the output layer corresponds to the number of class MLPC employs backpropagation for learning the model. We use the logistic loss function for optimization and L-BFGS as an optimization routine. -**Example** +**Examples**
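[Editorial aside, not part of the patch: the multilayer perceptron hunk above mentions layer sizes, the logistic loss, and L-BFGS. A minimal PySpark sketch of the kind of example that section introduces is given below; the dataset path, layer sizes, and seed are illustrative assumptions, not values taken from the patch.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("mlp-sketch").getOrCreate()

# assumed sample file shipped with Spark: 4 features, 3 classes, LibSVM format
data = spark.read.format("libsvm").load("data/mllib/sample_multiclass_classification_data.txt")
train, test = data.randomSplit([0.6, 0.4], seed=1234)

# layers: input layer (4 features), two hidden layers, output layer (3 classes)
layers = [4, 5, 4, 3]
trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)
model = trainer.fit(train)

result = model.transform(test)
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
print("Test set accuracy = " + str(evaluator.evaluate(result.select("prediction", "label"))))
```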
@@ -311,7 +311,7 @@ MLPC employs backpropagation for learning the model. We use the logistic loss fu Predictions are done by evaluating each binary classifier and the index of the most confident classifier is output as label. -**Example** +**Examples** The example below demonstrates how to load the [Iris dataset](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/iris.scale), parse it as a DataFrame and perform multiclass classification using `OneVsRest`. The test error is calculated to measure the algorithm accuracy. @@ -348,7 +348,7 @@ naive Bayes](http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-c and [Bernoulli naive Bayes](http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html). More information can be found in the section on [Naive Bayes in MLlib](mllib-naive-bayes.html#naive-bayes-sparkmllib). -**Example** +**Examples**
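[Editorial aside, not part of the patch: the OneVsRest hunk above notes that one binary classifier is trained per class and the most confident one wins at prediction time. A hedged PySpark sketch follows; the dataset path and LogisticRegression settings are assumptions for illustration.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression, OneVsRest
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("ovr-sketch").getOrCreate()

# assumed multiclass dataset in LibSVM format
data = spark.read.format("libsvm").load("data/mllib/sample_multiclass_classification_data.txt")
train, test = data.randomSplit([0.8, 0.2], seed=42)

lr = LogisticRegression(maxIter=10, tol=1e-6, fitIntercept=True)
ovr = OneVsRest(classifier=lr)   # trains one binary LR model per class
model = ovr.fit(train)

predictions = model.transform(test)
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
print("Test Error = %g" % (1.0 - evaluator.evaluate(predictions)))
```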
@@ -383,7 +383,7 @@ summaries is similar to the logistic regression case. > When fitting LinearRegressionModel without intercept on dataset with constant nonzero column by "l-bfgs" solver, Spark MLlib outputs zero coefficients for constant nonzero columns. This behavior is the same as R glmnet but different from LIBSVM. -**Example** +**Examples** The following example demonstrates training an elastic net regularized linear @@ -511,7 +511,7 @@ others. -**Example** +**Examples** The following example demonstrates training a GLM with a Gaussian response and identity link function and extracting model summary statistics. @@ -544,7 +544,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.regression. Decision trees are a popular family of classification and regression methods. More information about the `spark.ml` implementation can be found further in the [section on decision trees](#decision-trees). -**Example** +**Examples** The following examples load a dataset in LibSVM format, split it into training and test sets, train on the first dataset, and then evaluate on the held-out test set. We use a feature transformer to index categorical features, adding metadata to the `DataFrame` which the Decision Tree algorithm can recognize. @@ -579,7 +579,7 @@ More details on parameters can be found in the [Python API documentation](api/py Random forests are a popular family of classification and regression methods. More information about the `spark.ml` implementation can be found further in the [section on random forests](#random-forests). -**Example** +**Examples** The following examples load a dataset in LibSVM format, split it into training and test sets, train on the first dataset, and then evaluate on the held-out test set. We use a feature transformer to index categorical features, adding metadata to the `DataFrame` which the tree-based algorithms can recognize. @@ -612,7 +612,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.regression. Gradient-boosted trees (GBTs) are a popular regression method using ensembles of decision trees. More information about the `spark.ml` implementation can be found further in the [section on GBTs](#gradient-boosted-trees-gbts). -**Example** +**Examples** Note: For this example dataset, `GBTRegressor` actually only needs 1 iteration, but that will not be true in general. @@ -700,7 +700,7 @@ The implementation matches the result from R's survival function > When fitting AFTSurvivalRegressionModel without intercept on dataset with constant nonzero column, Spark MLlib outputs zero coefficients for constant nonzero columns. This behavior is different from R survival::survreg. -**Example** +**Examples**
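[Editorial aside, not part of the patch: among the regression hunks above, the generalized linear regression one describes a Gaussian response with identity link and model summary statistics. A rough PySpark sketch of that kind of example follows; the dataset path and regParam value are assumptions, and the summary fields printed are the ones the training summary is expected to expose.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.regression import GeneralizedLinearRegression

spark = SparkSession.builder.appName("glm-sketch").getOrCreate()

# assumed regression dataset in LibSVM format
dataset = spark.read.format("libsvm").load("data/mllib/sample_linear_regression_data.txt")

glr = GeneralizedLinearRegression(family="gaussian", link="identity", maxIter=10, regParam=0.3)
model = glr.fit(dataset)

print("Coefficients: " + str(model.coefficients))
print("Intercept: " + str(model.intercept))

summary = model.summary   # training summary statistics
print("Coefficient Standard Errors: " + str(summary.coefficientStandardErrors))
print("Deviance: " + str(summary.deviance))
print("AIC: " + str(summary.aic))
summary.residuals().show()
```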
@@ -765,7 +765,7 @@ is treated as piecewise linear function. The rules for prediction therefore are: predictions of the two closest features. In case there are multiple values with the same feature then the same rules as in previous point are used. -**Example** +**Examples**
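[Editorial aside, not part of the patch: the isotonic regression hunk above describes prediction as interpolation over a piecewise linear function defined by boundary/prediction pairs. A small PySpark sketch along those lines; the dataset path is an assumption.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.regression import IsotonicRegression

spark = SparkSession.builder.appName("isotonic-sketch").getOrCreate()

# assumed sample file; any (label, features) dataset in LibSVM format works
dataset = spark.read.format("libsvm").load("data/mllib/sample_isotonic_regression_libsvm_data.txt")

model = IsotonicRegression().fit(dataset)
print("Boundaries in increasing order: " + str(model.boundaries))
print("Predictions associated with the boundaries: " + str(model.predictions))

# predictions for new points interpolate linearly between the fitted boundaries
model.transform(dataset).show()
```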
diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md index 8a0a61cb595e..eedacb12bc46 100644 --- a/docs/ml-clustering.md +++ b/docs/ml-clustering.md @@ -65,7 +65,7 @@ called [kmeans||](http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf). -### Example +**Examples**
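[Editorial aside, not part of the patch: for the k-means hunk above, which mentions the kmeans|| initialization, a brief PySpark sketch of the documented usage; the dataset path, k, and seed are illustrative assumptions.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("kmeans-sketch").getOrCreate()

# assumed sample file shipped with Spark
dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

kmeans = KMeans(k=2, seed=1)        # kmeans|| initialization is the default
model = kmeans.fit(dataset)

wssse = model.computeCost(dataset)  # within-set sum of squared errors
print("Within Set Sum of Squared Errors = " + str(wssse))

for center in model.clusterCenters():
    print(center)
```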
@@ -94,6 +94,8 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering. and generates a `LDAModel` as the base model. Expert users may cast a `LDAModel` generated by `EMLDAOptimizer` to a `DistributedLDAModel` if needed. +**Examples** +
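[Editorial aside, not part of the patch: the LDA hunk above adds an Examples heading; a sketch of the sort of example that heading would introduce, assuming the bag-of-words sample file shipped with Spark and k=10 topics.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.clustering import LDA

spark = SparkSession.builder.appName("lda-sketch").getOrCreate()

# assumed sample file of term-count vectors in LibSVM format
dataset = spark.read.format("libsvm").load("data/mllib/sample_lda_libsvm_data.txt")

lda = LDA(k=10, maxIter=10)
model = lda.fit(dataset)

print("Lower bound on the log likelihood: " + str(model.logLikelihood(dataset)))
print("Upper bound on perplexity: " + str(model.logPerplexity(dataset)))

model.describeTopics(3).show(truncate=False)   # top-3 terms per topic
model.transform(dataset).show(truncate=False)  # per-document topic distributions
```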
@@ -128,7 +130,7 @@ Bisecting K-means can often be much faster than regular K-means, but it will gen `BisectingKMeans` is implemented as an `Estimator` and generates a `BisectingKMeansModel` as the base model. -### Example +**Examples**
@@ -210,7 +212,7 @@ model. -### Example +**Examples**
diff --git a/docs/ml-features.md b/docs/ml-features.md index 352887d3ba6e..3f64f2549082 100644 --- a/docs/ml-features.md +++ b/docs/ml-features.md @@ -112,6 +112,8 @@ can then be used as features for prediction, document similarity calculations, e Please refer to the [MLlib user guide on Word2Vec](mllib-feature-extraction.html#word2vec) for more details. +**Examples** + In the following code segment, we start with a set of documents, each of which is represented as a sequence of words. For each document, we transform it into a feature vector. This feature vector could then be passed to a learning algorithm.
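[Editorial aside, not part of the patch: the Word2Vec hunk above describes turning each document, given as a sequence of words, into a feature vector. A minimal PySpark sketch with made-up sentences; the vector size and minCount are illustrative choices.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import Word2Vec

spark = SparkSession.builder.appName("word2vec-sketch").getOrCreate()

# each row is a document represented as a sequence of words
documentDF = spark.createDataFrame([
    ("Hi I heard about Spark".split(" "), ),
    ("I wish Java could use case classes".split(" "), ),
    ("Logistic regression models are neat".split(" "), )
], ["text"])

word2Vec = Word2Vec(vectorSize=3, minCount=0, inputCol="text", outputCol="result")
model = word2Vec.fit(documentDF)

# each document becomes the average of the vectors of its words
model.transform(documentDF).select("result").show(truncate=False)
```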
@@ -220,6 +222,8 @@ for more details on the API. Alternatively, users can set parameter "gaps" to false indicating the regex "pattern" denotes "tokens" rather than splitting gaps, and find all matching occurrences as the tokenization result. +**Examples** +
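[Editorial aside, not part of the patch: the tokenizer hunk above explains the gaps parameter of RegexTokenizer. A short PySpark sketch contrasting the default Tokenizer with a RegexTokenizer whose pattern matches tokens rather than gaps; the sentences are made up.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, RegexTokenizer

spark = SparkSession.builder.appName("tokenizer-sketch").getOrCreate()

sentenceDataFrame = spark.createDataFrame([
    (0, "Hi I heard about Spark"),
    (1, "Logistic,regression,models,are,neat")
], ["id", "sentence"])

tokenizer = Tokenizer(inputCol="sentence", outputCol="words")

# gaps=False: the pattern describes the tokens themselves, not the gaps between them
regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words",
                                pattern="\\w+", gaps=False)

tokenizer.transform(sentenceDataFrame).select("sentence", "words").show(truncate=False)
regexTokenizer.transform(sentenceDataFrame).select("sentence", "words").show(truncate=False)
```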
@@ -321,6 +325,8 @@ An [n-gram](https://en.wikipedia.org/wiki/N-gram) is a sequence of $n$ tokens (t `NGram` takes as input a sequence of strings (e.g. the output of a [Tokenizer](ml-features.html#tokenizer)). The parameter `n` is used to determine the number of terms in each $n$-gram. The output will consist of a sequence of $n$-grams where each $n$-gram is represented by a space-delimited string of $n$ consecutive words. If the input sequence contains fewer than `n` strings, no output is produced. +**Examples** +
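[Editorial aside, not part of the patch: the NGram hunk above explains the n parameter and the space-delimited output. A tiny PySpark sketch with invented word sequences.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import NGram

spark = SparkSession.builder.appName("ngram-sketch").getOrCreate()

wordDataFrame = spark.createDataFrame([
    (0, ["Hi", "I", "heard", "about", "Spark"]),
    (1, ["I", "wish", "Java", "could", "use", "case", "classes"])
], ["id", "words"])

# each output element is a space-delimited string of n consecutive input words
ngram = NGram(n=2, inputCol="words", outputCol="ngrams")
ngram.transform(wordDataFrame).select("ngrams").show(truncate=False)
```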
@@ -358,6 +364,8 @@ for binarization. Feature values greater than the threshold are binarized to 1.0 to or less than the threshold are binarized to 0.0. Both Vector and Double types are supported for `inputCol`. +**Examples** +
@@ -388,6 +396,8 @@ for more details on the API. [PCA](http://en.wikipedia.org/wiki/Principal_component_analysis) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. A [PCA](api/scala/index.html#org.apache.spark.ml.feature.PCA) class trains a model to project vectors to a low-dimensional space using PCA. The example below shows how to project 5-dimensional feature vectors into 3-dimensional principal components. +**Examples** +
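[Editorial aside, not part of the patch: the PCA hunk above mentions projecting 5-dimensional vectors onto 3 principal components. A matching PySpark sketch with small made-up vectors.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import PCA
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("pca-sketch").getOrCreate()

data = [(Vectors.sparse(5, [(1, 1.0), (3, 7.0)]),),
        (Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0]),),
        (Vectors.dense([4.0, 0.0, 0.0, 6.0, 7.0]),)]
df = spark.createDataFrame(data, ["features"])

# project the 5-dimensional vectors onto the top 3 principal components
pca = PCA(k=3, inputCol="features", outputCol="pcaFeatures")
model = pca.fit(df)
model.transform(df).select("pcaFeatures").show(truncate=False)
```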
@@ -418,6 +428,8 @@ for more details on the API. [Polynomial expansion](http://en.wikipedia.org/wiki/Polynomial_expansion) is the process of expanding your features into a polynomial space, which is formulated by an n-degree combination of original dimensions. A [PolynomialExpansion](api/scala/index.html#org.apache.spark.ml.feature.PolynomialExpansion) class provides this functionality. The example below shows how to expand your features into a 3-degree polynomial space. +**Examples** +
@@ -458,6 +470,8 @@ for the transform is unitary. No shift is applied to the transformed sequence (e.g. the $0$th element of the transformed sequence is the $0$th DCT coefficient and _not_ the $N/2$th). +**Examples** +
@@ -663,6 +677,8 @@ for more details on the API. [One-hot encoding](http://en.wikipedia.org/wiki/One-hot) maps a column of label indices to a column of binary vectors, with at most a single one-value. This encoding allows algorithms which expect continuous features, such as Logistic Regression, to use categorical features. +**Examples** +
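[Editorial aside, not part of the patch: the one-hot encoding hunk above maps label indices to binary vectors, so a StringIndexer is typically applied first. A PySpark sketch with invented categories; OneHotEncoder is used as a plain transformer, as in the Spark version this patch targets.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer, OneHotEncoder

spark = SparkSession.builder.appName("onehot-sketch").getOrCreate()

df = spark.createDataFrame([
    (0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")
], ["id", "category"])

# map string labels to indices first, then one-hot encode the index column
indexer = StringIndexer(inputCol="category", outputCol="categoryIndex")
indexed = indexer.fit(df).transform(df)

encoder = OneHotEncoder(inputCol="categoryIndex", outputCol="categoryVec")
encoder.transform(indexed).show()
```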
@@ -701,6 +717,8 @@ It can both automatically decide which features are categorical and convert orig Indexing categorical features allows algorithms such as Decision Trees and Tree Ensembles to treat categorical features appropriately, improving performance. +**Examples** + In the example below, we read in a dataset of labeled points and then use `VectorIndexer` to decide which features should be treated as categorical. We transform the categorical feature values to their indices. This transformed data could then be passed to algorithms such as `DecisionTreeRegressor` that handle categorical features.
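[Editorial aside, not part of the patch: the VectorIndexer hunk above describes automatically deciding which features are categorical. A hedged PySpark sketch; the dataset path and maxCategories value are assumptions.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorIndexer

spark = SparkSession.builder.appName("vector-indexer-sketch").getOrCreate()

# assumed LibSVM-format sample file shipped with Spark
data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

# features with at most 10 distinct values are treated as categorical and re-indexed
indexer = VectorIndexer(inputCol="features", outputCol="indexed", maxCategories=10)
indexerModel = indexer.fit(data)

print("Chose %d categorical features" % len(indexerModel.categoryMaps))
indexerModel.transform(data).show()
```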
@@ -734,6 +752,8 @@ for more details on the API. `Normalizer` is a `Transformer` which transforms a dataset of `Vector` rows, normalizing each `Vector` to have unit norm. It takes parameter `p`, which specifies the [p-norm](http://en.wikipedia.org/wiki/Norm_%28mathematics%29#p-norm) used for normalization. ($p = 2$ by default.) This normalization can help standardize your input data and improve the behavior of learning algorithms. +**Examples** + The following example demonstrates how to load a dataset in libsvm format and then normalize each row to have unit $L^1$ norm and unit $L^\infty$ norm.
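[Editorial aside, not part of the patch: the Normalizer hunk above explains the p parameter for per-row p-norm normalization. A small PySpark sketch with made-up vectors, showing both the default p value and an override at transform time.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import Normalizer
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("normalizer-sketch").getOrCreate()

dataFrame = spark.createDataFrame([
    (0, Vectors.dense([1.0, 0.5, -1.0])),
    (1, Vectors.dense([2.0, 1.0, 1.0])),
    (2, Vectors.dense([4.0, 10.0, 2.0]))
], ["id", "features"])

# normalize each row to unit L^1 norm
normalizer = Normalizer(inputCol="features", outputCol="normFeatures", p=1.0)
normalizer.transform(dataFrame).show()

# the p parameter can be overridden per call, here to the L^inf norm
normalizer.transform(dataFrame, {normalizer.p: float("inf")}).show()
```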
@@ -774,6 +794,8 @@ for more details on the API. Note that if the standard deviation of a feature is zero, it will return default `0.0` value in the `Vector` for that feature. +**Examples** + The following example demonstrates how to load a dataset in libsvm format and then normalize each feature to have unit standard deviation.
@@ -819,6 +841,8 @@ For the case `$E_{max} == E_{min}$`, `$Rescaled(e_i) = 0.5 * (max + min)$` Note that since zero values will probably be transformed to non-zero values, output of the transformer will be `DenseVector` even for sparse input. +**Examples** + The following example demonstrates how to load a dataset in libsvm format and then rescale each feature to [0, 1].
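[Editorial aside, not part of the patch: the MinMaxScaler hunk above gives the rescaling rule and the E_max == E_min special case. A PySpark sketch rescaling made-up features to the default [0, 1] range.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import MinMaxScaler
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("minmax-sketch").getOrCreate()

dataFrame = spark.createDataFrame([
    (0, Vectors.dense([1.0, 0.1, -1.0])),
    (1, Vectors.dense([2.0, 1.1, 1.0])),
    (2, Vectors.dense([3.0, 10.1, 3.0]))
], ["id", "features"])

# rescale each feature to [0, 1], the default min/max
scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures")
scalerModel = scaler.fit(dataFrame)   # computes per-feature min and max
scalerModel.transform(dataFrame).show()
```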
@@ -860,6 +884,8 @@ data, and thus does not destroy any sparsity. `MaxAbsScaler` computes summary statistics on a data set and produces a `MaxAbsScalerModel`. The model can then transform each feature individually to range [-1, 1]. +**Examples** + The following example demonstrates how to load a dataset in libsvm format and then rescale each feature to [-1, 1].
@@ -903,6 +929,8 @@ Note also that the splits that you provided have to be in strictly increasing or More details can be found in the API docs for [Bucketizer](api/scala/index.html#org.apache.spark.ml.feature.Bucketizer). +**Examples** + The following example demonstrates how to bucketize a column of `Double`s into another index-wised column.
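[Editorial aside, not part of the patch: the Bucketizer hunk above requires strictly increasing splits. A PySpark sketch bucketizing a made-up column of doubles.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import Bucketizer

spark = SparkSession.builder.appName("bucketizer-sketch").getOrCreate()

# strictly increasing split points; -inf/+inf cover values outside the known range
splits = [-float("inf"), -0.5, 0.0, 0.5, float("inf")]

data = [(-999.9,), (-0.5,), (-0.3,), (0.0,), (0.2,), (999.9,)]
dataFrame = spark.createDataFrame(data, ["features"])

# each value is mapped to the index of the bucket it falls into
bucketizer = Bucketizer(splits=splits, inputCol="features", outputCol="bucketedFeatures")
bucketizer.transform(dataFrame).show()
```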
@@ -951,6 +979,8 @@ v_N \end{pmatrix} \]` +**Examples** + This example below demonstrates how to transform vectors using a transforming vector value.
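[Editorial aside, not part of the patch: the hunk above belongs to the ElementwiseProduct section, whose equation describes a Hadamard product with a fixed scaling vector. A hedged PySpark sketch with invented vectors.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import ElementwiseProduct
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("ewp-sketch").getOrCreate()

data = [(Vectors.dense([1.0, 2.0, 3.0]),), (Vectors.dense([4.0, 5.0, 6.0]),)]
df = spark.createDataFrame(data, ["vector"])

# Hadamard (element-wise) product of each input vector with the scaling vector
transformer = ElementwiseProduct(scalingVec=Vectors.dense([0.0, 1.0, 2.0]),
                                 inputCol="vector", outputCol="transformedVector")
transformer.transform(df).show()
```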
diff --git a/docs/ml-tuning.md b/docs/ml-tuning.md index 2ca90c7092fd..3f05a73f0a2e 100644 --- a/docs/ml-tuning.md +++ b/docs/ml-tuning.md @@ -62,7 +62,7 @@ To help construct the parameter grid, users can use the [`ParamGridBuilder`](api After identifying the best `ParamMap`, `CrossValidator` finally re-fits the `Estimator` using the best `ParamMap` and the entire dataset. -## Example: model selection via cross-validation +## Examples: model selection via cross-validation The following example demonstrates using `CrossValidator` to select from a grid of parameters. @@ -102,7 +102,7 @@ It splits the dataset into these two parts using the `trainRatio` parameter. For Like `CrossValidator`, `TrainValidationSplit` finally fits the `Estimator` using the best `ParamMap` and the entire dataset. -## Example: model selection via train validation split +## Examples: model selection via train validation split
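[Editorial aside, not part of the patch: the two tuning hunks above cover CrossValidator and TrainValidationSplit. A PySpark sketch of the CrossValidator flow with a tiny invented dataset; TrainValidationSplit is used the same way, with a trainRatio parameter in place of numFolds.]

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("cv-sketch").getOrCreate()

training = spark.createDataFrame([
    (0, "a b c d e spark", 1.0),
    (1, "b d", 0.0),
    (2, "spark f g h", 1.0),
    (3, "hadoop mapreduce", 0.0)
], ["id", "text", "label"])

tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

# 2 values of numFeatures x 2 values of regParam = 4 candidate ParamMaps
paramGrid = ParamGridBuilder() \
    .addGrid(hashingTF.numFeatures, [10, 100]) \
    .addGrid(lr.regParam, [0.1, 0.01]) \
    .build()

cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=paramGrid,
                    evaluator=BinaryClassificationEvaluator(),
                    numFolds=2)
cvModel = cv.fit(training)   # picks the best ParamMap, then refits on the full data
```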
From bf59358b71efdcbe4aff237c0151a7613f6cfede Mon Sep 17 00:00:00 2001 From: Zheng RuiFeng Date: Mon, 7 Nov 2016 10:00:28 +0800 Subject: [PATCH 3/3] update --- docs/ml-collaborative-filtering.md | 2 +- docs/ml-tuning.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/ml-collaborative-filtering.md b/docs/ml-collaborative-filtering.md index 1d02d6933cb4..4d19b4069a1f 100644 --- a/docs/ml-collaborative-filtering.md +++ b/docs/ml-collaborative-filtering.md @@ -59,7 +59,7 @@ This approach is named "ALS-WR" and discussed in the paper It makes `regParam` less dependent on the scale of the dataset, so we can apply the best parameter learned from a sampled subset to the full dataset and expect similar performance. -## Examples +**Examples**
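[Editorial aside, not part of the patch: the ALS hunk above discusses regParam scaling in ALS-WR. A hedged PySpark sketch of the collaborative filtering example that section heads; the ratings file path and its userId::movieId::rating::timestamp layout are assumptions about the sample data shipped with Spark.]

```python
from pyspark.sql import Row, SparkSession
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("als-sketch").getOrCreate()

# assumed sample file: lines of the form userId::movieId::rating::timestamp
lines = spark.read.text("data/mllib/als/sample_movielens_ratings.txt").rdd
ratings = spark.createDataFrame(
    lines.map(lambda r: r.value.split("::"))
         .map(lambda p: Row(userId=int(p[0]), movieId=int(p[1]),
                            rating=float(p[2]), timestamp=int(p[3]))))
training, test = ratings.randomSplit([0.8, 0.2], seed=0)

als = ALS(maxIter=5, regParam=0.01,
          userCol="userId", itemCol="movieId", ratingCol="rating")
model = als.fit(training)

# users or items unseen during training can yield NaN predictions in this Spark
# version, which would make the RMSE NaN; dropping them keeps the metric meaningful
predictions = model.transform(test).na.drop()
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")
print("Root-mean-square error = " + str(evaluator.evaluate(predictions)))
```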
diff --git a/docs/ml-tuning.md b/docs/ml-tuning.md index 3f05a73f0a2e..e4b070331db4 100644 --- a/docs/ml-tuning.md +++ b/docs/ml-tuning.md @@ -62,7 +62,7 @@ To help construct the parameter grid, users can use the [`ParamGridBuilder`](api After identifying the best `ParamMap`, `CrossValidator` finally re-fits the `Estimator` using the best `ParamMap` and the entire dataset. -## Examples: model selection via cross-validation +**Examples: model selection via cross-validation** The following example demonstrates using `CrossValidator` to select from a grid of parameters. @@ -102,7 +102,7 @@ It splits the dataset into these two parts using the `trainRatio` parameter. For Like `CrossValidator`, `TrainValidationSplit` finally fits the `Estimator` using the best `ParamMap` and the entire dataset. -## Examples: model selection via train validation split +**Examples: model selection via train validation split**