From c39aa6d54c7cd601e3dd2b499204d0c6f756606e Mon Sep 17 00:00:00 2001
From: sethah
Date: Mon, 5 Dec 2016 13:46:41 +0800
Subject: [PATCH 1/4] update user guide

---
 docs/ml-advanced.md | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/docs/ml-advanced.md b/docs/ml-advanced.md
index 12a03d3c9198..b8592ed14de5 100644
--- a/docs/ml-advanced.md
+++ b/docs/ml-advanced.md
@@ -59,17 +59,22 @@ Given $n$ weighted observations $(w_i, a_i, b_i)$:
 
 The number of features for each observation is $m$. We use the following weighted least squares formulation:
 `\[
-minimize_{x}\frac{1}{2} \sum_{i=1}^n \frac{w_i(a_i^T x -b_i)^2}{\sum_{k=1}^n w_k} + \frac{1}{2}\frac{\lambda}{\delta}\sum_{j=1}^m(\sigma_{j} x_{j})^2
+\min_{\mathbf{x}}\frac{1}{2} \sum_{i=1}^n \frac{w_i(\mathbf{a}_i^T \mathbf{x} -b_i)^2}{\sum_{k=1}^n w_k} + \frac{1}{2}\frac{\lambda}{\delta}\sum_{j=1}^m(\sigma_{j} x_{j})^2
 \]`
 where $\lambda$ is the regularization parameter, $\delta$ is the population standard deviation of the label
 and $\sigma_j$ is the population standard deviation of the j-th feature column.
 
-This objective function has an analytic solution and it requires only one pass over the data to collect necessary statistics to solve.
-Unlike the original dataset which can only be stored in a distributed system,
-these statistics can be loaded into memory on a single machine if the number of features is relatively small, and then we can solve the objective function through Cholesky factorization on the driver.
+This objective function has an analytic solution and it requires only one pass over the data to collect necessary statistics to solve. For an
+$n \times m$ data matrix, these statistics require only $O(m^2)$ storage and so can be stored on a single machine when $n$ (the number of features) is
+relatively small. We can then solve the normal equations on a single machine using local methods like direct Cholesky factorization or iterative optimization programs.
 
-WeightedLeastSquares only supports L2 regularization and provides options to enable or disable regularization and standardization.
-In order to make the normal equation approach efficient, WeightedLeastSquares requires that the number of features be no more than 4096. For larger problems, use L-BFGS instead.
+Spark ML currently supports two types of solvers for the normal equations: Cholesky factorization and Quasi-Newton methods (L-BFGS/OWL-QN). Cholesky factorization
+depends on a positive definite covariance matrix (e.g. columns of the data matrix must be linearly independent) and will fail if this condition is violated. Quasi-Newton methods
+are still capable of providing a reasonable solution even when the covariance matrix is not positive definite, so the normal equation solver can also fall back to
+Quasi-Newton methods in this case. This fallback is currently always enabled for the `LinearRegression` estimator.
+
+`WeightedLeastSquares` supports L1, L2, and elastic-net regularization and provides options to enable or disable regularization and standardization.
+In order to make the normal equation approach efficient, `WeightedLeastSquares` requires that the number of features be no more than 4096. For larger problems, use L-BFGS instead.
 
 ## Iteratively reweighted least squares (IRLS)
 
@@ -83,6 +88,6 @@ It solves certain optimization problems iteratively through the following proced
 * solve a weighted least squares (WLS) problem by WeightedLeastSquares.
 * repeat above steps until convergence.
 
-Since it involves solving a weighted least squares (WLS) problem by WeightedLeastSquares in each iteration,
+Since it involves solving a weighted least squares (WLS) problem by `WeightedLeastSquares` in each iteration,
 it also requires the number of features to be no more than 4096.
 Currently IRLS is used as the default solver of [GeneralizedLinearRegression](api/scala/index.html#org.apache.spark.ml.regression.GeneralizedLinearRegression).
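A minimal usage sketch of the normal-equation path this patch documents, assuming Spark 2.1-era APIs. The dataset, column names, and variable names below are illustrative and are not taken from the patch; `solver = "normal"` requests the `WeightedLeastSquares` implementation behind `LinearRegression`.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("wls-sketch").getOrCreate()
import spark.implicits._

// A toy weighted dataset: (label b_i, weight w_i, features a_i).
val training = Seq(
  (17.0, 1.0, Vectors.dense(1.0, 7.0)),
  (19.0, 2.0, Vectors.dense(2.0, 5.0)),
  (23.0, 1.0, Vectors.dense(3.0, 8.0)),
  (29.0, 3.0, Vectors.dense(4.0, 6.0))
).toDF("label", "weight", "features")

// "normal" selects the WeightedLeastSquares path: one pass over the data to
// aggregate O(m^2) statistics, then a local solve on the driver (m <= 4096).
val lr = new LinearRegression()
  .setSolver("normal")
  .setWeightCol("weight")     // per-observation weights w_i
  .setRegParam(0.3)           // lambda in the formulation above
  .setStandardization(true)

val model = lr.fit(training)
println(s"coefficients: ${model.coefficients}, intercept: ${model.intercept}")
```

For feature counts above the 4096 limit, the text above recommends the L-BFGS solver (`setSolver("l-bfgs")`) instead.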
From 1049a6d1e8e2c3136b54598fef48c3106df4c4c4 Mon Sep 17 00:00:00 2001
From: sethah
Date: Mon, 5 Dec 2016 18:08:37 +0800
Subject: [PATCH 2/4] typo

---
 docs/ml-advanced.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ml-advanced.md b/docs/ml-advanced.md
index b8592ed14de5..e761479571e9 100644
--- a/docs/ml-advanced.md
+++ b/docs/ml-advanced.md
@@ -65,7 +65,7 @@ where $\lambda$ is the regularization parameter, $\delta$ is the population stan
 and $\sigma_j$ is the population standard deviation of the j-th feature column.
 
 This objective function has an analytic solution and it requires only one pass over the data to collect necessary statistics to solve. For an
-$n \times m$ data matrix, these statistics require only $O(m^2)$ storage and so can be stored on a single machine when $n$ (the number of features) is
+$n \times m$ data matrix, these statistics require only $O(m^2)$ storage and so can be stored on a single machine when $m$ (the number of features) is
 relatively small. We can then solve the normal equations on a single machine using local methods like direct Cholesky factorization or iterative optimization programs.
 
 Spark ML currently supports two types of solvers for the normal equations: Cholesky factorization and Quasi-Newton methods (L-BFGS/OWL-QN). Cholesky factorization

From 2ab9675c51b4de021af9045b04e2258c1898b0f3 Mon Sep 17 00:00:00 2001
From: sethah
Date: Wed, 7 Dec 2016 10:47:44 +0800
Subject: [PATCH 3/4] address review

---
 docs/ml-advanced.md | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/docs/ml-advanced.md b/docs/ml-advanced.md
index e761479571e9..f515a66d6121 100644
--- a/docs/ml-advanced.md
+++ b/docs/ml-advanced.md
@@ -59,21 +59,24 @@ Given $n$ weighted observations $(w_i, a_i, b_i)$:
 
 The number of features for each observation is $m$. We use the following weighted least squares formulation:
 `\[
-\min_{\mathbf{x}}\frac{1}{2} \sum_{i=1}^n \frac{w_i(\mathbf{a}_i^T \mathbf{x} -b_i)^2}{\sum_{k=1}^n w_k} + \frac{1}{2}\frac{\lambda}{\delta}\sum_{j=1}^m(\sigma_{j} x_{j})^2
+\min_{\mathbf{x}}\frac{1}{2} \sum_{i=1}^n \frac{w_i(\mathbf{a}_i^T \mathbf{x} -b_i)^2}{\sum_{k=1}^n w_k} + \frac{\lambda}{\delta}\left[\frac{1}{2}(1 - \alpha)\sum_{j=1}^m(\sigma_j x_j)^2 + \alpha\sum_{j=1}^m |\sigma_j x_j|\right]
 \]`
-where $\lambda$ is the regularization parameter, $\delta$ is the population standard deviation of the label
+where $\lambda$ is the regularization parameter, $\alpha$ is the elastic-net mixing parameter, $\delta$ is the population standard deviation of the label
 and $\sigma_j$ is the population standard deviation of the j-th feature column.
 
-This objective function has an analytic solution and it requires only one pass over the data to collect necessary statistics to solve. For an
+This objective function requires only one pass over the data to collect the statistics necessary to solve it. For an
 $n \times m$ data matrix, these statistics require only $O(m^2)$ storage and so can be stored on a single machine when $m$ (the number of features) is
 relatively small. We can then solve the normal equations on a single machine using local methods like direct Cholesky factorization or iterative optimization programs.
 
-Spark ML currently supports two types of solvers for the normal equations: Cholesky factorization and Quasi-Newton methods (L-BFGS/OWL-QN). Cholesky factorization
+Spark MLlib currently supports two types of solvers for the normal equations: Cholesky factorization and Quasi-Newton methods (L-BFGS/OWL-QN). Cholesky factorization
 depends on a positive definite covariance matrix (e.g. columns of the data matrix must be linearly independent) and will fail if this condition is violated. Quasi-Newton methods
 are still capable of providing a reasonable solution even when the covariance matrix is not positive definite, so the normal equation solver can also fall back to
-Quasi-Newton methods in this case. This fallback is currently always enabled for the `LinearRegression` estimator.
+Quasi-Newton methods in this case. This fallback is currently always enabled for the `LinearRegression` and `GeneralizedLinearRegression` estimators.
+
+`WeightedLeastSquares` supports L1, L2, and elastic-net regularization and provides options to enable or disable regularization and standardization. In the case where no
+L1 regularization is applied (i.e. $\alpha = 0$), there exists an analytical solution and either Cholesky or Quasi-Newton solver may be used. When $\alpha > 0$ no analytical
+solution exists and we instead use the Quasi-Newton solver to find the coefficients iteratively.
 
-`WeightedLeastSquares` supports L1, L2, and elastic-net regularization and provides options to enable or disable regularization and standardization.
 In order to make the normal equation approach efficient, `WeightedLeastSquares` requires that the number of features be no more than 4096. For larger problems, use L-BFGS instead.
 
 ## Iteratively reweighted least squares (IRLS)
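To illustrate the distinction patch 3 draws between the $\alpha = 0$ and $\alpha > 0$ cases, a hedged sketch that reuses the `training` DataFrame from the earlier example (names are illustrative, not part of the patch):

```scala
import org.apache.spark.ml.regression.LinearRegression

// elasticNetParam is the alpha in the updated formulation. With alpha > 0
// there is no closed-form solution, so the normal-equation solver finds the
// coefficients iteratively with a Quasi-Newton method (OWL-QN); with
// alpha = 0 the aggregated system can be solved directly via Cholesky.
val enet = new LinearRegression()
  .setSolver("normal")
  .setRegParam(0.1)          // lambda
  .setElasticNetParam(0.5)   // alpha

val enetModel = enet.fit(training)  // `training` as in the earlier sketch
```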
From 49311332c81d398829925de429dc48a62a3bac0b Mon Sep 17 00:00:00 2001
From: sethah
Date: Thu, 8 Dec 2016 09:42:43 +0800
Subject: [PATCH 4/4] eg to ie

---
 docs/ml-advanced.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ml-advanced.md b/docs/ml-advanced.md
index f515a66d6121..2747f2df7cb1 100644
--- a/docs/ml-advanced.md
+++ b/docs/ml-advanced.md
@@ -69,7 +69,7 @@ $n \times m$ data matrix, these statistics require only $O(m^2)$ storage and so
 relatively small. We can then solve the normal equations on a single machine using local methods like direct Cholesky factorization or iterative optimization programs.
 
 Spark MLlib currently supports two types of solvers for the normal equations: Cholesky factorization and Quasi-Newton methods (L-BFGS/OWL-QN). Cholesky factorization
-depends on a positive definite covariance matrix (e.g. columns of the data matrix must be linearly independent) and will fail if this condition is violated. Quasi-Newton methods
+depends on a positive definite covariance matrix (i.e. columns of the data matrix must be linearly independent) and will fail if this condition is violated. Quasi-Newton methods
 are still capable of providing a reasonable solution even when the covariance matrix is not positive definite, so the normal equation solver can also fall back to
 Quasi-Newton methods in this case. This fallback is currently always enabled for the `LinearRegression` and `GeneralizedLinearRegression` estimators.
 
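Finally, since the IRLS section touched by patch 1 points to `GeneralizedLinearRegression` as the consumer of `WeightedLeastSquares`, a sketch of that path under the same assumptions (the illustrative `training` DataFrame with `label` and `features` columns from the first example):

```scala
import org.apache.spark.ml.regression.GeneralizedLinearRegression

// Each IRLS iteration re-weights the observations and solves a
// WeightedLeastSquares subproblem, so GLR inherits the <= 4096 feature limit.
val glr = new GeneralizedLinearRegression()
  .setFamily("gaussian")
  .setLink("identity")
  .setSolver("irls")   // currently the only solver offered by GeneralizedLinearRegression
  .setMaxIter(25)

val glrModel = glr.fit(training)
println(glrModel.coefficients)
```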