[SPARK-7222][ML] Added mathematical derivation in comment and compressed the model, removed the correction terms in LinearRegression with ElasticNet #5767

dbtsai · 2015-04-29T05:48:49Z

Added a detailed mathematical derivation of how scaling and LeastSquaresAggregator work. Refactored the code so that the model is compressed based on storage size. We may later try compression based on prediction time.
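As a rough illustration of storage-based compression (a sketch only, not the code in this patch): with ElasticNet's L1 penalty many weights can end up exactly zero, so a sparse layout may take less space than a dense one. The `Weights`, `Dense`, `Sparse`, and `compress` names below are hypothetical, and the byte counts are an assumed cost model (8 bytes per Double, 4 bytes per Int index, object headers ignored).

```scala
object CompressSketch {
  // Hypothetical weight containers; these are not Spark's Vector classes.
  sealed trait Weights
  final case class Dense(values: Array[Double]) extends Weights
  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double]) extends Weights

  // Assumed cost model: a dense entry costs one Double (8 bytes),
  // a sparse entry costs one Double plus one Int index (12 bytes).
  def denseBytes(size: Int): Long = 8L * size
  def sparseBytes(nnz: Int): Long = 12L * nnz

  // Keep whichever representation has the smaller estimated footprint.
  def compress(values: Array[Double]): Weights = {
    val nonZero = values.zipWithIndex.filter { case (v, _) => v != 0.0 }
    if (sparseBytes(nonZero.length) < denseBytes(values.length)) {
      Sparse(values.length, nonZero.map(_._2), nonZero.map(_._1))
    } else {
      Dense(values.clone())
    }
  }
}
```

Under this cost model, `CompressSketch.compress(Array(0.0, 0.0, 0.0, 2.5))` picks the sparse layout, while a vector with mostly non-zero entries stays dense.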

Also, I found that diffSum is always zero mathematically, so no correction terms are required.
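A small numerical check of the diffSum claim, as a sketch under one assumption: that diff for a sample is the residual in the centered, standardized space, i.e. the sum over features of w_i (x_i - xMean_i) / xStd_i, minus (y - yMean) / yStd. Summed over all samples, every centered term cancels, so the total should vanish for any weights. The object and method names are hypothetical and use plain Scala arrays rather than Spark.

```scala
object DiffSumCheck {
  def mean(xs: Array[Double]): Double = xs.sum / xs.length

  // Sample standard deviation; the cancellation holds for any positive scale factor.
  def std(xs: Array[Double]): Double = {
    val m = mean(xs)
    math.sqrt(xs.map(x => (x - m) * (x - m)).sum / (xs.length - 1))
  }

  // Sum over all samples of the standardized residual (assumed definition of diff).
  def diffSum(features: Array[Array[Double]], labels: Array[Double], w: Array[Double]): Double = {
    val numFeatures = w.length
    val columns = Array.tabulate(numFeatures)(j => features.map(_(j)))
    val xMean = columns.map(mean)
    val xStd = columns.map(std)
    val yMean = mean(labels)
    val yStd = std(labels)
    features.zip(labels).map { case (x, y) =>
      val scaledPrediction =
        (0 until numFeatures).map(j => w(j) * (x(j) - xMean(j)) / xStd(j)).sum
      scaledPrediction - (y - yMean) / yStd
    }.sum
  }

  def main(args: Array[String]): Unit = {
    val features = Array(Array(1.0, 10.0), Array(2.0, -3.0), Array(4.0, 7.5), Array(8.0, 0.5))
    val labels = Array(3.0, -1.0, 2.0, 6.0)
    // Arbitrary weights: the sum is zero up to floating-point rounding, not only at the optimum.
    println(diffSum(features, labels, Array(0.7, -1.3)))
  }
}
```

The printed value is zero only up to rounding error, because the column sums of the centered features and labels cancel exactly in real arithmetic but not necessarily in floating point.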

SparkQA · 2015-04-29T07:40:51Z

Test build #31249 has finished for PR 5767 at commit f135c2b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

SparkQA · 2015-04-29T07:58:50Z

Test build #31253 has finished for PR 5767 at commit 63f7d1e.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
- case class RepartitionByExpression(partitionExpressions: Seq[Expression], child: LogicalPlan)
- case class Repartition(numPartitions: Int, shuffle: Boolean, child: SparkPlan)
This patch does not change any dependencies.

viirya · 2015-04-29T10:52:34Z

mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala

...the intercept is yMean...

Signed-off-by: DB Tsai <[email protected]>

SparkQA · 2015-04-29T18:34:35Z

Test build #31292 has finished for PR 5767 at commit 5929e49.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

mengxr · 2015-04-29T19:57:47Z

mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala

Is it really rare? If yStd is 0.0, then the optimal model would be empty with intercept yMean. In this case, a warning would be proper. Having this giant if ... else block makes the code hard to read.

    if (yStd == 0.0) {
      logWarning(...)
      if (handlePersistence) ...
      return new LinearRegressionModel(...)
    }
    // actual implementation
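The early-return shape suggested above can be sketched in a self-contained form. This is an illustration only: `Model` and `fit` are hypothetical stand-ins (not Spark's LinearRegressionModel or its train method), the warning goes to stderr instead of logWarning, and the "actual implementation" is replaced by a single-feature least-squares fit so the example runs on its own.

```scala
object EarlyReturnSketch {
  // Hypothetical stand-in for the fitted model; not Spark's LinearRegressionModel.
  final case class Model(weights: Array[Double], intercept: Double)

  def fit(x: Array[Double], y: Array[Double]): Model = {
    require(x.length == y.length && y.nonEmpty, "need matching, non-empty inputs")
    val yMean = y.sum / y.length
    val yStd = math.sqrt(y.map(v => (v - yMean) * (v - yMean)).sum / y.length)

    // Guard clause: constant labels need no optimization, so return early
    // with empty weights and the label mean as the intercept.
    if (yStd == 0.0) {
      Console.err.println("warning: label standard deviation is zero; returning intercept-only model")
      return Model(Array.empty[Double], yMean)
    }

    // Placeholder for the actual implementation: closed-form simple regression.
    val xMean = x.sum / x.length
    val sxx = x.map(v => (v - xMean) * (v - xMean)).sum
    val sxy = x.zip(y).map { case (a, b) => (a - xMean) * (b - yMean) }.sum
    val slope = if (sxx == 0.0) 0.0 else sxy / sxx
    Model(Array(slope), yMean - slope * xMean)
  }
}
```

Handling the degenerate case first keeps the main path at the top indentation level, which is the readability point of the review comment.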

SparkQA · 2015-04-29T21:50:40Z

Test build #31316 has finished for PR 5767 at commit 69757b8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

mengxr · 2015-04-29T21:53:06Z

mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala

minor: x.foreach { v =>

mengxr · 2015-04-29T21:54:52Z

LGTM. Merged into master. Thanks! @dbtsai Please address the comment in a separate PR.

SparkQA · 2015-04-29T22:14:48Z

Test build #31321 has finished for PR 5767 at commit fc9f582.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

SparkQA · 2015-04-29T22:46:54Z

Test build #31329 has finished for PR 5767 at commit 5e346c9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

…ssed the model, removed the correction terms in LinearRegression with ElasticNet

Added detailed mathematical derivation of how scaling and LeastSquaresAggregator work. Refactored the code so the model is compressed based on the storage. We may try compression based on the prediction time.

Also, I found that diffSum will be always zero mathematically, so no corrections are required.

Author: DB Tsai <[email protected]>

Closes apache#5767 from dbtsai/lir-doc and squashes the following commits:

5e346c9 [DB Tsai] refactoring
fc9f582 [DB Tsai] doc
58456d8 [DB Tsai] address feedback
69757b8 [DB Tsai] actually diffSum is mathematically zero! No correction is needed.
5929e49 [DB Tsai] typo
63f7d1e [DB Tsai] Added compression to the model based on storage
203a295 [DB Tsai] Add more documentation to LinearRegression in new ML framework.

dbtsai force-pushed the lir-doc branch from e24c7fb to f135c2b Compare April 29, 2015 05:52

DB Tsai added 2 commits April 28, 2015 23:10

Add more documentation to LinearRegression in new ML framework.

203a295

Added compression to the model based on storage

63f7d1e

dbtsai force-pushed the lir-doc branch from f135c2b to 63f7d1e Compare April 29, 2015 06:15

dbtsai changed the title ~~[SPARK-7222][ML] Added mathematical derivation in comment to LinearRegression with ElasticNet.~~ [SPARK-7222][ML] Added mathematical derivation in comment and compressed the model to LinearRegression with ElasticNet Apr 29, 2015

dbtsai changed the title ~~[SPARK-7222][ML] Added mathematical derivation in comment and compressed the model to LinearRegression with ElasticNet~~ [SPARK-7222][ML] Added mathematical derivation in comment and compressed the model in LinearRegression with ElasticNet Apr 29, 2015

viirya reviewed Apr 29, 2015

typo

5929e49

Signed-off-by: DB Tsai <[email protected]>

mengxr reviewed Apr 29, 2015

DB Tsai added 3 commits April 29, 2015 13:10

actually diffSum is mathematically zero! No correction is needed.

69757b8

address feedback

58456d8

doc

fc9f582

refactoring

5e346c9

mengxr reviewed Apr 29, 2015

asfgit closed this in 15995c8 Apr 29, 2015

mengxr mentioned this pull request Apr 29, 2015

[SPARK-7176] [ml] Add validation functionality to Param #5740

Closed

dbtsai deleted the lir-doc branch April 30, 2015 16:51


4 participants