@dbtsai
Member

@dbtsai dbtsai commented Apr 29, 2015

Added a detailed mathematical derivation of how the scaling and LeastSquaresAggregator work. Refactored the code so that the model is compressed based on storage size, i.e. stored as dense or sparse, whichever takes less space. We may also try compression based on prediction time.
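For reference, the scaled least-squares loss that such a derivation covers can be summarized as follows. This is a sketch in assumed notation: \bar{x}_j and \hat{x}_j denote the mean and standard deviation of feature j, \bar{y} and \hat{y} those of the label, and the ElasticNet penalty is assumed to be handled outside the aggregator, so it is omitted.

```latex
\begin{aligned}
L(w) &= \frac{1}{2n} \sum_{i=1}^{n}
        \Big( \sum_{j} w_j \frac{x_{ij} - \bar{x}_j}{\hat{x}_j}
              - \frac{y_i - \bar{y}}{\hat{y}} \Big)^2
      = \frac{1}{2n} \sum_{i=1}^{n} \mathrm{diff}_i^2, \\
\mathrm{diff}_i &= \sum_{j} w'_j x_{ij} - \frac{y_i}{\hat{y}} + \mathrm{offset},
  \qquad w'_j = \frac{w_j}{\hat{x}_j},
  \qquad \mathrm{offset} = \frac{\bar{y}}{\hat{y}} - \sum_{j} w'_j \bar{x}_j, \\
\frac{\partial L}{\partial w_j}
  &= \frac{1}{n} \sum_{i=1}^{n} \mathrm{diff}_i \frac{x_{ij} - \bar{x}_j}{\hat{x}_j}
   = \frac{1}{n} \Big( \sum_{i=1}^{n} \mathrm{diff}_i \frac{x_{ij}}{\hat{x}_j}
     - \frac{\bar{x}_j}{\hat{x}_j} \sum_{i=1}^{n} \mathrm{diff}_i \Big).
\end{aligned}
```

Only the raw x_{ij} appears in diff_i and in the first gradient sum, so sparse feature vectors never need to be densified by centering; the second sum is a correction term proportional to \sum_i diff_i.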

Also, I found that diffSum is always zero mathematically, so no correction terms are required.
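Why diffSum vanishes: with diff_i = sum_j (w_j / xStd_j) * (x_ij - xMean_j) - (y_i - yMean) / yStd, summing over i gives sum_j (w_j / xStd_j) * sum_i (x_ij - xMean_j) - (1 / yStd) * sum_i (y_i - yMean), and every sum of deviations from a mean is zero, for any w. The Scala sketch below is a hypothetical standalone check (the object DiffSumCheck, the sample data, and the use of population standard deviations are assumptions, not Spark's LeastSquaresAggregator): it computes diff_i from raw, uncentered features plus an offset, then compares the gradient with and without the xMean_j * diffSum correction.

```scala
// Hypothetical standalone numerical check; not Spark's LeastSquaresAggregator.
object DiffSumCheck {

  // Feature/label means and population standard deviations (any non-zero scale works).
  final case class Stats(xMean: Array[Double], xStd: Array[Double], yMean: Double, yStd: Double)

  def stats(x: Array[Array[Double]], y: Array[Double]): Stats = {
    val n = x.length.toDouble
    val xMean = Array.tabulate(x.head.length)(j => x.map(_(j)).sum / n)
    val xStd = Array.tabulate(x.head.length) { j =>
      math.sqrt(x.map(r => (r(j) - xMean(j)) * (r(j) - xMean(j))).sum / n)
    }
    val yMean = y.sum / n
    val yStd = math.sqrt(y.map(v => (v - yMean) * (v - yMean)).sum / n)
    Stats(xMean, xStd, yMean, yStd)
  }

  // diff_i = sum_j (w_j / xStd_j) * x_ij - y_i / yStd + offset, using raw (uncentered) x_ij.
  def diffs(x: Array[Array[Double]], y: Array[Double], w: Array[Double], s: Stats): Array[Double] = {
    val effW = Array.tabulate(w.length)(j => w(j) / s.xStd(j))
    val offset = s.yMean / s.yStd - effW.indices.map(j => effW(j) * s.xMean(j)).sum
    Array.tabulate(x.length) { i =>
      effW.indices.map(j => effW(j) * x(i)(j)).sum - y(i) / s.yStd + offset
    }
  }

  // Gradient of the scaled loss; centered = false drops the (xMean_j / xStd_j) * diffSum / n term.
  def gradient(x: Array[Array[Double]], diff: Array[Double], s: Stats, centered: Boolean): Array[Double] =
    Array.tabulate(s.xMean.length) { j =>
      val shift = if (centered) s.xMean(j) else 0.0
      x.indices.map(i => diff(i) * (x(i)(j) - shift) / s.xStd(j)).sum / x.length
    }

  val sampleX = Array(Array(1.0, 2.0), Array(3.0, 0.0), Array(5.0, 7.0), Array(2.0, 1.0))
  val sampleY = Array(3.0, 4.0, 13.0, 2.0)
  val sampleW = Array(0.7, -1.3) // arbitrary coefficients: the identity holds for every w

  // On the sample data: (diffSum, largest gap between centered and uncentered gradients).
  def check(w: Array[Double]): (Double, Double) = {
    val s = stats(sampleX, sampleY)
    val d = diffs(sampleX, sampleY, w, s)
    val gap = gradient(sampleX, d, s, centered = true)
      .zip(gradient(sampleX, d, s, centered = false))
      .map { case (a, b) => math.abs(a - b) }
      .max
    (d.sum, gap)
  }

  def main(args: Array[String]): Unit = {
    val (diffSum, gap) = check(sampleW)
    println(s"diffSum = $diffSum, gradient gap = $gap") // both zero up to rounding
  }
}
```

Both printed values should be zero up to floating-point rounding; the exact tiny residues depend on evaluation order, so they are not listed here.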

@dbtsai dbtsai changed the title [SPARK-7222][ML] Added mathematical derivation in comment to LinearRegression with ElasticNet. [SPARK-7222][ML] Added mathematical derivation in comment and compressed the model to LinearRegression with ElasticNet Apr 29, 2015
@dbtsai dbtsai changed the title [SPARK-7222][ML] Added mathematical derivation in comment and compressed the model to LinearRegression with ElasticNet [SPARK-7222][ML] Added mathematical derivation in comment and compressed the model in LinearRegression with ElasticNet Apr 29, 2015
@SparkQA

SparkQA commented Apr 29, 2015

Test build #31249 has finished for PR 5767 at commit f135c2b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 29, 2015

Test build #31253 has finished for PR 5767 at commit 63f7d1e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
    • case class RepartitionByExpression(partitionExpressions: Seq[Expression], child: LogicalPlan)
    • case class Repartition(numPartitions: Int, shuffle: Boolean, child: SparkPlan)
  • This patch does not change any dependencies.

Member

...the intercept is yMean...

Member Author

thanks.

Signed-off-by: DB Tsai <[email protected]>
@SparkQA

SparkQA commented Apr 29, 2015

Test build #31292 has finished for PR 5767 at commit 5929e49.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

Contributor

Is it really rare? If yStd is 0.0, then the optimal model would be empty (all coefficients zero) with intercept yMean. In this case, a warning would be appropriate. Having this giant if ... else block makes the code hard to read.

if (yStd == 0.0) {
  logWarning(...)
  if (handlePersistence) ...
  return new LinearRegressionModel(...)
}

// actual implementation
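The early-return shape suggested above can be sketched in plain Scala. Everything here is hypothetical and simplified: SimpleModel stands in for LinearRegressionModel, the warning goes to stderr instead of logWarning, population standard deviation is assumed, and the real optimizer is passed as a by-name argument so it only runs when the label is not constant.

```scala
// Hypothetical, simplified stand-in for a fitted linear model (not Spark's LinearRegressionModel).
final case class SimpleModel(weights: Array[Double], intercept: Double)

object ConstantLabelGuard {

  /**
   * Returns the trivial model when the label is constant; otherwise evaluates `fit`.
   * `fit` is by-name, so the expensive optimizer is only run when it is actually needed.
   */
  def trainWithGuard(labels: Array[Double], numFeatures: Int)(fit: => SimpleModel): SimpleModel = {
    val n = labels.length.toDouble
    val yMean = labels.sum / n
    val yStd = math.sqrt(labels.map(v => (v - yMean) * (v - yMean)).sum / n)

    // Early return instead of wrapping the whole training path in if ... else:
    // with a constant label, zero coefficients plus intercept yMean fit exactly.
    if (yStd == 0.0) {
      Console.err.println(
        s"WARN: label standard deviation is 0; returning zero coefficients and intercept $yMean.")
      return SimpleModel(Array.fill(numFeatures)(0.0), yMean)
    }

    // Actual implementation: the regular optimization path.
    fit
  }
}
```

For example, trainWithGuard(Array(2.0, 2.0, 2.0), numFeatures = 3)(sys.error("optimizer must not run")) returns a model with three zero weights and intercept 2.0 without evaluating the optimizer. Comparing against exactly 0.0 mirrors the review comment; a small tolerance may be more robust when the label mean is not exactly representable in floating point.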

Member Author

fair!

@dbtsai dbtsai changed the title [SPARK-7222][ML] Added mathematical derivation in comment and compressed the model in LinearRegression with ElasticNet [SPARK-7222][ML] Added mathematical derivation in comment and compressed the model, removed the correction terms in LinearRegression with ElasticNet Apr 29, 2015
@SparkQA

SparkQA commented Apr 29, 2015

Test build #31316 has finished for PR 5767 at commit 69757b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

Contributor

minor: x.foreach { v =>

@asfgit asfgit closed this in 15995c8 Apr 29, 2015
@mengxr
Contributor

mengxr commented Apr 29, 2015

LGTM. Merged into master. Thanks! @dbtsai Please address the comment in a separate PR.

@SparkQA

SparkQA commented Apr 29, 2015

Test build #31321 has finished for PR 5767 at commit fc9f582.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 29, 2015

Test build #31329 has finished for PR 5767 at commit 5e346c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@dbtsai dbtsai deleted the lir-doc branch April 30, 2015 16:51
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
…ssed the model, removed the correction terms in LinearRegression with ElasticNet

Added a detailed mathematical derivation of how the scaling and LeastSquaresAggregator work. Refactored the code so that the model is compressed based on storage size. We may also try compression based on prediction time.

Also, I found that diffSum is always zero mathematically, so no correction terms are required.

Author: DB Tsai <[email protected]>

Closes apache#5767 from dbtsai/lir-doc and squashes the following commits:

5e346c9 [DB Tsai] refactoring
fc9f582 [DB Tsai] doc
58456d8 [DB Tsai] address feedback
69757b8 [DB Tsai] actually diffSum is mathematically zero! No correction is needed.
5929e49 [DB Tsai] typo
63f7d1e [DB Tsai] Added compression to the model based on storage
203a295 [DB Tsai] Add more documentation to LinearRegression in new ML framework.
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015


4 participants