-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-7222][ML] Added mathematical derivation in comment and compressed the model, removed the correction terms in LinearRegression with ElasticNet #5767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #31249 has finished for PR 5767 at commit
|
|
Test build #31253 has finished for PR 5767 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...the intercept is yMean...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks.
Signed-off-by: DB Tsai <[email protected]>
|
Test build #31292 has finished for PR 5767 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really rare? If yStd is 0.0 and then the optimal model would be empty with intercept yMean. In this case, a warning would be proper. Having this giant if ... else block making the code hard to read.
if (yStd == 0.0) {
logWarning(...)
if (handlePersistence) ...
return new LinearRegressionModel(...)
}
// actual implementationThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair!
|
Test build #31316 has finished for PR 5767 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: x.foreach { v =>
|
LGTM. Merged into master. Thanks! @dbtsai Please address the comment in a separate PR. |
|
Test build #31321 has finished for PR 5767 at commit
|
|
Test build #31329 has finished for PR 5767 at commit
|
…ssed the model, removed the correction terms in LinearRegression with ElasticNet Added detailed mathematical derivation of how scaling and LeastSquaresAggregator work. Refactored the code so the model is compressed based on the storage. We may try compression based on the prediction time. Also, I found that diffSum will be always zero mathematically, so no corrections are required. Author: DB Tsai <[email protected]> Closes apache#5767 from dbtsai/lir-doc and squashes the following commits: 5e346c9 [DB Tsai] refactoring fc9f582 [DB Tsai] doc 58456d8 [DB Tsai] address feedback 69757b8 [DB Tsai] actually diffSum is mathematically zero! No correction is needed. 5929e49 [DB Tsai] typo 63f7d1e [DB Tsai] Added compression to the model based on storage 203a295 [DB Tsai] Add more documentation to LinearRegression in new ML framework.
…ssed the model, removed the correction terms in LinearRegression with ElasticNet Added detailed mathematical derivation of how scaling and LeastSquaresAggregator work. Refactored the code so the model is compressed based on the storage. We may try compression based on the prediction time. Also, I found that diffSum will be always zero mathematically, so no corrections are required. Author: DB Tsai <[email protected]> Closes apache#5767 from dbtsai/lir-doc and squashes the following commits: 5e346c9 [DB Tsai] refactoring fc9f582 [DB Tsai] doc 58456d8 [DB Tsai] address feedback 69757b8 [DB Tsai] actually diffSum is mathematically zero! No correction is needed. 5929e49 [DB Tsai] typo 63f7d1e [DB Tsai] Added compression to the model based on storage 203a295 [DB Tsai] Add more documentation to LinearRegression in new ML framework.
…ssed the model, removed the correction terms in LinearRegression with ElasticNet Added detailed mathematical derivation of how scaling and LeastSquaresAggregator work. Refactored the code so the model is compressed based on the storage. We may try compression based on the prediction time. Also, I found that diffSum will be always zero mathematically, so no corrections are required. Author: DB Tsai <[email protected]> Closes apache#5767 from dbtsai/lir-doc and squashes the following commits: 5e346c9 [DB Tsai] refactoring fc9f582 [DB Tsai] doc 58456d8 [DB Tsai] address feedback 69757b8 [DB Tsai] actually diffSum is mathematically zero! No correction is needed. 5929e49 [DB Tsai] typo 63f7d1e [DB Tsai] Added compression to the model based on storage 203a295 [DB Tsai] Add more documentation to LinearRegression in new ML framework.
Added detailed mathematical derivation of how scaling and LeastSquaresAggregator work. Refactored the code so the model is compressed based on the storage. We may try compression based on the prediction time.
Also, I found that diffSum will be always zero mathematically, so no corrections are required.