[SPARK-1969][MLlib] Online summarizer APIs for mean, variance, min, and max #955
Merged build triggered.
Merged build started.
MultivariateStatisticalSummary is a public API -- we can't rename it arbitrarily. Why does it need to be renamed?
Since the "Statistical" in MultivariateStatisticalSummary is already implied by the package name "stat", I think it is worth having a more concise name. Also, most people abbreviate statistics as "stats", so I changed the package name from "stat" to "stats". Since it's already a public API, I have no problem changing it back.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15407/
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15405/
I don't know why Jenkins is unhappy with removing "private class ColumnStatisticsAggregator(private val n: Int)". After all, it's a private class.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15408/
Maybe this is a MiMa problem. Found this (from @pwendell): https://groups.google.com/forum/#!topic/migration-manager-user/5aQ0xxsL2lU
@mengxr Got it. It's a false-positive error. Do you have any comments or feedback on moving it out as a public API? I'm building a feature scaling API in MLUtils that depends on this. Thanks.
@dbtsai The current workaround is excluding it in
OK... it would be better to have MiMa exclude private classes automatically, or we could add an annotation for private classes.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15492/
Build finished. All automated tests passed.
@dbtsai About the package name, |
merge or aggregate may be better than overloading add here.
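A minimal sketch of the naming suggestion above, assuming a toy accumulator (MeanAcc is a hypothetical class, not part of this patch or of Spark). It keeps add for folding in one sample and merge for combining two partial results, so the two roles of an aggregate stay visibly distinct. Plain Scala collections stand in for partitions.

```scala
// Hypothetical accumulator, used only to illustrate the naming suggestion.
final class MeanAcc {
  private var count = 0L
  private var sum = 0.0

  // Per-sample update: the "sequence operation" role in an aggregate.
  def add(x: Double): this.type = {
    count += 1
    sum += x
    this
  }

  // Combine two partial results: the "combine operation" role.
  def merge(other: MeanAcc): this.type = {
    count += other.count
    sum += other.sum
    this
  }

  def mean: Double = if (count == 0L) Double.NaN else sum / count
}

object MeanAccDemo {
  // Plain collections stand in for the partitions of a distributed dataset.
  val partitions: Seq[Seq[Double]] = Seq(Seq(1.0, 2.0), Seq(3.0), Seq(4.0, 5.0, 6.0))

  // One accumulator per partition, built with add ...
  val partials: Seq[MeanAcc] =
    partitions.map(_.foldLeft(new MeanAcc)((acc, x) => acc.add(x)))

  // ... then the partial results are combined with merge.
  val total: MeanAcc = partials.reduce((a, b) => a.merge(b))
}
```

With separate names, a reader can tell at the call site whether a sample or another accumulator is being folded in, which an overloaded add would hide.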
"Streaming" has a special meaning in Spark. Change it to "online"?
QA tests have started for PR 955. This patch merges cleanly.
QA results for PR 955:
Merged. Thanks!
…nd max
It basically moved the private ColumnStatisticsAggregator class from RowMatrix to a publicly available DeveloperApi with documentation and unit tests.
Changes:
1) Moved the private implementation from org.apache.spark.mllib.linalg.ColumnStatisticsAggregator to org.apache.spark.mllib.stat.MultivariateOnlineSummarizer.
2) When creating an OnlineSummarizer object, the number of columns is no longer needed in the constructor. It is determined when users add the first sample.
3) Added API documentation for MultivariateOnlineSummarizer.
4) Added unit tests for MultivariateOnlineSummarizer.
Author: DB Tsai <[email protected]>
Closes apache#955 from dbtsai/dbtsai-summarizer and squashes the following commits:
b13ac90 [DB Tsai] dbtsai-summarizer
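To make the behaviour described in the commit message concrete, here is a rough sketch of an online summarizer for mean, variance, min, and max. It is an illustration under assumptions, not the code in this patch: SketchOnlineSummarizer and its members are hypothetical names, samples are plain Array[Double] rather than MLlib vectors, and the update uses Welford's running-variance method with the standard pairwise formula for merging, which may differ from what MultivariateOnlineSummarizer actually does. As in change 2), the number of columns is fixed by the first sample rather than by the constructor.

```scala
// Hypothetical sketch; not Spark's MultivariateOnlineSummarizer.
final class SketchOnlineSummarizer {
  private var n = 0                  // number of columns, fixed by the first sample
  private var count = 0L             // number of samples seen so far
  private var mu: Array[Double] = _  // running mean per column
  private var m2: Array[Double] = _  // running sum of squared deviations per column
  private var lo: Array[Double] = _  // running minimum per column
  private var hi: Array[Double] = _  // running maximum per column

  // Fold one sample into the running statistics (Welford's update).
  def add(sample: Array[Double]): this.type = {
    if (n == 0) {
      require(sample.nonEmpty, "sample must have at least one column")
      n = sample.length
      mu = Array.fill(n)(0.0)
      m2 = Array.fill(n)(0.0)
      lo = Array.fill(n)(Double.PositiveInfinity)
      hi = Array.fill(n)(Double.NegativeInfinity)
    }
    require(sample.length == n, s"expected $n columns, got ${sample.length}")
    count += 1
    var i = 0
    while (i < n) {
      val x = sample(i)
      val delta = x - mu(i)
      mu(i) += delta / count
      m2(i) += delta * (x - mu(i))
      if (x < lo(i)) lo(i) = x
      if (x > hi(i)) hi(i) = x
      i += 1
    }
    this
  }

  // Combine another summarizer's partial result into this one.
  def merge(other: SketchOnlineSummarizer): this.type = {
    if (other.count > 0L) {
      if (count == 0L) {
        // Nothing seen yet: adopt a copy of the other side's state.
        n = other.n
        count = other.count
        mu = other.mu.clone()
        m2 = other.m2.clone()
        lo = other.lo.clone()
        hi = other.hi.clone()
      } else {
        require(n == other.n, s"column mismatch: $n vs ${other.n}")
        val total = count + other.count
        var i = 0
        while (i < n) {
          val delta = other.mu(i) - mu(i)
          m2(i) += other.m2(i) + delta * delta * count * other.count / total
          mu(i) += delta * other.count / total
          if (other.lo(i) < lo(i)) lo(i) = other.lo(i)
          if (other.hi(i) > hi(i)) hi(i) = other.hi(i)
          i += 1
        }
        count = total
      }
    }
    this
  }

  def numSamples: Long = count

  def mean: Array[Double] = { require(count > 0L, "no samples added"); mu.clone() }

  // Unbiased sample variance; zero when only one sample has been seen.
  def variance: Array[Double] = {
    require(count > 0L, "no samples added")
    if (count > 1L) m2.map(_ / (count - 1)) else Array.fill(n)(0.0)
  }

  def min: Array[Double] = { require(count > 0L, "no samples added"); lo.clone() }

  def max: Array[Double] = { require(count > 0L, "no samples added"); hi.clone() }
}
```

Under these assumptions, a caller would chain add calls on each partition's data and then merge the per-partition summarizers, so no pass over the data needs to know the column count in advance.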
It basically moved the private ColumnStatisticsAggregator class from RowMatrix to a publicly available DeveloperApi with documentation and unit tests.
Changes: