-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-15339] [ML] ML 2.0 QA: Scala APIs and code audit for regression #13129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #58629 has finished for PR 13129 at commit
|
| Array(0D), | ||
| Array(0D)) | ||
| return copyValues(model.setSummary(trainingSummary)) | ||
| return model.setSummary(trainingSummary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a minor bug of LinearRegression, we should first copy values from parent estimator and then call findSummaryModelAndPredictionCol, otherwise we will always get empty predictionCol(and other params) for the LinearRegressionModel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So test cases didn't pick this up? We should look into why and amend the tests accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is due to we don't have excellent test coverage ...
I will add test case after collect comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MLnick I added test case for this scenario and updated other test cases to ensure coping prediction column(and other params) correct in all situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to setParent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkbradley It does not necessary to setParent at here, because we have done it at Predictor.fit
override def fit(dataset: Dataset[_]): M = {
// This handles a few items such as schema validation.
// Developers only need to implement train().
transformSchema(dataset.schema, logging = true)
copyValues(train(dataset).setParent(this))
}
|
Test build #58673 has finished for PR 13129 at commit
|
| /** Checks whether the input has quantiles column name. */ | ||
| protected[regression] def hasQuantilesCol: Boolean = { | ||
| isDefined(quantilesCol) && $(quantilesCol) != "" | ||
| protected def hasQuantilesCol: Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably meant to be private[regression]
|
Test build #58742 has finished for PR 13129 at commit
|
|
Test build #58744 has finished for PR 13129 at commit
|
| } | ||
| val newSchema = super.validateAndTransformSchema(schema, fitting, featuresDataType) | ||
| if ($(linkPredictionCol).nonEmpty) { | ||
| if (isDefined(linkPredictionCol) && $(linkPredictionCol).nonEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used twice, perhaps makes sense to make a def hasLinkPredictionCol as for e.g. hasQuantilesCol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, updated. Thanks.
|
LGTM |
|
Test build #58777 has finished for PR 13129 at commit
|
|
Merged into master and branch-2.0. Thanks! |
## What changes were proposed in this pull request? * ```GeneralizedLinearRegression``` API docs enhancement. * The default value of ```GeneralizedLinearRegression``` ```linkPredictionCol``` is not set rather than empty. This will consistent with other similar params such as ```weightCol``` * Make some methods more private. * Fix a minor bug of LinearRegression. * Fix some other issues. ## How was this patch tested? Existing tests. Author: Yanbo Liang <[email protected]> Closes #13129 from yanboliang/spark-15339. (cherry picked from commit c94b34e) Signed-off-by: Xiangrui Meng <[email protected]>
…nkPredictionCol for GeneralizedLinearRegression ## What changes were proposed in this pull request? Default value mismatch of param linkPredictionCol for GeneralizedLinearRegression between PySpark and Scala. That is because default value conflict between #13106 and #13129. This causes ml.tests failed. ## How was this patch tested? Existing tests. Author: Liang-Chi Hsieh <[email protected]> Closes #13220 from viirya/hotfix-regresstion. (cherry picked from commit 4e73933) Signed-off-by: Nick Pentreath <[email protected]>
…nkPredictionCol for GeneralizedLinearRegression ## What changes were proposed in this pull request? Default value mismatch of param linkPredictionCol for GeneralizedLinearRegression between PySpark and Scala. That is because default value conflict between #13106 and #13129. This causes ml.tests failed. ## How was this patch tested? Existing tests. Author: Liang-Chi Hsieh <[email protected]> Closes #13220 from viirya/hotfix-regresstion.
What changes were proposed in this pull request?
GeneralizedLinearRegressionAPI docs enhancement.GeneralizedLinearRegressionlinkPredictionColis not set rather than empty. This will consistent with other similar params such asweightColHow was this patch tested?
Existing tests.