@yanboliang yanboliang commented May 16, 2016

What changes were proposed in this pull request?

  • GeneralizedLinearRegression API docs enhancement.
  • The default value of GeneralizedLinearRegression's linkPredictionCol is now unset rather than an empty string. This is consistent with other similar params such as weightCol.
  • Make some methods more private.
  • Fix a minor bug in LinearRegression.
  • Fix some other issues.

How was this patch tested?

Existing tests.

SparkQA commented May 16, 2016

Test build #58629 has finished for PR 13129 at commit 254313c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Array(0D),
Array(0D))
- return copyValues(model.setSummary(trainingSummary))
+ return model.setSummary(trainingSummary)
Contributor Author

This is a minor bug in LinearRegression: we should first copy values from the parent estimator and then call findSummaryModelAndPredictionCol; otherwise we will always get an empty predictionCol (and other params) for the LinearRegressionModel.
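
To make the ordering issue concrete, here is a minimal sketch that does not use Spark at all. ToyParams, ToyModel, and their members are hypothetical stand-ins invented for this illustration, not Spark's actual classes; the only assumption carried over from the comment above is that the summary reads the model's params at the moment it is built.

```scala
import scala.collection.mutable

// Toy stand-ins for a param holder (hypothetical, NOT Spark's API).
class ToyParams {
  val paramMap: mutable.Map[String, String] = mutable.Map.empty
  def set(name: String, value: String): this.type = { paramMap(name) = value; this }
  def get(name: String): Option[String] = paramMap.get(name)
  // Copy every param set on this instance onto `to`, then return `to`.
  def copyValues[T <: ToyParams](to: T): T = { to.paramMap ++= paramMap; to }
}

class ToyModel extends ToyParams {
  var summaryPredictionCol: Option[String] = None
  // The summary captures whatever predictionCol the model holds right now.
  def setSummary(): this.type = { summaryPredictionCol = get("predictionCol"); this }
}

object CopyOrderDemo {
  def main(args: Array[String]): Unit = {
    val estimator = new ToyParams().set("predictionCol", "myPrediction")

    // Buggy order: the summary is built before the params are copied.
    val buggy = estimator.copyValues(new ToyModel().setSummary())
    // Fixed order: copy the params first, then build the summary.
    val fixed = estimator.copyValues(new ToyModel()).setSummary()

    println(buggy.summaryPredictionCol) // None
    println(fixed.summaryPredictionCol) // Some(myPrediction)
  }
}
```

In the toy version the buggy model still ends up with the copied param, but its summary was created too early to see it, which mirrors the symptom described in the comment.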

Contributor

So test cases didn't pick this up? We should look into why and amend the tests accordingly.

Contributor Author

Yes, this is because we don't have excellent test coverage ...
I will add a test case after collecting comments.

Contributor Author

@MLnick I added a test case for this scenario and updated other test cases to ensure the prediction column (and other params) are copied correctly in all situations.

Member

We also need to setParent

Contributor Author

@jkbradley It is not necessary to call setParent here, because we have already done it in Predictor.fit:

override def fit(dataset: Dataset[_]): M = {
  // This handles a few items such as schema validation.
  // Developers only need to implement train().
  transformSchema(dataset.schema, logging = true)
  copyValues(train(dataset).setParent(this))
}

SparkQA commented May 17, 2016

Test build #58673 has finished for PR 13129 at commit 645f6c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/** Checks whether the input has quantiles column name. */
- protected[regression] def hasQuantilesCol: Boolean = {
-   isDefined(quantilesCol) && $(quantilesCol) != ""
+ protected def hasQuantilesCol: Boolean = {
Member

This is probably meant to be private[regression]

SparkQA commented May 18, 2016

Test build #58742 has finished for PR 13129 at commit 374e610.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 18, 2016

Test build #58744 has finished for PR 13129 at commit d38b1eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
val newSchema = super.validateAndTransformSchema(schema, fitting, featuresDataType)
- if ($(linkPredictionCol).nonEmpty) {
+ if (isDefined(linkPredictionCol) && $(linkPredictionCol).nonEmpty) {
Contributor

@MLnick MLnick May 18, 2016

This check is used twice; perhaps it makes sense to add a def hasLinkPredictionCol, as with e.g. hasQuantilesCol?

Contributor Author

Good point, updated. Thanks.
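
For readers following along, a helper of the kind suggested in this thread might look roughly like the sketch below. It is a Spark-free illustration: ToyGlmParams and setLinkPredictionCol are hypothetical names, and an Option is used here as an assumed stand-in for the isDefined check on a real param. The point is only that one helper covers both the unset case and the empty-string case.

```scala
// Toy stand-in for a param holder (hypothetical, NOT Spark's API).
class ToyGlmParams {
  // None models an unset param; Some("") models the old empty-string default.
  private var linkPredictionCol: Option[String] = None

  def setLinkPredictionCol(value: String): this.type = {
    linkPredictionCol = Some(value)
    this
  }

  /** Checks whether a non-empty link prediction column name has been set. */
  def hasLinkPredictionCol: Boolean = linkPredictionCol.exists(_.nonEmpty)
}

object HasLinkPredictionColDemo {
  def main(args: Array[String]): Unit = {
    println(new ToyGlmParams().hasLinkPredictionCol)                              // false
    println(new ToyGlmParams().setLinkPredictionCol("").hasLinkPredictionCol)     // false
    println(new ToyGlmParams().setLinkPredictionCol("link").hasLinkPredictionCol) // true
  }
}
```

Keeping the check in a single method means both call sites stay in sync if the definition of "has a link prediction column" changes later.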

MLnick commented May 18, 2016

LGTM

SparkQA commented May 18, 2016

Test build #58777 has finished for PR 13129 at commit 1fbd1dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

mengxr commented May 20, 2016

Merged into master and branch-2.0. Thanks!

asfgit pushed a commit that referenced this pull request May 20, 2016
## What changes were proposed in this pull request?
* ```GeneralizedLinearRegression``` API docs enhancement.
* The default value of ```GeneralizedLinearRegression``` ```linkPredictionCol``` is now unset rather than an empty string. This is consistent with other similar params such as ```weightCol```.
* Make some methods more private.
* Fix a minor bug in LinearRegression.
* Fix some other issues.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes #13129 from yanboliang/spark-15339.

(cherry picked from commit c94b34e)
Signed-off-by: Xiangrui Meng <[email protected]>
@asfgit asfgit closed this in c94b34e May 20, 2016
@yanboliang yanboliang deleted the spark-15339 branch May 20, 2016 07:14
asfgit pushed a commit that referenced this pull request May 20, 2016
…nkPredictionCol for GeneralizedLinearRegression

## What changes were proposed in this pull request?

The default value of the param linkPredictionCol for GeneralizedLinearRegression differs between PySpark and Scala, because of a default-value conflict between #13106 and #13129. This causes ml.tests to fail.

## How was this patch tested?
Existing tests.

Author: Liang-Chi Hsieh <[email protected]>

Closes #13220 from viirya/hotfix-regresstion.

(cherry picked from commit 4e73933)
Signed-off-by: Nick Pentreath <[email protected]>
asfgit pushed a commit that referenced this pull request May 20, 2016
…nkPredictionCol for GeneralizedLinearRegression

## What changes were proposed in this pull request?

The default value of the param linkPredictionCol for GeneralizedLinearRegression differs between PySpark and Scala, because of a default-value conflict between #13106 and #13129. This causes ml.tests to fail.

## How was this patch tested?
Existing tests.

Author: Liang-Chi Hsieh <[email protected]>

Closes #13220 from viirya/hotfix-regresstion.