@facaiy
Contributor

@facaiy facaiy commented Jul 6, 2017

What changes were proposed in this pull request?

Add a setWeightCol method to OneVsRest.

weightCol is ignored if the classifier doesn't inherit the HasWeightCol trait.

How was this patch tested?

  • Added a unit test.

@lins05
Contributor

lins05 commented Jul 6, 2017

I guess we also need to update the Python part: https://github.com/apache/spark/blob/v2.2.0-rc6/python/pyspark/ml/classification.py#L1563

@facaiy
Contributor Author

facaiy commented Jul 6, 2017

@lins05 thanks, reasonable suggestion, I will fix it later.

@facaiy
Contributor Author

facaiy commented Jul 7, 2017

I'm not familiar with R. I used git grep to search for "OneVsRest" and found nothing, so it seems that nothing needs to be done for the R part.

@MLnick
Contributor

MLnick commented Jul 7, 2017

ok to test

@SparkQA

SparkQA commented Jul 7, 2017

Test build #79325 has finished for PR 18554 at commit 25d681f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 7, 2017

Test build #79330 has finished for PR 18554 at commit e511b90.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 7, 2017

Test build #79331 has finished for PR 18554 at commit 1c215f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@facaiy
Contributor Author

facaiy commented Jul 11, 2017

@srowen @yanboliang Could you help review the PR? Thanks.

      case c: HasWeightCol if c.isDefined(c.weightCol) && c.getWeightCol.nonEmpty =>
        dataset.select($(labelCol), $(featuresCol), c.getWeightCol)
      case _ => dataset.select($(labelCol), $(featuresCol))
    }
Contributor

OneVsRest is a classification estimator, so I think we should make weightCol a member param of it, like featuresCol. For example:

// Assume dataset has columns a, b, and c.
val ova = new OneVsRest().setFeaturesCol("a").setClassifier(new LogisticRegression().setFeaturesCol("b"))

The features column used by OneVsRest is a: the features column set on OneVsRest overrides the corresponding setting on OneVsRest.classifier. We should handle weightCol the same way. Thanks.

Contributor Author

@facaiy facaiy Jul 12, 2017

Hi, @yanboliang. As @MLnick said, not all classifiers inherit HasWeightCol, so it might cause confusion.

In my opinion, setWeightCol is an attribute owned by one specific classifier itself, like setProbabilityCol and setRawPredictionCol for Logistic Regression. So I'd suggest that users configure the classifier itself, rather than OneVsRest.

Contributor

@yanboliang yanboliang Jul 12, 2017

@facaiy It doesn't matter. If the classifier doesn't inherit from HasWeightCol, we don't call setWeightCol on that classifier; instead, we log a warning saying that weightCol doesn't take effect. You can refer to these lines of code to see how featuresCol is handled. We can do it in a similar way. Thanks.
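
The approach discussed in this thread can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: `ToyWeightedClassifier`, `ToyUnweightedClassifier`, and `configure_classifier` are hypothetical names, not Spark or PySpark APIs. The sketch shows the outer estimator's features column overriding the classifier's own setting, and the weight column being applied only when the classifier supports it, with a warning otherwise.

```python
import logging

logger = logging.getLogger("toy_one_vs_rest")


class ToyWeightedClassifier:
    """Hypothetical classifier that supports a weight column."""

    def __init__(self, features_col="features", weight_col=None):
        self.features_col = features_col
        self.weight_col = weight_col


class ToyUnweightedClassifier:
    """Hypothetical classifier with no weight column support."""

    def __init__(self, features_col="features"):
        self.features_col = features_col


def configure_classifier(classifier, features_col, weight_col=None):
    """Copy the outer estimator's column settings onto the base classifier.

    Returns True if the weight column was applied, False otherwise.
    """
    # The outer estimator's features column always overrides the classifier's.
    classifier.features_col = features_col

    if weight_col is None:
        return False
    if hasattr(classifier, "weight_col"):
        classifier.weight_col = weight_col
        return True
    # Log a warning instead of failing when weights are unsupported.
    logger.warning(
        "weightCol is ignored, as it is not supported by %s.",
        type(classifier).__name__,
    )
    return False
```

With this sketch, calling `configure_classifier(ToyUnweightedClassifier(), "a", "w")` leaves the classifier without a weight column and only emits a warning, which is the behaviour proposed above for classifiers that lack HasWeightCol.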

@MLnick
Contributor

MLnick commented Jul 11, 2017 via email

@MLnick
Contributor

MLnick commented Jul 11, 2017 via email

@facaiy facaiy changed the title from "[SPARK-21306][ML] OneVsRest should cache weightCol if necessary" to "[SPARK-21306][ML] OneVsRest should support setWeightCol" Jul 13, 2017
@SparkQA

SparkQA commented Jul 13, 2017

Test build #79579 has finished for PR 18554 at commit a57f096.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 13, 2017

Test build #79581 has finished for PR 18554 at commit 54e0fca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

        output = model.transform(df)
        self.assertEqual(output.columns, ["label", "features", "prediction"])

    def test_support_for_weightCol(self):
Contributor

Would it make sense to also test with a classifier that doesn't have a weight col?

Contributor Author

Sure. Use DecisionTreeClassifier to test.

@SparkQA

SparkQA commented Jul 19, 2017

Test build #79740 has finished for PR 18554 at commit 9ba0e2b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@facaiy
Contributor Author

facaiy commented Jul 26, 2017

ping @holdenk @yanboliang

Contributor

@yanboliang yanboliang left a comment

Left some minor comments; otherwise, LGTM.
cc @holdenk for a double check. Thanks.

     @keyword_only
     def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction",
-                 classifier=None):
+                 weightCol=None, classifier=None):
Contributor

Usually we append a new argument at the end. Users may pass arguments positionally, and we should not break their existing code. FYI: https://docs.python.org/3/tutorial/controlflow.html#keyword-arguments
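
To make the compatibility concern concrete, here is a minimal sketch in plain Python. The three functions are hypothetical stand-ins that mimic the parameter order under discussion; they are not the actual PySpark constructor. The sketch shows what happens to an existing positional call when the new argument is inserted before `classifier` versus appended after it.

```python
def init_before(featuresCol="features", labelCol="label",
                predictionCol="prediction", classifier=None):
    """Hypothetical signature before the change."""
    return {"classifier": classifier}


def init_inserted(featuresCol="features", labelCol="label",
                  predictionCol="prediction", weightCol=None, classifier=None):
    """Hypothetical signature with weightCol inserted before classifier."""
    return {"classifier": classifier, "weightCol": weightCol}


def init_appended(featuresCol="features", labelCol="label",
                  predictionCol="prediction", classifier=None, weightCol=None):
    """Hypothetical signature with weightCol appended at the end."""
    return {"classifier": classifier, "weightCol": weightCol}


# An existing caller that passes every argument positionally.
args = ("features", "label", "prediction", "my_classifier")

before = init_before(*args)      # classifier is "my_classifier"
inserted = init_inserted(*args)  # "my_classifier" silently binds to weightCol
appended = init_appended(*args)  # classifier is still "my_classifier"
```

In the inserted variant the fourth positional value lands in `weightCol` and `classifier` falls back to `None`, so the old call changes meaning without raising an error. Appending keeps every existing positional slot stable.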

Contributor Author

moved.

lr = LogisticRegression(maxIter=5, regParam=0.01)
ovr = OneVsRest(classifier=lr, weightCol="weight")
self.assertIsNotNone(ovr.fit(df))
ovr2 = OneVsRest(classifier=lr).setWeightCol("weight")
Contributor

Minor: We can remove the tests for ovr2 and ovr4, since setting the param in a different way runs the same code in the backend.

Contributor Author

cleaned.

@SparkQA

SparkQA commented Jul 26, 2017

Test build #79964 has finished for PR 18554 at commit 8c0beba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Contributor

yanboliang commented Jul 28, 2017

Since this is a critical bug, I'll merge it into master/branch-2.2/branch-2.1/branch-2.0. Thanks, all.

@asfgit asfgit closed this in a5a3189 Jul 28, 2017
asfgit pushed a commit that referenced this pull request Jul 28, 2017
## What changes were proposed in this pull request?

add `setWeightCol` method for OneVsRest.

`weightCol` is ignored if classifier doesn't inherit HasWeightCol trait.

## How was this patch tested?

+ [x] add an unit test.

Author: Yan Facai (颜发才) <[email protected]>

Closes #18554 from facaiy/BUG/oneVsRest_missing_weightCol.

(cherry picked from commit a5a3189)
Signed-off-by: Yanbo Liang <[email protected]>
@facaiy facaiy deleted the BUG/oneVsRest_missing_weightCol branch July 28, 2017 22:37
facaiy added a commit to facaiy/spark that referenced this pull request Jul 28, 2017
## What changes were proposed in this pull request?

add `setWeightCol` method for OneVsRest.

`weightCol` is ignored if classifier doesn't inherit HasWeightCol trait.

## How was this patch tested?

+ [x] add an unit test.

Author: Yan Facai (颜发才) <[email protected]>

Closes apache#18554 from facaiy/BUG/oneVsRest_missing_weightCol.

(cherry picked from commit a5a3189)
Signed-off-by: Yanbo Liang <[email protected]>
asfgit pushed a commit that referenced this pull request Aug 8, 2017
The PR is related to #18554, and is modified for branch 2.1.

## What changes were proposed in this pull request?

add `setWeightCol` method for OneVsRest.

`weightCol` is ignored if classifier doesn't inherit HasWeightCol trait.

## How was this patch tested?

+ [x] add an unit test.

Author: Yan Facai (颜发才) <[email protected]>

Closes #18763 from facaiy/BUG/branch-2.1_OneVsRest_support_setWeightCol.
asfgit pushed a commit that referenced this pull request Aug 8, 2017
The PR is related to #18554, and is modified for branch 2.0.

## What changes were proposed in this pull request?

add `setWeightCol` method for OneVsRest.

`weightCol` is ignored if classifier doesn't inherit HasWeightCol trait.

## How was this patch tested?

+ [x] add an unit test.

Author: Yan Facai (颜发才) <[email protected]>

Closes #18764 from facaiy/BUG/branch-2.0_OneVsRest_support_setWeightCol.
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
## What changes were proposed in this pull request?

add `setWeightCol` method for OneVsRest.

`weightCol` is ignored if classifier doesn't inherit HasWeightCol trait.

## How was this patch tested?

+ [x] add an unit test.

Author: Yan Facai (颜发才) <[email protected]>

Closes apache#18554 from facaiy/BUG/oneVsRest_missing_weightCol.

(cherry picked from commit a5a3189)
Signed-off-by: Yanbo Liang <[email protected]>
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
The PR is related to apache#18554, and is modified for branch 2.1.

add `setWeightCol` method for OneVsRest.

`weightCol` is ignored if classifier doesn't inherit HasWeightCol trait.

+ [x] add an unit test.

Author: Yan Facai (颜发才) <[email protected]>

Closes apache#18763 from facaiy/BUG/branch-2.1_OneVsRest_support_setWeightCol.
6 participants