[SPARK-12810][PySpark] PySpark CrossValidatorModel should support avgMetrics #12464

vectorijk · 2016-04-18T05:13:07Z

What changes were proposed in this pull request?

Support avgMetrics in CrossValidatorModel in Python.
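avgMetrics holds one averaged metric per entry of the estimator's parameter maps, in the same order. The plain-Python sketch below uses hypothetical parameter maps and metric values (no Spark needed). It pairs each map with its metric and picks the best one, assuming an evaluator for which larger is better.

```python
# Hypothetical parameter maps and averaged metrics; in PySpark these would be
# cv.getEstimatorParamMaps() and cvModel.avgMetrics (one float per param map).
param_maps = [{"maxIter": 0}, {"maxIter": 1}, {"maxIter": 5}]
avg_metrics = [0.50, 0.71, 0.68]

# avgMetrics is index-aligned with the param maps, so zip pairs them up.
for params, metric in zip(param_maps, avg_metrics):
    print(params, metric)

# Assumes a larger-is-better evaluator (e.g. area under ROC); use min() otherwise.
best_index = max(range(len(avg_metrics)), key=lambda i: avg_metrics[i])
best_params = param_maps[best_index]
```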

How was this patch tested?

Doctest, and test_save_load in pyspark/ml/tests.py.
JIRA: SPARK-12810

vectorijk · 2016-04-18T05:14:01Z

cc @feynmanliang @jkbradley @mengxr

vectorijk · 2016-04-18T05:36:20Z

Jenkins, test this please.

SparkQA · 2016-04-18T17:47:50Z

Test build #56069 has finished for PR 12464 at commit 93a43bc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-04-18T19:48:38Z

I'll take a look

jkbradley · 2016-04-18T20:45:23Z

python/pyspark/ml/tuning.py

            extra = dict()
-        return CrossValidatorModel(self.bestModel.copy(extra))
+        bestModel = self.bestModel.copy(extra)
+        avgMetrics = [am.copy(extra) for am in self.avgMetrics]


You're having to convert avgMetrics to a list since it's stored as a numpy.ndarray in CrossValidator. Could you update CrossValidator itself so that it just uses a list of floats, rather than a numpy array?

Sure, I will do this.
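Storing avgMetrics as a plain list of floats keeps copy() simple. Python floats are immutable, so a new list with the same values is already a full copy, and there is nothing to call copy(extra) on per element. Below is a minimal sketch with hypothetical ToyBestModel and ToyCrossValidatorModel classes (not the PySpark classes) that mirror the shape of the diff above.

```python
class ToyBestModel(object):
    """Hypothetical stand-in for a fitted model with a copy(extra) method."""

    def __init__(self, params=None):
        self.params = dict(params or {})

    def copy(self, extra=None):
        # Merge any extra params into a fresh dict for the copied model.
        merged = dict(self.params)
        merged.update(extra or {})
        return ToyBestModel(merged)


class ToyCrossValidatorModel(object):
    """Hypothetical model holding bestModel and avgMetrics (a list of floats)."""

    def __init__(self, bestModel, avgMetrics=None):
        self.bestModel = bestModel
        self.avgMetrics = avgMetrics if avgMetrics is not None else []

    def copy(self, extra=None):
        if extra is None:
            extra = dict()
        bestModel = self.bestModel.copy(extra)
        # Floats are immutable, so a new list of the same values is a full copy.
        avgMetrics = list(self.avgMetrics)
        return ToyCrossValidatorModel(bestModel, avgMetrics)


model = ToyCrossValidatorModel(ToyBestModel({"maxIter": 1}), [0.5, 0.71])
copied = model.copy({"regParam": 0.1})
```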

jkbradley · 2016-04-18T20:45:44Z

That's all for now, thanks!

holdenk · 2016-04-21T00:00:02Z

python/pyspark/ml/tests.py

        cvModel.save(cvModelPath)
        loadedModel = CrossValidatorModel.load(cvModelPath)
        self.assertEqual(loadedModel.bestModel.uid, cvModel.bestModel.uid)
+        for index in range(len(loadedModel.avgMetrics)):


Minor suggestion: in some other places in the doctests we use NumPy's assert_almost_equal. That might simplify things here a bit, if you want.

@holdenk Thanks for the suggestion. I now use assert_almost_equal here.
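For reference, numpy.testing.assert_almost_equal accepts two sequences and compares them element-wise to `decimal` decimal places (7 by default), raising AssertionError on a mismatch, so one call can replace the index loop in the test snippet. The metric values below are made up for illustration.

```python
from numpy.testing import assert_almost_equal

# Made-up metrics standing in for cvModel.avgMetrics and loadedModel.avgMetrics.
original = [0.7142857142857143, 0.5]
loaded = [0.71428571428571, 0.5]  # differs only far past the 7th decimal

# One call replaces a `for index in range(len(...))` loop of per-element checks.
assert_almost_equal(loaded, original)

# A real mismatch still fails.
try:
    assert_almost_equal([0.71, 0.5], original)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```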

- Update metrics to a list of floats
- Use `numpy.testing.assert_almost_equal` to compare float lists
- Test CrossValidator and CrossValidatorModel copy

vectorijk · 2016-04-27T12:24:08Z

@jkbradley Commit 25959e5 does the following:

- updates the metrics in CrossValidator to a list of floats (e.g. `[0.0] * number`);
- uses numpy.testing.assert_almost_equal to compare two float lists;
- tests copy() on CrossValidator and CrossValidatorModel.

Would you mind reviewing again?
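The `[0.0] * number` pattern can be sketched in plain Python: start with one zero per parameter map, add each fold's metric divided by the number of folds, and end with a list of floats holding the per-map average across folds. The fold metrics here are hypothetical; in CrossValidator they would come from evaluating each fitted model on its validation fold.

```python
# Hypothetical evaluator results: fold_metrics[i][j] is the metric of the model
# trained with param map j and evaluated on validation fold i.
fold_metrics = [
    [0.50, 0.70],
    [0.50, 0.74],
    [0.50, 0.72],
]
n_folds = len(fold_metrics)
num_models = len(fold_metrics[0])

# One float accumulator per param map, as in `[0.0] * number`.
metrics = [0.0] * num_models
for fold in fold_metrics:
    for j, metric in enumerate(fold):
        # Divide by the fold count so the result is an average, not a sum.
        metrics[j] += metric / n_folds

# metrics is now a plain Python list of floats, one average per param map.
```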

SparkQA · 2016-04-27T12:26:28Z

Test build #57121 has finished for PR 12464 at commit 25959e5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

vectorijk · 2016-04-27T12:31:07Z

I also noticed that validationMetrics in TrainValidationSplitModel should be supported in Python as well.
Should we add that after this PR?

jkbradley · 2016-04-27T20:34:12Z

python/pyspark/ml/tests.py

+
+        cvModel = cv.fit(dataset)
+        cvModelCopied = cvModel.copy()
+        assert_almost_equal(cvModel.avgMetrics, cvModelCopied.avgMetrics)


On second thought, I'd use self.assertEqual for this and the avgMetrics comparison below. We should know if copying or saving/loading causes loss of precision.

jkbradley · 2016-04-27T20:38:53Z

Thanks for the updates. I just had 2 small comments.

I also noticed that validationMetrics in TrainValidationSplitModel should be supported in Python as well.
Should we add that after this PR?

Yes please, can you create a JIRA?

SparkQA · 2016-04-28T11:43:15Z

Test build #57248 has finished for PR 12464 at commit cfe6a66.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-28T11:56:33Z

Test build #57249 has finished for PR 12464 at commit 51b412f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-04-28T18:17:07Z

python/pyspark/ml/tests.py

+        cvModel = cv.fit(dataset)
+        cvModelCopied = cvModel.copy()
+        for index in range(len(cvModel.avgMetrics)):
+            self.assertTrue(abs(cvModel.avgMetrics[index] - cvModelCopied.avgMetrics[index])


Did you try assertEqual and find it did not work? Why do we need approximate equality here?

I tried assertEqual earlier. With assertEqual, this test case fails under Python 2 because of a loss of precision, but it passes under Python 3.

Interesting. OK thanks for checking!
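The thread does not say where the Python 2 precision difference comes from. If a float is converted to text somewhere along the way, one way such a Python-2-only difference can arise (an assumption, not confirmed here) is str(): in Python 2.7, str(float) keeps only 12 significant digits, while repr() round-trips exactly in both Python 2.7 and 3. The sketch below runs on Python 3 and emulates the Python 2 str() behaviour with '%.12g' formatting.

```python
x = 0.1234567890123456  # a float that needs 16 significant digits

# Python 2.7's str(float) formats with 12 significant digits, like '%.12g'.
py2_style_text = "%.12g" % x
# repr() produces the shortest string that converts back to the same float.
repr_text = repr(x)

lossy = float(py2_style_text)
exact = float(repr_text)
```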

jkbradley · 2016-04-28T21:18:34Z

LGTM
Merging with master
Thanks!

supporting avgMetrics in CrossValidatorModel with Python

93a43bc

jkbradley reviewed Apr 18, 2016

holdenk reviewed Apr 21, 2016

address comment

25959e5


jkbradley reviewed Apr 27, 2016

address comment

51b412f

vectorijk force-pushed the spark-12810 branch from cfe6a66 to 51b412f on April 28, 2016 11:46

jkbradley reviewed Apr 28, 2016

asfgit closed this in d584a2b Apr 28, 2016

yhuai mentioned this pull request Aug 11, 2016

[SPARK-16831] [Python] Fixed bug in CrossValidator.avgMetrics #14456

Closed
