Skip to content

Conversation

@vectorijk
Copy link
Contributor

What changes were proposed in this pull request?

support avgMetrics in CrossValidatorModel with Python

How was this patch tested?

Doctest and test_save_load in pyspark/ml/test.py
JIRA

@vectorijk
Copy link
Contributor Author

cc @feynmanliang @jkbradley @mengxr

@vectorijk
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Apr 18, 2016

Test build #56069 has finished for PR 12464 at commit 93a43bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

I'll take a look

extra = dict()
return CrossValidatorModel(self.bestModel.copy(extra))
bestModel = self.bestModel.copy(extra)
avgMetrics = [am.copy(extra) for am in self.avgMetrics]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're having to convert avgMetrics to a list since it's stored as a numpy.ndarray in CrossValidator. Could you update CrossValidator itself so that it just uses a list of floats, rather than a numpy array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will do this.

@jkbradley
Copy link
Member

That's all for now, thanks!

cvModel.save(cvModelPath)
loadedModel = CrossValidatorModel.load(cvModelPath)
self.assertEqual(loadedModel.bestModel.uid, cvModel.bestModel.uid)
for index in range(len(loadedModel.avgMetrics)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor possible suggestion, there are some other places in the doctests where we use numpys assert_almost_equal, it seems like that might simplify things here a bit if you wanted to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@holdenk Thanks for suggestion. I used assert_almost_equal here.

- update metrics to list of floats
- use `numpy.testing.assert_almost_equal` to assert float list
- test CrossValidator and CrossValidatorModel copy
@vectorijk
Copy link
Contributor Author

@jkbradley 25959e5 this commit is trying to

  • update metrics in CrossValidator to float list (like [0.0] * number) .
  • use numpy.testing.assert_almost_equal to assert two float lists.
  • test CrossValidator and CrossValidatorModel with copy

Do you mind review again?

@SparkQA
Copy link

SparkQA commented Apr 27, 2016

Test build #57121 has finished for PR 12464 at commit 25959e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vectorijk
Copy link
Contributor Author

I also notice that validationMetrics in TrainValidationSplitModel should also be supported in Python.
Should we support that after this PR?


cvModel = cv.fit(dataset)
cvModelCopied = cvModel.copy()
assert_almost_equal(cvModel.avgMetrics, cvModelCopied.avgMetrics)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I'd use self.assertEqual for this and the avgMetrics comparison below. We should know if copying or saving/loading causes loss of precision.

@jkbradley
Copy link
Member

Thanks for the updates. I just had 2 small comments.

I also notice that validationMetrics in TrainValidationSplitModel should also be supported in Python.
Should we support that after this PR?

Yes please, can you create a JIRA?

@SparkQA
Copy link

SparkQA commented Apr 28, 2016

Test build #57248 has finished for PR 12464 at commit cfe6a66.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 28, 2016

Test build #57249 has finished for PR 12464 at commit 51b412f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cvModel = cv.fit(dataset)
cvModelCopied = cvModel.copy()
for index in range(len(cvModel.avgMetrics)):
self.assertTrue(abs(cvModel.avgMetrics[index] - cvModelCopied.avgMetrics[index])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try assertEqual and find it did not work? Why do we need approximate equality here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried assertEqual before. This test case causes loss of precision under python2 if we use assertEqual. But under python3, it passes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. OK thanks for checking!

@jkbradley
Copy link
Member

LGTM
Merging with master
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants