[SPARK-12810][PySpark] PySpark CrossValidatorModel should support avgMetrics #12464
Jenkins, test this please.
Test build #56069 has finished for PR 12464 at commit
I'll take a look
python/pyspark/ml/tuning.py
Outdated
extra = dict()
return CrossValidatorModel(self.bestModel.copy(extra))
bestModel = self.bestModel.copy(extra)
avgMetrics = [am.copy(extra) for am in self.avgMetrics]
You're having to convert avgMetrics to a list since it's stored as a numpy.ndarray in CrossValidator. Could you update CrossValidator itself so that it just uses a list of floats, rather than a numpy array?
Sure, I will do this.
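A minimal sketch of the idea behind this request, assuming the per-fold metrics arrive as plain Python lists: the averages are accumulated directly into a list of floats, so no conversion from a `numpy.ndarray` is needed later. The function name `average_fold_metrics` and the input layout are hypothetical and are not taken from the PySpark source.

```python
def average_fold_metrics(fold_metrics):
    """Average metrics across folds, returning a plain list of floats.

    fold_metrics is a hypothetical layout: one inner list per fold,
    holding one metric value per candidate parameter setting.
    """
    n_folds = len(fold_metrics)
    n_models = len(fold_metrics[0])
    avg = [0.0] * n_models  # plain list instead of a numpy array
    for metrics in fold_metrics:
        for j, metric in enumerate(metrics):
            avg[j] += metric / n_folds
    return avg


avg_metrics = average_fold_metrics([[0.5, 1.0], [1.5, 3.0]])
copied = list(avg_metrics)  # copying a list of floats needs no conversion
```

Because the result is already a list, a copy of the model can duplicate it with `list(...)` rather than iterating over an array.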
That's all for now, thanks!
cvModel.save(cvModelPath)
loadedModel = CrossValidatorModel.load(cvModelPath)
self.assertEqual(loadedModel.bestModel.uid, cvModel.bestModel.uid)
for index in range(len(loadedModel.avgMetrics)):
Minor possible suggestion: there are some other places in the doctests where we use NumPy's assert_almost_equal. It seems like that might simplify things here a bit, if you wanted to.
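As a rough illustration of this suggestion, `numpy.testing.assert_almost_equal` compares two sequences element by element, which replaces an explicit index loop. The metric values below are made up for the example and do not come from the test suite.

```python
from numpy.testing import assert_almost_equal

# Illustrative values only: metrics before and after a hypothetical round trip.
original = [0.8312345678, 0.7923456789]
reloaded = [0.83123456781, 0.79234567889]

# Passes when every pair agrees to the requested number of decimals
# (the default is decimal=7); raises AssertionError otherwise.
assert_almost_equal(reloaded, original, decimal=7)
```

The trade-off is that a tolerance-based check will not flag a small loss of precision.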
- update metrics to list of floats
- use `numpy.testing.assert_almost_equal` to assert float list
- test CrossValidator and CrossValidatorModel copy
@jkbradley 25959e5 this commit is trying to
Would you mind reviewing it again?
Test build #57121 has finished for PR 12464 at commit
I also notice that
python/pyspark/ml/tests.py
Outdated
cvModel = cv.fit(dataset)
cvModelCopied = cvModel.copy()
assert_almost_equal(cvModel.avgMetrics, cvModelCopied.avgMetrics)
On second thought, I'd use self.assertEqual for this and the avgMetrics comparison below. We should know if copying or saving/loading causes loss of precision.
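To make the reviewer's point concrete, here is a small self-contained sketch using `unittest`: an exact `assertEqual` on lists of floats passes for a faithful copy and fails as soon as any digits are lost. The test class, the sample values, and the `round` call that simulates a lossy round trip are all hypothetical.

```python
import unittest


class AvgMetricsEqualityTest(unittest.TestCase):
    """Hypothetical test case; not part of the PySpark test suite."""

    def test_faithful_copy_is_exactly_equal(self):
        metrics = [0.1 + 0.2, 1.0 / 3.0]
        copied = list(metrics)
        self.assertEqual(metrics, copied)

    def test_exact_check_detects_precision_loss(self):
        metrics = [0.1 + 0.2]  # 0.30000000000000004
        lossy = [round(m, 12) for m in metrics]  # simulated lossy round trip
        self.assertNotEqual(metrics, lossy)


def run_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AvgMetricsEqualityTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```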
Thanks for the updates. I just had 2 small comments.
Yes please, can you create a JIRA?
Test build #57248 has finished for PR 12464 at commit
Test build #57249 has finished for PR 12464 at commit
cvModel = cv.fit(dataset)
cvModelCopied = cvModel.copy()
for index in range(len(cvModel.avgMetrics)):
self.assertTrue(abs(cvModel.avgMetrics[index] - cvModelCopied.avgMetrics[index])
Did you try assertEqual and find it did not work? Why do we need approximate equality here?
I tried assertEqual earlier. With assertEqual, this test case fails under Python 2 because of a loss of precision, but it passes under Python 3.
Interesting. OK thanks for checking!
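The thread does not say where the Python 2 precision loss comes from. One known difference between the two versions, offered here only as a possible explanation and not as the confirmed cause, is that Python 2's `str()` kept 12 significant digits of a float, while `repr()` (and `str()` in Python 3) round-trips exactly. The sketch below imitates the Python 2 behaviour with the `.12g` format; the helper name `py2_style_str` is hypothetical.

```python
def py2_style_str(x):
    # Assumption: imitates Python 2's str(float), which used 12 significant digits.
    return format(x, ".12g")


metric = 0.1 + 0.2  # 0.30000000000000004

# repr() round-trips a float exactly.
roundtrip_repr = float(repr(metric))

# A 12-digit string drops the trailing digits, so the value changes.
roundtrip_py2_str = float(py2_style_str(metric))

exact_after_repr = roundtrip_repr == metric        # True
exact_after_py2_str = roundtrip_py2_str == metric  # False
```

If a value passes through such a string conversion anywhere along the way, an exact comparison would fail under Python 2 only, which matches the behaviour reported above.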
LGTM
What changes were proposed in this pull request?
Support avgMetrics in CrossValidatorModel with Python.
How was this patch tested?
Doctest and test_save_load in pyspark/ml/test.py
JIRA