CLN: Refactor f_scores and f_test #502
Conversation
cc: @SSaishruthi |
Hi @Squadrick |
It's extended to include a case when |
Sounds good. Reason I asked is I had the same Thanks for the modifications. |
Hey, sorry about that, my bad. Should've checked before opening this PR. |
Looks very good thanks @Squadrick. Haven't been able to do a full review yet -- but was wondering if @SSaishruthi you wouldn't mind taking a pass at a review seeing as this is familiar to you?
Also ping @PhilipMay for review |
@seanpmorgan Sure will do |
In the case of "single-label categorical classification", where one sample belongs to exactly one of many possible classes, this looks good in my downstream task. Will test binary classification next. Somehow I have the feeling that this might not work correctly.
Binary classification is working (but a little bit ugly):

```python
import numpy as np
import f_scores
from sklearn.metrics import f1_score

actuals = np.array([[0], [1], [1], [1]])
preds = np.array([[0.2], [0.3], [0.7], [0.9]])

f1 = f_scores.F1Score(num_classes=1,
                      average='micro',  # the value here does not matter in the binary case
                      threshold=0.5)
f1.update_state(actuals, preds)
f1_result = f1.result().numpy()
print('F1 from metric:', f1_result)

ytrue = actuals
ypred = np.rint(preds)
f1_result = f1_score(ytrue, ypred, average='binary', pos_label=1)
print("F1 from sklearn:", f1_result)
```

Has this output (which is good):
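(For reference, working through the numbers rather than quoting the elided output: thresholding the predictions at 0.5 rounds them to [0, 0, 1, 1]; against the labels [0, 1, 1, 1] that gives 2 true positives, 0 false positives, and 1 false negative, so precision = 1.0, recall = 2/3, and F1 = 2 * 1 * (2/3) / (1 + 2/3) = 0.8. Both print statements should therefore report approximately 0.8.)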
What is ugly is that Also I see a problem with |
```python
def __init__(self, num_classes, average, name='f1_score',

def __init__(self,
             num_classes,
             average,
```
`average` has no default value here, so it is inconsistent with `FBetaScore`, which has `average=None` as the default.
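For illustration, a minimal sketch (not the actual source in this PR) of the consistency being asked for, with `average=None` as the shared default and `F1Score` simply forwarding to `FBetaScore`:

```python
import tensorflow as tf

# Hypothetical skeletons showing only the constructor signatures under
# discussion: both classes default to average=None, and F1Score just
# forwards to FBetaScore with beta fixed to 1.
class FBetaScore(tf.keras.metrics.Metric):
    def __init__(self, num_classes, average=None, beta=1.0,
                 name='fbeta_score', **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_classes = num_classes
        self.average = average
        self.beta = beta

class F1Score(FBetaScore):
    def __init__(self, num_classes, average=None, name='f1_score', **kwargs):
        super().__init__(num_classes, average=average, beta=1.0,
                         name=name, **kwargs)
```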
Ok now.
@Squadrick Maybe use `tf.shape`?
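If it helps, the distinction presumably being pointed at is static vs. dynamic shapes; a small sketch (the helper name is hypothetical):

```python
import tensorflow as tf

@tf.function
def batch_size_of(y_true):
    # y_true.shape[0] can be None when the batch dimension is not known
    # at trace time; tf.shape(y_true)[0] is evaluated at run time instead.
    return tf.shape(y_true)[0]

# Works even though the batch dimension is left unspecified:
spec = tf.TensorSpec(shape=[None, 3], dtype=tf.float32)
concrete = batch_size_of.get_concrete_function(spec)
print(concrete(tf.zeros([4, 3])))  # tf.Tensor(4, ...)
```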
A small test for `F1Score` would be good. That is still missing. IMO a test for the binary case should be added. Maybe just use this here: #502 (comment)
@PhilipMay I test for The current implementation of keeping track of the weights, and doing the final calculation in |
I've added a very simple |
@Squadrick I will start with sample weight addition after this PR gets merged |
Yes. Thanks. That's what I mean. A small "smoke test". |
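For what it's worth, a sketch of such a smoke test, reusing the binary example from earlier in the thread (the import path and the hard-coded expectation are my assumptions, not taken from the PR):

```python
import numpy as np
import tensorflow as tf
from f_scores import F1Score  # assumed module path for this PR

class F1ScoreBinaryTest(tf.test.TestCase):
    def test_binary_with_threshold(self):
        y_true = np.array([[0], [1], [1], [1]], dtype=np.int32)
        y_pred = np.array([[0.2], [0.3], [0.7], [0.9]], dtype=np.float32)

        metric = F1Score(num_classes=1, average='micro', threshold=0.5)
        metric.update_state(y_true, y_pred)

        # Hard-coded expectation: 2 TP, 0 FP, 1 FN -> F1 = 0.8.
        self.assertAllClose(0.8, metric.result().numpy(), atol=1e-6)

if __name__ == '__main__':
    tf.test.main()
```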
LGTM - the bug seems to be fixed now. A different thought: in tf.keras.metrics there are already many basic metrics implemented: confusion matrix, precision, recall and so on. Wouldn't it be a good idea to build on them?
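To make that alternative concrete, a rough sketch (binary case only, and not what this PR actually does) of composing F1 from the precision/recall metrics that tf.keras already ships:

```python
import tensorflow as tf

class ComposedF1(tf.keras.metrics.Metric):
    """F1 built on top of tf.keras.metrics.Precision/Recall (binary only)."""

    def __init__(self, threshold=0.5, name='composed_f1', **kwargs):
        super().__init__(name=name, **kwargs)
        self.precision = tf.keras.metrics.Precision(thresholds=threshold)
        self.recall = tf.keras.metrics.Recall(thresholds=threshold)

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.precision.update_state(y_true, y_pred, sample_weight)
        self.recall.update_state(y_true, y_pred, sample_weight)

    def result(self):
        p = self.precision.result()
        r = self.recall.result()
        return tf.math.divide_no_nan(2.0 * p * r, p + r)

    def reset_states(self):
        self.precision.reset_states()
        self.recall.reset_states()
```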
Both work in the same way. We change the calculation according to the type. This seems to be the better way after investigation.
I don't quite understand. What are the alternatives that work the same way? |
Like I mentioned above, I think a better approach would be to use |
Hi, thanks everyone :-) I haven't done a full review yet
+1, I'm wondering if we can refer to AUC metric.
I prefer to keep dependencies minimal, and F1Score is not so complex that we cannot calculate it easily. Maybe we can learn from the test cases in #466
If I'm not wrong, tf.keras uses one-hot encoding for labels. What do you think, Philip, Saishruthi, Dheeraj? Thanks all for your contributions.
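As a concrete illustration of the one-hot convention (the values are made up, and this assumes the metric picks the argmax of the probabilities when no `threshold` is given):

```python
import numpy as np
import f_scores

# One-hot labels for a 3-class problem, as tf.keras conventionally expects
# for categorical metrics.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=np.int32)
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.2, 0.7],
                   [0.3, 0.4, 0.3]], dtype=np.float32)

f1 = f_scores.F1Score(num_classes=3, average='macro')
f1.update_state(y_true, y_pred)
print(f1.result().numpy())
```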
```python
y_pred = y_pred > self.threshold

y_true = tf.cast(y_true, tf.int32)
y_pred = tf.cast(y_pred, tf.int32)
```
This line is redundant with line 124, where the same operation is executed.
I'd prefer using
I'm open to hard-coding the results if you think that's the better approach. What do you all think? @WindQAQ |
Force-pushed from 3f9b8f4 to 5f8a135
@Squadrick For me both ways are good. Both have pros and cons.
I'm afraid that sklearn is too heavy, what do you think @seanpmorgan @WindQAQ ? |
@facaiy "Too heavy" sounds very abstract to me. Can you explain what you think the concrete disadvantage is? Does downloading take too much time, or does installing the docker image for testing take too much time? What is it that makes you think "too heavy"?
@PhilipMay Hi Philip, it's easy to add a new dependency but difficult (sometimes impossible) to remove one; that's why I suggest acting conservatively. Moreover, sklearn is a quite complicated Python wheel which has many dependencies of its own.
Would it be a solution to split into test and install dependencies? See here: https://stackoverflow.com/questions/15422527/best-practices-how-do-you-list-required-dependencies-in-your-setup-py |
Sorry for the delay, Philip. I'm referring to test dependencies when I use 'dependency' above. Anyway, I'm not against the sklearn proposal if you insist :-) What do you think, Dheeraj @Squadrick ? |
Agree +1. As a plugin/addons package, it would be great if we could make the wheel lightweight. So in this case, if we could do unittests even without |
Ok. So let’s do this without Sklearn. For me finishing this PR has priority anyway. |
Agree, we've had a similar discussion before... both options have pros and cons, though we have precedent throughout the repo of using pre-calculated values.
* Add `threshold` param to f-scores
* Tests now compare with sklearn
* Add sklearn to requirements
* Register FBetaScore and F1Score as Keras custom objects
* Update readme to separate both metrics
Resort to using hard coded test cases rather than comparing with sklearn
Force-pushed from 5f8a135 to fb1883a
Sorry about the delay, hardcoded the tests. |
LGTM thanks for the refactor!
@Squadrick thanks for finalizing this. :-) |
Add `threshold` param to f-scores. Fixes #490