fix distributed training error and nan result bugs #721
Thanks for the PR!
```diff
-return kp
+return tf.cond(tf.math.is_nan(denominator),
+               true_fn=lambda: 0.0,
+               false_fn=lambda: 1 - (numerator / denominator))
```
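To make the guarded failure mode concrete, here is a plain-Python analogue. It is a sketch, not the TensorFlow Addons implementation: `cohen_kappa`, `ieee_div`, the `guard` flag and the unweighted 0/1 disagreement weights are all hypothetical choices for illustration. An all-zero confusion matrix is one way to get a NaN denominator, because every normalization becomes a 0/0 division. Python's `/` raises on division by zero, so `ieee_div` imitates the float semantics that TensorFlow tensors follow.

```python
import math
from typing import List

Matrix = List[List[float]]


def ieee_div(a: float, b: float) -> float:
    """Divide the way IEEE-754 floats (and TensorFlow tensors) do, instead of raising.

    x / 0 -> +/-inf and 0 / 0 (or nan / 0) -> nan; Python's `/` raises ZeroDivisionError.
    """
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


def cohen_kappa(conf_mtx: Matrix, guard: bool = True) -> float:
    """Hypothetical analogue of a Cohen's kappa result() with unweighted (0/1) disagreement weights."""
    n = len(conf_mtx)
    total = sum(sum(row) for row in conf_mtx)
    row_sums = [sum(row) for row in conf_mtx]  # marginals of the true labels
    col_sums = [sum(conf_mtx[i][j] for i in range(n)) for j in range(n)]  # marginals of predictions

    numerator = 0.0    # observed disagreement
    denominator = 0.0  # disagreement expected by chance
    for i in range(n):
        for j in range(n):
            w = 0.0 if i == j else 1.0
            observed = ieee_div(conf_mtx[i][j], total)
            expected = ieee_div(row_sums[i], total) * ieee_div(col_sums[j], total)
            # Note: 0.0 * nan is still nan, so zero weights do not hide a NaN.
            numerator += w * observed
            denominator += w * expected

    if guard and math.isnan(denominator):
        # Mirrors the tf.cond true_fn: report 0.0 instead of propagating NaN.
        return 0.0
    return 1 - ieee_div(numerator, denominator)
```

For example, `[[20, 5], [10, 15]]` gives a kappa of about 0.4 (observed agreement 0.7, chance agreement 0.5), while `[[0, 0], [0, 0]]` gives 0.0 with the guard and NaN with `guard=False`.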
Though I think the fix is correct, a better approach might be to change the division in these two lines to `tf.math.divide_no_nan` and to modify `kp` to `kp = tf.math.divide_no_nan(denominator - numerator, denominator)`. What do you think, @n3011?
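The suggested alternative can be sketched with a scalar stand-in for `tf.math.divide_no_nan`. The helper names `divide_no_nan` and `kappa_suggested` are hypothetical, and the real op works element-wise on tensors. For any nonzero, non-NaN denominator, `(denominator - numerator) / denominator` equals `1 - numerator / denominator`. A denominator of exactly zero yields 0 instead of NaN or infinity.

```python
import math


def divide_no_nan(x: float, y: float) -> float:
    """Scalar sketch of tf.math.divide_no_nan semantics: 0 when y == 0, else x / y."""
    return 0.0 if y == 0 else x / y


def kappa_suggested(numerator: float, denominator: float) -> float:
    """The reviewer's proposed form: kp = divide_no_nan(denominator - numerator, denominator)."""
    return divide_no_nan(denominator - numerator, denominator)


# Same value as 1 - numerator / denominator when the denominator is a normal nonzero number.
print(math.isclose(kappa_suggested(0.3, 0.5), 1 - 0.3 / 0.5))  # prints True
# A zero denominator is mapped to 0 rather than producing NaN.
print(kappa_suggested(0.0, 0.0))  # prints 0.0
```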
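@WindQAQ there are two problems with the `tf.math.divide_no_nan` approach:
- It still produces a NaN result, because nan/nan is undefined (`divide_no_nan` only returns 0 when the denominator is exactly zero, not when it is NaN).
- It is slightly slower than the `tf.cond`-based approach.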
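The first objection can be checked with scalar stand-ins. These are hypothetical helpers, not the TensorFlow ops. `divide_no_nan` special-cases only a denominator equal to zero, and NaN compares unequal to zero, so a NaN denominator still yields NaN. An explicit NaN check on the denominator, which the `tf.cond` version performs with `tf.math.is_nan`, returns 0.0 instead.

```python
import math


def divide_no_nan(x: float, y: float) -> float:
    """Scalar sketch of tf.math.divide_no_nan: only a zero denominator is special-cased."""
    return 0.0 if y == 0 else x / y


def kappa_with_nan_check(numerator: float, denominator: float) -> float:
    """Scalar sketch of the tf.cond approach: test the denominator for NaN explicitly.

    Unlike TensorFlow, Python raises ZeroDivisionError if the denominator is exactly 0;
    this sketch only models the NaN case discussed above.
    """
    if math.isnan(denominator):
        return 0.0
    return 1 - numerator / denominator


nan = math.nan
# NaN != 0, so divide_no_nan falls through to nan / nan, which is nan.
print(divide_no_nan(nan - nan, nan))  # prints nan
# The explicit check catches the NaN denominator.
print(kappa_with_nan_check(nan, nan))  # prints 0.0
```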
Got it. After a rough test, your approach is actually faster :-) Thanks for the report.
Would you mind running `make code-format` to format the code? Alternatively, see the following log to format it manually. I'd like to merge this once the tests pass. Thanks again for the fix!
https://source.cloud.google.com/results/invocations/ff1ff9f3-81b8-4671-b13c-9d40519190f1/log
@WindQAQ thanks, pushed the formatted changes.
Thank you again for the contribution!
It fixes the following two bugs related to the CohenKappa metric: