fix distributed training error and nan result bugs #721

n3011 · 2019-11-27T14:46:47Z

It fixes the following two bugs related to the CohenKappa metric:

dtype issue when used within a distributed strategy scope: CohenKappa reset_states distributed training setup errors. #720
NaN result issue after state reset.

WindQAQ

Thanks for the PR!

WindQAQ · 2019-12-01T05:00:02Z

tensorflow_addons/metrics/cohens_kappa.py

-        return kp
+        return tf.cond(tf.math.is_nan(denominator),
+                       true_fn=lambda: 0.0,
+                       false_fn=lambda: 1 - (numerator / denominator))


I think the fix is correct, but a better approach would be to change the division on these two lines to tf.math.divide_no_nan, and to modify kp to kp = tf.math.divide_no_nan(denominator - numerator, denominator). What do you think about this, @n3011?

@WindQAQ there are two problems with the tf.math.divide_no_nan approach:

It will still produce a NaN result, as nan/nan is not defined.

It is slightly slower than the tf.cond-based approach.
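For reference, the first point can be modeled in plain Python. This is only a sketch, not TensorFlow code: divide_no_nan_model is a hypothetical stand-in that assumes the documented behavior of tf.math.divide_no_nan (return 0 when the denominator is zero, divide normally otherwise).

```python
import math


def divide_no_nan_model(x, y):
    """Hypothetical model of a 'safe divide': 0 when y is zero, x / y otherwise."""
    return 0.0 if y == 0 else x / y


# A zero denominator is handled by the safe divide.
zero_case = divide_no_nan_model(1.0, 0.0)

# A NaN denominator is not zero, so the ordinary division runs and NaN propagates.
nan_case = divide_no_nan_model(math.nan, math.nan)

print(zero_case, math.isnan(nan_case))
```

Under that assumption, a safe divide applied only at the final kp step does not help once numerator and denominator are already NaN.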

Got it. After a rough test, your approach is actually faster :-) Thanks for the report.
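To make the NaN-after-reset case and the merged guard concrete, here is a stdlib-only sketch. It is not the code in cohens_kappa.py: ieee_div, kappa_terms, kappa_unguarded and kappa_guarded are hypothetical names, the kappa is the unweighted variant, and it assumes that a state reset leaves an all-zero confusion matrix. ieee_div mimics float tensor division, where 0/0 yields NaN rather than raising.

```python
import math


def ieee_div(a, b):
    """Float division with IEEE-754 results (0/0 -> NaN) instead of ZeroDivisionError."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if b == 0:
        return math.nan if a == 0 else math.copysign(math.inf, a)
    return a / b


def kappa_terms(conf_mtx):
    """Return (observed, expected) off-diagonal disagreement for unweighted kappa."""
    n = len(conf_mtx)
    total = sum(sum(row) for row in conf_mtx)
    actual = [sum(row) for row in conf_mtx]                      # row sums
    pred = [sum(row[j] for row in conf_mtx) for j in range(n)]   # column sums
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # only disagreements (off-diagonal cells) count
                numerator += ieee_div(conf_mtx[i][j], total)
                denominator += ieee_div(actual[i] * pred[j], total * total)
    return numerator, denominator


def kappa_unguarded(conf_mtx):
    """kp = 1 - numerator / denominator, with no NaN handling."""
    numerator, denominator = kappa_terms(conf_mtx)
    return 1 - ieee_div(numerator, denominator)


def kappa_guarded(conf_mtx):
    """Same formula, but return 0.0 when the denominator is NaN (as in the diff)."""
    numerator, denominator = kappa_terms(conf_mtx)
    if math.isnan(denominator):
        return 0.0
    return 1 - ieee_div(numerator, denominator)


empty = [[0, 0], [0, 0]]      # assumed state right after a reset
perfect = [[2, 0], [0, 2]]    # every prediction agrees with the label

print(kappa_unguarded(empty), kappa_guarded(empty), kappa_guarded(perfect))
```

The guard only checks for a NaN denominator, which matches the condition used in the diff above; other degenerate inputs are outside the scope of this sketch.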

WindQAQ

Would you mind running make code-format to format the code? Alternatively, see the following log to format it manually. I'd like to merge this after the tests pass. Thanks again for the fix!

https://source.cloud.google.com/results/invocations/ff1ff9f3-81b8-4671-b13c-9d40519190f1/log

n3011 · 2019-12-01T09:02:58Z

@WindQAQ thanks, pushed the formatted changes.

WindQAQ

Thank you again for the contribution!

fix distributed training error and nan result bugs

6063449

n3011 requested a review from Squadrick as a code owner November 27, 2019 14:46

googlebot added the cla: yes label Nov 27, 2019

Squadrick added kokoro:force-run metrics labels Nov 27, 2019

kokoro-team removed the kokoro:force-run label Nov 27, 2019

WindQAQ reviewed Dec 1, 2019

WindQAQ self-requested a review December 1, 2019 08:45

WindQAQ reviewed Dec 1, 2019

reformat py file

552e10e

WindQAQ added the kokoro:force-run label Dec 1, 2019

kokoro-team removed the kokoro:force-run label Dec 1, 2019

WindQAQ self-requested a review December 1, 2019 19:12

WindQAQ approved these changes Dec 1, 2019

WindQAQ merged commit 9b7fdc7 into tensorflow:master Dec 1, 2019
