Project ZeroTangent to natural tangent for some number types #574

Open: 2 commits to be merged into base branch main

ToucheSir
Contributor

Currently, structured zero tangents are allowed to pass through during projection for (I believe) all number types. This was found while trying to write an rrule that passes https://github.com/FluxML/Zygote.jl/blob/v0.6.43/test/features.jl#L528 (itself found while working on FluxML/Zygote.jl#1284). This PR takes a conservative first step towards coercing those zeros back to natural tangents by focusing on a tangent type which is unambiguously zero (ZeroTangent) and a relatively easy-to-comprehend numeric subspace (the Reals).
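For concreteness, the pass-through behaviour described above can be reproduced with a minimal sketch. It assumes a ChainRulesCore release that does not include this PR; the variable names are illustrative only.

```julia
using ChainRulesCore

# Projector onto the tangent space of a Float64 primal.
project = ProjectTo(1.0)

# Ordinary numeric tangents are coerced to the primal's natural tangent type.
from_int = project(2)                # an Integer tangent is converted to Float64
from_complex = project(1.0 + 2.0im)  # a Complex tangent keeps only its real part

# A structural zero is not coerced; it is handed back unchanged.
dz = project(ZeroTangent())
dz isa AbstractZero  # expected to hold without this PR, which proposes a natural zero for Real primals
```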

ToucheSir force-pushed the bc/project-num-zerotangent branch from 49bc8ac to 81736ce on August 10, 2022 00:19
@ToucheSir
Contributor Author

The purpose of splitting this out into two commits is to show which tests fail and (along with the corresponding behaviour) would need to change. This comment in particular worries me, but the parent PR #391 is a bit too intimidating to wade through looking for more context.

@codecov-commenter

codecov-commenter commented Aug 10, 2022

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.04%. Comparing base (fbb4936) to head (54fcbed).
⚠️ Report is 161 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #574      +/-   ##
==========================================
- Coverage   93.15%   93.04%   -0.11%     
==========================================
  Files          15       15              
  Lines         891      892       +1     
==========================================
  Hits          830      830              
- Misses         61       62       +1     


@oxinabox
Member

oxinabox commented Aug 11, 2022

I'm not sure about this. ZeroTangent() is a strong zero: it defeats NaNs, though that should rarely matter in practice.
But also it is more informative than 0.
I am not sure, I want to think about this more.
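To illustrate what "defeats NaNs" means, here is a self-contained toy sketch. `StrongZero` is a hypothetical stand-in defined only for this example; it is not ChainRulesCore's `ZeroTangent` and makes no claim about how that type is implemented.

```julia
# Hypothetical stand-in for a structural ("strong") zero.
struct StrongZero end

# Multiplying by a strong zero short-circuits: the other operand is never inspected.
Base.:*(::StrongZero, ::Number) = StrongZero()
Base.:*(::Number, ::StrongZero) = StrongZero()

# Adding a strong zero is the identity on the other operand.
Base.:+(::StrongZero, x::Number) = x
Base.:+(x::Number, ::StrongZero) = x

weak = 0.0 * NaN             # IEEE arithmetic: the NaN propagates
strong = StrongZero() * NaN  # the strong zero wins, so no NaN appears
```

A floating-point `0.0` loses this property, which is one reason to hesitate before projecting a structural zero onto a natural one.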

I think we were doing this in the original draft of ProjectTo and removed it.

@willtebbutt @mcabbott thoughts?

Comment on lines +40 to +42
complex_tangent = Tangent{ComplexF64}(; re=1, im=NoTangent())
@test ProjectTo(1.0f0 + 2im)(complex_tangent) === 1.0f0 + 0.0f0im
@test ProjectTo(1.0)(complex_tangent) === 1.0
Member


Unrelated fix?

@@ -212,7 +212,7 @@ struct NoSuperType end
@test ProjectTo(I)(123) === NoTangent()
@test ProjectTo(2 * I)(I * 3im) === 0.0 * I
@test ProjectTo((4 + 5im) * I)(Tangent{typeof(im * I)}(; λ = 6)) === (6.0 + 0.0im) * I
- @test ProjectTo(7 * I)(Tangent{typeof(2I)}()) == ZeroTangent()
+ @test ProjectTo(7 * I)(Tangent{typeof(2I)}()) == 0.0I
Member


hmm not sure that this is a good thing.
Does this also happen to other matrix types?

@mcabbott
Member

> Currently, structured zero tangents are allowed to pass through during projection for (I believe) all number types.

This was certainly the intention. The strong zero should mean no further computation is done.

> This was found while trying to write a rrule that passes https://github.com/FluxML/Zygote.jl/blob/v0.6.43/test/features.jl#L528 (itself found while working on FluxML/Zygote.jl#1284).

But why does this issue imply that we should not preserve AbstractZero? I don't see the connection at all, but maybe I miss something. Can you spell it out?

There are some cases where not creating an AbstractZero seems like a good idea (e.g. for type-stability) but not preserving them seems odd to me.

@ToucheSir
Contributor Author

> But why does this issue imply that we should not preserve AbstractZero? I don't see the connection at all, but maybe I miss something. Can you spell it out?
>
> There are some cases where not creating an AbstractZero seems like a good idea (e.g. for type-stability) but not preserving them seems odd to me.

The sequence of events was that fixing FluxML/Zygote.jl#1284 required switching keyword argument indexing to use (a modified) path for NamedTuples instead of Dicts. However, this breaks https://github.com/FluxML/Zygote.jl/blob/v0.6.43/test/features.jl#L528 because [gradient].y becomes nothing instead of zero. I thought about converting https://github.com/FluxML/Zygote.jl/blob/4bb6b4dd4a4b6eb0e40126587a7a170216c97448/src/lib/base.jl#L123 to an rrule so as to lean on projection for regaining the natural tangent, but found that projection just passes a ZeroTangent through (which Zygote then converts into nothing).

Based on this discussion, it seems like the two options are to change the test and call the existing behaviour a bug, or implement ad-hoc projection logic just for that particular adjoint. I am certainly not qualified to engage in the discussion on zero types brought up in this PR :)
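The second option, ad-hoc projection logic local to one adjoint, could look roughly like the sketch below. `natural_tangent` is a hypothetical helper invented for this illustration; it is not part of ChainRulesCore or Zygote, and it deliberately handles only `ZeroTangent` on real primals, mirroring the scope of this PR.

```julia
using ChainRulesCore

# Hypothetical helper: turn a structural zero into the natural zero tangent of a
# real primal, using the primal's floating-point type.
natural_tangent(x::Real, ::ZeroTangent) = zero(float(x))

# Every other primal/tangent combination goes through the usual projection.
natural_tangent(x, dx) = ProjectTo(x)(dx)

coerced = natural_tangent(2.0, ZeroTangent())  # a Float64 zero instead of ZeroTangent()
ordinary = natural_tangent(2.0, 3)             # regular projection still applies
```

Keeping the coercion inside a single rule leaves `ProjectTo` itself untouched, so strong zeros are still preserved everywhere else.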

@mcabbott
Member

> However, this breaks https://github.com/FluxML/Zygote.jl/blob/v0.6.43/test/features.jl#L528 because [gradient].y becomes nothing instead of zero.

That test was added fairly recently, in FluxML/Zygote.jl#1059. I'd probably expect a hard zero there (i.e. nothing), but 0.0 also seems acceptable. My guess is that the test only has 0.0 because that was the behaviour at the time.

4 participants