Optim-wip: Add Activation Atlas setup tutorial #729
Conversation
* Update test_atlas.py
…tim-wip-activation-atlas
* Vectorize heatmap function.
* Add sample count to vec coords.
* Add labels for inception v1 model.
Sample collection should now be faster & use less memory.
* Activation atlas visualization with 1.2 million samples
…tim-wip-activation-atlas
* Docs for the WhitenedNeuronDirection objective.
* Fix for visualizations being flipped horizontally.
* Also changed the `self.direction` variable to `self.vec`.
* Improved a number of text cells in both atlas tutorials.
* Add sections about atlas reproducibility.
* Give the user the option to see all class ids and their corresponding class names.
* Remove old code and text for slicing incomplete atlases.
* Use more precise language.
* Improve the flow of language used between steps.
* Hopefully sample collection is easier to understand this way, as it was previously added as a commented-out section to the main activation atlas tutorial.
* Improved the description of activation and attribution samples in both visualizing atlases notebooks.
* Also improved the activation atlas sample collection tutorial.
* Move activation atlas tutorials to their own folder.
* Move activation atlas sample collection functions to the sample collection tutorial.
Thank you for splitting up the PRs, @ProGamerGov. It looks like the new PRs show 95 other commits in the commit history. Is it possible to clean up the commit history, squash it into one commit, and have the longer commit history associated only with the original PR?
A relaxed pooling layer that's useful for calculating attributions of spatial positions. This layer reduces noise in the gradient through the use of a continuous relaxation of the gradient.
Can we perhaps cite it with a reference to the code in lucid? Gradient of what? Of the output with respect to this layer?
def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.maxpool(x.detach()) + self.avgpool(x) - self.avgpool(x.detach())
This part is a bit unclear. Why do we add the avgpool of the input and then subtract the avgpool of the detached input?
@NarineK For this layer to work correctly, we want the gradient of the input passed through `nn.AvgPool2d`, while also using the tensor values from the input passed through `nn.MaxPool2d`. This is what the line does:
max_input_grad_avg = self.maxpool(x.detach()) + self.avgpool(x) - self.avgpool(x.detach())
As I couldn't seem to separately modify the gradient of the input in the forward pass, this was the solution that Chris came up with.
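For reference, here is a minimal self-contained sketch of how such a module could be put together. The kernel size and stride values are illustrative assumptions, not necessarily what the PR uses:

```python
import torch
import torch.nn as nn


class MaxPool2dRelaxed(nn.Module):
    """Minimal sketch of the relaxed pooling idea discussed above.

    Forward values come from max pooling, while gradients flow through
    average pooling. Constructor arguments are illustrative assumptions.
    """

    def __init__(self, kernel_size: int = 2, stride: int = 2) -> None:
        super().__init__()
        self.maxpool = nn.MaxPool2d(kernel_size, stride)
        self.avgpool = nn.AvgPool2d(kernel_size, stride)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Value: maxpool(x), since the two avgpool terms cancel numerically.
        # Gradient: only self.avgpool(x) is attached to the autograd graph,
        # so d(output)/d(input) is the average-pooling gradient.
        return self.maxpool(x.detach()) + self.avgpool(x) - self.avgpool(x.detach())
```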
@ProGamerGov, thank you for the explanation. Does this mean that you want the forward pass to return `self.maxpool(x.detach())`, but the backward pass to be computed for `self.avgpool(x)`? I think it would be great to document it, since if we look at it from a certain perspective we can say `self.avgpool(x) - self.avgpool(x.detach()) = 0`, so why do we need it.
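To make the distinction concrete, here is a small standalone check (illustrative only): the two avgpool terms do cancel in the forward values, but they do not cancel in the backward pass, which is the point of the construction:

```python
import torch
import torch.nn as nn

maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
avgpool = nn.AvgPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 3, 8, 8, requires_grad=True)
out = maxpool(x.detach()) + avgpool(x) - avgpool(x.detach())

# Forward: the avgpool terms cancel, so the values match plain max pooling.
assert torch.allclose(out, maxpool(x))

# Backward: only avgpool(x) carries a grad_fn, so the gradient w.r.t. x is
# the (dense) average-pooling gradient, not the sparse max-pooling one.
out.sum().backward()
x_ref = x.detach().clone().requires_grad_(True)
avgpool(x_ref).sum().backward()
assert torch.allclose(x.grad, x_ref.grad)
```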
@NarineK Yes, we want the forward pass to return `self.maxpool(x.detach())` while the backward pass uses `self.avgpool(x)`. Lucid uses TensorFlow's gradient override system to perform the same task with slightly different methodology.
The Activation Atlas paper references this Lucid attribution notebook for where the idea was first used. In the notebook, we see the Lucid version of `MaxPool2dRelaxed` has the following description:
> Construct a pooling function where, if we backprop through it, gradients get allocated proportional to the input activation. Then backprop through that instead.
>
> In some ways, this is kind of spiritually similar to SmoothGrad (Smilkov et al.). To see the connection, note that MaxPooling introduces a pretty arbitrary discontinuity to your gradient; with the right distribution of input noise to the MaxPool op, you'd probably smooth out to this. It seems like this is one of the most natural ways to smooth.
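As an aside, a loose PyTorch analogue of Lucid's gradient-override trick could use a custom `torch.autograd.Function` with an explicit backward. This is only a hypothetical sketch, not the approach taken in this PR:

```python
import torch
import torch.nn.functional as F


class MaxPoolAvgGrad(torch.autograd.Function):
    """Hypothetical sketch of an explicit gradient override:
    max-pooled values in forward, average-pooling gradient in backward."""

    @staticmethod
    def forward(ctx, x, kernel_size, stride):
        ctx.save_for_backward(x)
        ctx.kernel_size, ctx.stride = kernel_size, stride
        return F.max_pool2d(x, kernel_size, stride)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Route the incoming gradient through average pooling instead of
        # through the max-pooling backward.
        with torch.enable_grad():
            x_ = x.detach().requires_grad_(True)
            out = F.avg_pool2d(x_, ctx.kernel_size, ctx.stride)
            (grad_x,) = torch.autograd.grad(out, x_, grad_output)
        return grad_x, None, None


# Usage: y = MaxPoolAvgGrad.apply(x, 2, 2)
```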
Chris helped me a fair bit with recreating the algorithm in PyTorch, so I may not be explaining how it works correctly. I can mark it down that we need to come back to improve the description in a future PR if you want.
@ProGamerGov, thank you for the explanation. Do you mind adding this description to the code so that we can understand it easily in the future?
@NarineK Okay, I've updated the class's description!
@ProGamerGov, thank you for the replies. Let me know if you create a new PR with a cleaned commit history and I'll take a look at it.
See #579 for more details