
Conversation

ProGamerGov
Contributor

See #579 for more details

ProGamerGov and others added 30 commits January 5, 2021 18:28

* Update test_atlas.py
* Vectorize heatmap function.

* Add sample count to vec coords.

* Add labels for inception v1 model.
Sample collection should now be faster & use less memory.
* Activation atlas visualization with 1.2 million samples
* Docs for the WhitenedNeuronDirection objective.
* Fix for visualizations being flipped horizontally.
ProGamerGov added 20 commits May 2, 2021 14:04
* Also changed the `self.direction` variable to `self.vec`.
* Improved a number of text cells in both atlas tutorials
* Add sections about atlas reproducibility.
* Give the user the option to see all class ids and their corresponding class names.
* Remove old code and text for slicing incomplete atlases.
* Use more precise language.
* Improve the flow of language used between steps.
* Hopefully sample collection is easier to understand this way, as it was previously added as a commented out section to the main activation atlas tutorial.
* Improved the description of activation and attributions samples in both visualizing atlases notebooks.
* Also improved the activation atlas sample collection tutorial.
* Move activation atlas tutorials to their own folder.
* Move activation atlas sample collection functions to the sample collection tutorial.
@NarineK
Contributor

NarineK commented Aug 13, 2021

Thank you for splitting up the PRs, @ProGamerGov. It looks like the new PRs show 95 other commits in the commit history. Is it possible to clean up the commit history, squash it into one commit, and have the longer commit history associated only with the original PR?

  1. For the notebook Collecting Samples for Activation Atlases with captum.optim, do you mind pointing to the corresponding notebook in Lucid? The activation atlas work has multiple notebooks, and it is unclear which one this notebook refers to.
  2. I'm not very clear on why "By default the activation samples will not have the right class attributions." What do we mean by attribution here? I think it would be great to define and describe it in the notebook.
  3. Are the attributions activations from the modified network? Why are we enabling gradients if we are computing only the activations?

with torch.set_grad_enabled(True):
    target_activ_attr_dict = opt.models.collect_activations(
        attr_model, attr_targets, inputs
    )

  4. I think you might want to describe how the attribution is computed using input * gradients, and why we are calling autograd twice. We need to describe the trick we are using here and reference the documentation; otherwise it's unclear when trying to understand it from the tutorial.
  5. It looks like the attribute_samples function saves the activation and attribution vectors in separate files. Wouldn't it be better to concatenate them in memory and save them all together in one file, instead of concatenating them later in the consolidate_samples function?
  6. In the consolidate_samples function, when you mention [n_channels, n_samples], do you mean the number of target classes by n_channels?
  7. I looked into the tutorial and tutorial-related changes, but it looks like there is a lot more code in this PR that is not used in the tutorial, and some of it is duplicated in PR Optim-wip: Add Class Activation Atlas tutorial #730. For example, AngledNeuronDirection is defined in both PRs. I think the changes that aren't related to this tutorial should live in one place, e.g. in PR #730.

@ProGamerGov
Contributor Author

@NarineK

  1. This is the equivalent Lucid notebook: https://colab.research.google.com/github/tensorflow/lucid/blob/master/notebooks/activation-atlas/activation-atlas-collect.ipynb

  2. I'll try to explain it better!

  3. We only need gradients when calculating the attributions, so for the sake of speed and efficiency I added the `with torch.set_grad_enabled(True)` lines to the attribution part only. If the user isn't interested in collecting attributions, then the code runs a lot faster. The attributions are collected from the modified model using the special pooling layers, while the activations are collected from the unmodified model (see the first sketch after this list).

  4. I'll add some references to the double-backwards trick, like in the Lucid notebook!

  5. In my testing, I found that dumping the collected activations and attributions to files increased the speed by a significant amount. When dealing with 1 million training images, it can slow to a crawl and potentially crash from out-of-memory errors if I keep everything in memory. Saving the batches as individual files also means that you still have usable data if it crashes at 99% (see the second sketch after this list).

  6. n_channels in the consolidate_samples docs refers to the number of output channels in the saved / loaded activation tensors, not the number of target classes. I should probably clarify this better!

  7. This PR was meant to be reviewed after Optim-wip: Add Activation Atlas tutorial & functions #579, so I kept the core code here as well. But I can remove the shared code. I may have to just make a new PR and close this one in order to clean up the commits.
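To make points 3 and 4 a bit more concrete, here is a minimal, self-contained sketch of the general idea: activations are collected with autograd disabled, while the attribution pass enables gradients and combines the layer's activation with its gradient. The toy model, layer choice, and variable names are illustrative rather than the tutorial's exact code, and the double-backward detail through the relaxed pooling layers is not reproduced here.

import torch
import torch.nn as nn

# Toy stand-ins for the real model and target layer.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
layer = model[1]
inputs = torch.randn(4, 3, 32, 32)

activations = {}

def hook(module, inp, out):
    activations["target"] = out
    if out.requires_grad:
        out.retain_grad()  # keep .grad on this intermediate tensor

handle = layer.register_forward_hook(hook)

# Activation collection: no gradients are needed, so no_grad saves time and memory.
with torch.no_grad():
    model(inputs)
target_activ = activations["target"]

# Attribution collection: gradients must be enabled for the backward call.
with torch.set_grad_enabled(True):
    logits = model(inputs)
    logits[:, 0].sum().backward()  # backprop a chosen class logit

# A simple activation * gradient attribution at the target layer.
target_attr = activations["target"] * activations["target"].grad
handle.remove()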
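For points 5 and 6, here is a rough sketch of the save-per-batch-then-consolidate approach described above; the file layout, dict keys, and function names are assumptions, not the actual attribute_samples / consolidate_samples code.

import glob
import os
import torch

def save_batch(activ: torch.Tensor, attr: torch.Tensor, batch_idx: int, out_dir: str = "samples") -> None:
    # One small file per batch keeps peak memory low on million-image runs and
    # preserves completed work if the job crashes partway through.
    os.makedirs(out_dir, exist_ok=True)
    torch.save(
        {"activations": activ.cpu(), "attributions": attr.cpu()},
        os.path.join(out_dir, f"batch_{batch_idx:06d}.pt"),
    )

def consolidate(out_dir: str = "samples"):
    # Concatenate the per-batch tensors along the sample dimension. Here
    # "channels" means the target layer's output channels, not class ids.
    chunks = [torch.load(f) for f in sorted(glob.glob(os.path.join(out_dir, "batch_*.pt")))]
    activations = torch.cat([c["activations"] for c in chunks], dim=0)
    attributions = torch.cat([c["attributions"] for c in chunks], dim=0)
    return activations, attributions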

A relaxed pooling layer that is useful for calculating attributions of spatial
positions. This layer reduces noise in the gradient through the use of a
continuous relaxation of the gradient.

Contributor


Can we perhaps cite it with a reference to the code in Lucid? The gradient of what? Of the output with respect to this layer?

)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.maxpool(x.detach()) + self.avgpool(x) - self.avgpool(x.detach())
Contributor


This part is a bit unclear. We add avgpool of the input and then subtract avgpool of the detached input?

Contributor Author

@ProGamerGov ProGamerGov Sep 6, 2021


@NarineK For this layer to work correctly, we want the gradient of the input passed through nn.AvgPool2d, while also using the tensor values from the input passed through nn.MaxPool2d. This is what the line does:

max_input_grad_avg = self.maxpool(x.detach()) + self.avgpool(x) - self.avgpool(x.detach())

As I couldn't seem to separately modify the gradient of the input in the forward pass, this was the solution that Chris came up with.
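Roughly, the layer looks like this; a sketch assuming 2x2 pooling, since the constructor arguments shown here are illustrative and the actual class in the PR may differ:

import torch
import torch.nn as nn

class MaxPool2dRelaxed(nn.Module):
    def __init__(self, kernel_size: int = 2, stride: int = 2) -> None:
        super().__init__()
        self.maxpool = nn.MaxPool2d(kernel_size, stride)
        self.avgpool = nn.AvgPool2d(kernel_size, stride)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Forward value: maxpool(x), detached so no gradient flows through it.
        # Backward path: avgpool(x) is the only grad-carrying term, and subtracting
        # avgpool(x.detach()) cancels its value contribution without cancelling its gradient.
        return self.maxpool(x.detach()) + self.avgpool(x) - self.avgpool(x.detach())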

Contributor


@ProGamerGov, thank you for the explanation. Does this mean that you want the forward pass to return self.maxpool(x.detach()), but the backward pass to be computed through self.avgpool(x)?
I think it would be great to document this, since from a certain perspective self.avgpool(x) - self.avgpool(x.detach()) = 0, so it is not obvious why we need it.

Contributor Author

@ProGamerGov ProGamerGov Sep 24, 2021


@NarineK Yes, we want the forward pass to return self.maxpool(x.detach()) while the backward pass uses self.avgpool(x). Lucid uses TensorFlow's gradient override system to perform the same task with slightly different methodology.

The Activation Atlas paper references this Lucid attribution notebook as the place where the idea was first used. In the notebook, we see that the Lucid version of MaxPool2dRelaxed has the following description:

Construct a pooling function where, if we backprop through it, gradients get allocated proportional to the input activation. Then backprop through that instead.

In some ways, this is kind of spiritually similar to SmoothGrad (Smilkov et al.). To see the connection, note that MaxPooling introduces a pretty arbitrary discontinuity to your gradient; with the right distribution of input noise to the MaxPool op, you'd probably smooth out to this. It seems like this is one of the most natural ways to smooth.
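As a quick illustration (using the MaxPool2dRelaxed sketch above, not code from the tutorial), the output matches plain max pooling while the gradient is routed through average pooling, so the avgpool terms only cancel value-wise, not gradient-wise:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 4, 4, requires_grad=True)
relaxed = MaxPool2dRelaxed(kernel_size=2, stride=2)

out = relaxed(x)
assert torch.allclose(out, nn.MaxPool2d(2)(x))  # values come from max pooling

out.sum().backward()
# Every input element receives 1/4 of the gradient, exactly as average pooling would give.
assert torch.allclose(x.grad, torch.full_like(x, 0.25))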

Contributor Author


Chris helped me a fair bit with recreating the algorithm in PyTorch, so I may not be explaining how it works correctly. I can mark it down that we need to come back to improve the description in a future PR if you want.

Contributor


@ProGamerGov, thank you for the explanation. Do you mind adding this description in the code so that in the future we can understand it easily?

Contributor Author


@NarineK Okay, I've updated the class's description!

@NarineK
Contributor

NarineK commented Aug 16, 2021

@ProGamerGov, thank you for the replies. Let me know if you create a new PR with cleaned commit history. I'll take a look into it.

@ProGamerGov
Contributor Author

@NarineK Closing this PR in favor of: #750

@ProGamerGov ProGamerGov closed this Sep 8, 2021