
Conversation

ProGamerGov (Contributor) commented Jan 6, 2021

This PR implements support for Activation Atlases and Class Activation Atlases, based on the Activation Atlas research paper: https://distill.pub/2019/activation-atlas/

  • Added atlas functions and tests for them.

  • Added full documentation for all the new atlas functions.

  • Added new Activation Atlas tutorial. The corresponding Lucid tutorial notebook can be found here.

  • Added new Class Activation Atlas tutorial. The corresponding Lucid tutorial notebook can be found here.

  • Both atlas tutorials share some of their text cells and code cells, so keep that in mind when reviewing them.

  • I re-added the RandomRotation transform as the Torchvision one does not accept lists of degree values. Also added tests for the new RandomRotation transform.

  • Vectorized the weights_to_heatmap_2d heatmap function so that it's faster and more efficient. Also improved the tests for the function.

  • Fixed nchannels_to_rgb so that it works properly with CUDA inputs, and also improved the tests for the function. This function isn't used by the atlas related code and tutorials, so it's a bit out of scope for this PR.

  • Added the umap-learn package to the tutorial requirements in setup.py. See https://umap-learn.readthedocs.io/en/latest/ & https://arxiv.org/abs/1802.03426 for more information about Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP). UMAP is used for calculating the atlas structure; a rough usage sketch follows this list.

  • Added a relaxed version of the MaxPool2d class for calculating neuron class attributions.

  • Added new Activation Atlas sample collection tutorial. The corresponding Lucid tutorial notebook can be found here. It might be better if we had an easy to download dataset for demonstration purposes. Also, should we have the sample collection functions be inside Captum, or should they only be in the sample collection tutorial?
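
For reference, roughly what the UMAP step looks like (a hypothetical sketch, not the exact tutorial code; the sample tensor and parameters here are stand-ins):

import torch
import umap  # from the umap-learn package

# Stand-in for collected activation samples: one row per sample, one column
# per channel of the chosen layer.
activation_samples = torch.randn(10000, 512)

# Reduce the samples to 2D coordinates that define the atlas layout.
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.01)
coords = reducer.fit_transform(activation_samples.numpy())  # shape: [10000, 2]

# Normalize to [0, 1] before bucketing the points into atlas grid cells.
coords = (coords - coords.min(0)) / (coords.max(0) - coords.min(0))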

Atlases might also make a good banner image for Captum.

ProGamerGov (Contributor, Author) commented Jan 6, 2021

@NarineK Do you have any ideas on how to generate irregular grids of (x, y) coordinates for tests? I'm not sure how to generate the test inputs I need for the atlas-related functions. The test inputs also need to let us exercise the minimum point density parameter.

Edit: I think that I can just use torch.arange for the tests.
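
A minimal sketch of what I mean (the coordinate layout and min-density scenario here are hypothetical, just for the tests):

import torch

# Build a deterministic but irregular set of (x, y) points with torch.arange,
# so some regions are denser than others and a minimum point density
# parameter can be exercised in the tests.
x = torch.arange(0, 1, 0.05)         # 20 evenly spaced x values
y = (x * 7.0) % 1.0                  # irregular but reproducible y values
coords = torch.stack([x, y], dim=1)  # shape: [20, 2]

# Duplicate a few points to create a dense cluster that should survive a
# minimum-density filter while sparse cells get dropped.
coords = torch.cat([coords, coords[:5]], dim=0)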

ProGamerGov force-pushed the optim-wip-activation-atlas branch from e40da13 to 740fcde on January 6, 2021 16:23
ProGamerGov (Contributor, Author) commented Jan 14, 2021

@NarineK So, Lucid calculates the activations' spatial attributions for a model's labels / classes by using this function:

def fwd_gradients(ys, xs, d_xs):
  """Forward-mode pushforward analogous to the pullback defined by tf.gradients.
  With tf.gradients, grad_ys is the vector being pulled back, and here d_xs is
  the vector being pushed forward.

  By [email protected] from
  https://github.com/renmengye/tensorflow-forward-ad/issues/2
  """
  v = tf.zeros_like(ys)
  g = tf.gradients(ys, xs, grad_ys=v)
  return tf.gradients(g, v, grad_ys=d_xs)

Source: https://colab.research.google.com/github/tensorflow/lucid/blob/master/notebooks/activation-atlas/activation-atlas-collect.ipynb

When I was looking up how to recreate the same basic algorithm, I found that the above may not currently be possible in PyTorch, as I think it requires forward-mode AD: https://pytorch.org/docs/stable/autograd.html#torch.autograd.functional.jvp

The linked jvp documentation states: “we don’t have support for forward mode AD in PyTorch at the moment.”

Looking at PyTorch's development of the required feature makes it seem like it isn't coming anytime soon: pytorch/pytorch#10223, pytorch/rfcs#11

So, is there a way I can use Captum to calculate activations' spatial attributions for the different labels / classes of a model?

* Vectorize heatmap function.

* Add sample count to vec coords.

* Add labels for inception v1 model.
ProGamerGov (Contributor, Author) commented

I've made progress on getting the attribution stuff working:

import torch

def fwd_gradients(ys, xs, d_xs):
    # Forward-mode pushforward (JVP) built from two reverse-mode passes,
    # mirroring the Lucid / TensorFlow helper above: d_xs is the vector
    # being pushed forward through the graph from xs to ys.
    v = torch.zeros_like(ys)
    v.requires_grad = True
    # First pass: vector-Jacobian product v^T J, kept in the graph as a
    # function of v so it can be differentiated again.
    g = torch.autograd.grad(
        outputs=[ys],
        inputs=xs,
        grad_outputs=[v],
        create_graph=True,
    )[0]
    # Second pass: differentiate g w.r.t. v with weights d_xs, which yields
    # the Jacobian-vector product J @ d_xs (same shape as ys).
    return torch.autograd.grad(
        outputs=[g],
        inputs=v,
        grad_outputs=d_xs,
        create_graph=True,
    )[0]
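
A hypothetical usage example (the layer here is just a stand-in linear map) to show the expected shapes:

# ys can be any differentiable function of xs; d_xs is the direction being
# pushed forward. The result is a Jacobian-vector product with the shape of ys.
xs = torch.randn(4, 8, requires_grad=True)
weight = torch.randn(8, 3, requires_grad=True)
ys = xs @ weight
d_xs = torch.ones_like(xs)

jvp = fwd_gradients(ys, xs, d_xs)  # shape: [4, 3]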

NarineK (Contributor) commented Jan 15, 2021

Thank you @ProGamerGov! This week has been a bit busy. I will look into your PRs next week or on the weekend.

ProGamerGov (Contributor, Author) commented

@NarineK No worries! Atlas attributions may have to be added in a future PR. I don't really have a way to display the atlas attribution information right now either (Lucid leaves that to Distill.pub's interactive HTML).

I also used the heatmap function to help visualize the sample counts for each atlas visualization. I'm not sure if there's a better way to display this information right now as it's also left to Distill.pub in Lucid.

ProGamerGov (Contributor, Author) commented Jan 16, 2021

Another potential issue is that downloading the ImageNet 2012 training dataset & collecting samples from it for the tutorial notebook could be difficult and time-consuming for users. So, it may be wise to host a few pre-collected layer samples somewhere for users to download and use, as the samples are only around 100 MB for about 100k samples. Though if we did that, someone might want to collect the full 1 million samples for the sake of atlas visualization quality.

Edit:

I just tested the size of the samples file for Mixed5a & Mixed5b using torch.randn(1200000, 832), and the resulting file was 3.71 GB. Layers with a channel size of 528 come to 2.35 GB, and a channel size of 192 comes to 878 MB. I'm not sure if these sizes are too large for hosting externally somewhere?
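
For reference, those numbers line up with plain float32 storage; a quick back-of-the-envelope check (torch.save adds only a little metadata on top of the raw data):

# Approximate on-disk size of a float32 samples tensor.
def approx_size_gb(num_samples, num_channels, bytes_per_element=4):
    return num_samples * num_channels * bytes_per_element / 2 ** 30

print(approx_size_gb(1_200_000, 832))  # ~3.72 GB (Mixed5a / Mixed5b)
print(approx_size_gb(1_200_000, 528))  # ~2.36 GB
print(approx_size_gb(1_200_000, 192))  # ~0.86 GB (~880 MB)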

I'm also not sure whether my capture_activation_samples function can handle these sizes at the moment, especially as the samples tensor is kept in memory in a Python variable.

ProGamerGov (Contributor, Author) commented Jan 17, 2021

After some testing, I think that I can make sample collection a lot faster and more memory-efficient. Saving every sample batch using torch.save and then loading all the files at once & concatenating them together seems to be significantly faster than leaving the samples in a Python variable.
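
Roughly what I have in mind (a sketch; the file naming and helper names are hypothetical):

import glob
import torch

# Persist each batch of activation samples to disk as it is collected, so
# nothing accumulates in a Python variable during the collection run.
def save_sample_batch(samples, batch_idx, prefix="samples"):
    torch.save(samples.cpu(), f"{prefix}_{batch_idx:06d}.pt")

# Load and concatenate all saved batches in a single pass afterwards.
def load_all_samples(prefix="samples"):
    files = sorted(glob.glob(f"{prefix}_*.pt"))
    return torch.cat([torch.load(f) for f in files], dim=0)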

ProGamerGov (Contributor, Author) commented Jan 23, 2021

So, quick update!

I resized every image in the ImageNet 2012 dataset to 256 and replaced the Pillow library with pillow-simd to speed things up, so sample collection memory usage and speed are no longer an issue.

According to tqdm, collecting samples for the entire dataset of 1,281,167 images for all of the main layers at once took just under 4 and a half hours, and averaged about 80 images a second ([4:23:59<00:00, 80.88 images/s]).

The issue now is that I am running out of memory when using UMAP on the resulting samples tensors.

I also tested the visualization steps with the sample tensors, and there were no memory issues for those steps. So the issue seems to be specific to the UMAP calculations.

Edit:

I was able to run UMAP on the 1.2 million samples from Mixed4c using an AWS instance with 64 GB of RAM, but I still ran out of memory with the Mixed5b samples. Switching to an instance with 128 GB of RAM let me run UMAP successfully on the 1.2 million samples for Mixed5b.
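
For anyone without access to a larger instance, one possible workaround (not what I did here, just a hedged sketch; the subset size and channel count are illustrative) is to fit UMAP on a random subset of the samples and then project the rest with transform; umap-learn also exposes a low_memory option:

import torch
import umap

samples = torch.randn(1_200_000, 1024)  # stand-in for the Mixed5b samples

# Fit on a random subset to bound peak memory, then project the remainder.
idx = torch.randperm(samples.shape[0])[:200_000]
reducer = umap.UMAP(n_components=2, low_memory=True)
reducer.fit(samples[idx].numpy())
coords = reducer.transform(samples.numpy())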

ProGamerGov added 10 commits May 3, 2021 19:47
* Remove old code and text for slicing incomplete atlases.
* Use more precise language.
* Improve the flow of language used between steps.
* Hopefully sample collection is easier to understand this way, as it was previously added as a commented out section to the main activation atlas tutorial.
* Improved the description of activation and attributions samples in both visualizing atlases notebooks.
* Also improved the activation atlas sample collection tutorial.
ProGamerGov (Contributor, Author) commented May 18, 2021

@NarineK There's a small section at the end of the Class Activation Atlas tutorial that demonstrates one of the ways the information obtained from an activation atlas can be used. For that section I was using two images from the Lucid servers, but they stopped working yesterday, and Chris has no idea when it'll be fixed (he no longer has access to the server). Luckily I had both images saved on my PC in case something like this happened, so I uploaded them to a temporary GitHub link (via https://user-images.githubusercontent.com) so that the tutorial continues to work as normal.

ProGamerGov (Contributor, Author) commented

This PR now implements the main activation atlas tutorial and its supporting core code.

Activation atlas sample collection / advanced setup was moved to: #729

Class Activation Atlases were moved to: #730

ProGamerGov (Contributor, Author) commented Sep 12, 2021

I may close this PR and the class atlas one in favor of newer PRs where I can compress all 95 commits into a single commit, or a small number of commits. Unless that seems like a bad idea?

ProGamerGov (Contributor, Author) commented

@NarineK I have created a cleaned-up version of this PR, and GitHub lists it as changing 953 lines (including empty space and formatting, so in practice fewer lines were added). Do you want me to include the main tutorial in the same PR, or should I leave it for the PR with the second tutorial? I think the tutorial adds around 444 lines (including empty / formatting lines, and excluding .ipynb formatting related code).

NarineK (Contributor) commented Oct 3, 2021

> @NarineK I have created a cleaned-up version of this PR, and GitHub lists it as changing 953 lines (including empty space and formatting, so in practice fewer lines were added). Do you want me to include the main tutorial in the same PR, or should I leave it for the PR with the second tutorial? I think the tutorial adds around 444 lines (including empty / formatting lines, and excluding .ipynb formatting related code).

@ProGamerGov, thank you for the update. I think if the main PR is 953 lines and the tutorial adds 444, it's fine to have the tutorial in that same PR. So the total LOC will be 953 + 444 for that PR? What's the ID of that new PR?

ProGamerGov (Contributor, Author) commented

@NarineK The new PR can be found here: #782
