
Conversation

g-w1 (Contributor) commented Feb 28, 2024

This can be useful for hyperparameter search.

g-w1 force-pushed the parallel-more-hyps branch 2 times, most recently from d5d9ddf to 59785e2 on February 29, 2024 16:26
g-w1 added 3 commits February 29, 2024 20:04
This allows one to train multiple autoencoders from a single layer, since submodules
are no longer used as dictionary keys, so duplicate submodules no longer collide.
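
In other words, each autoencoder is paired with its submodule by position rather than by using the submodule object as a dictionary key. A minimal sketch of the difference (the module and autoencoders here are stand-ins, not this repo's classes):
```
import torch.nn as nn

layer3_mlp = nn.Linear(512, 512)  # stand-in for model.gpt_neox.layers[3].mlp
submodules = [layer3_mlp, layer3_mlp, layer3_mlp]
aes = [nn.Linear(512, 8192) for _ in submodules]  # stand-ins for the autoencoders

# Keyed by the submodule object, the three duplicates collapse into one entry,
# so only one autoencoder (and one set of hyperparameters) survives:
by_module = {sub: ae for sub, ae in zip(submodules, aes)}
assert len(by_module) == 1

# Paired by position instead, each entry keeps its own autoencoder:
by_index = list(zip(submodules, aes))
assert len(by_index) == 3
```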

This could be useful for something like hyperparameter tuning, where you
only want to change one thing at a time.

Here's an example:
```
submodules = [model.gpt_neox.layers[3].mlp, model.gpt_neox.layers[3].mlp, model.gpt_neox.layers[3].mlp]
activation_dim = 512 # output dimension of the MLP
dictionary_size = 16 * activation_dim
learning_rates = [3e-4, 1e-3, 3e-3]
```
This allows one to re-train a sparse autoencoder on the same layer
without re-generating all of the activations to train on.
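
As a rough sketch of that idea (not this repo's actual training API): collect the activations once, then fit several autoencoders with different hyperparameters on the same cached tensor instead of re-running the base model for each run.
```
import torch
import torch.nn as nn

activation_dim = 512
dictionary_size = 16 * activation_dim
learning_rates = [3e-4, 1e-3, 3e-3]

# Stand-in for activations already collected from one MLP; in practice these
# would come from running the base model once and caching the results.
cached_acts = torch.randn(4096, activation_dim)

autoencoders = []
for lr in learning_rates:
    # One toy autoencoder per learning rate, all trained on the same activations.
    ae = nn.Sequential(nn.Linear(activation_dim, dictionary_size), nn.ReLU(),
                       nn.Linear(dictionary_size, activation_dim))
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(10):  # a handful of steps just to illustrate the loop
        opt.zero_grad()
        loss = (ae(cached_acts) - cached_acts).pow(2).mean()
        loss.backward()
        opt.step()
    autoencoders.append(ae)
```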
g-w1 force-pushed the parallel-more-hyps branch from a3ff78e to 72e23be on February 29, 2024 20:09