adamkarvonen
Collaborator

Add sparsity warmup to the Gated, JumpReLU, and Standard trainers. I implemented this because it is simple and, as far as I can tell, GDM and Anthropic use it for all of their SAEs. I left the P Anneal variants alone, since a warmup could interfere with the p-annealing schedule. In my testing, the warmup makes a relatively minor difference and slightly increases the number of alive features.
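For readers unfamiliar with the idea, here is a minimal sketch of what a sparsity warmup typically looks like. It assumes a linear ramp of the sparsity coefficient from 0 to its full value; the names `sparsity_warmup_scale`, `total_loss`, `l1_penalty`, and `warmup_steps` are hypothetical illustrations and are not taken from this PR's diff, which may use a different schedule or API.

```python
def sparsity_warmup_scale(step: int, warmup_steps: int) -> float:
    """Return a multiplier in [0, 1] for the sparsity penalty.

    Assumed schedule: linear ramp from 0 at step 0 to 1 at
    `warmup_steps`, then constant at 1. A non-positive
    `warmup_steps` disables the warmup.
    """
    if warmup_steps <= 0:
        return 1.0
    return min(step / warmup_steps, 1.0)


def total_loss(
    recon_loss: float,
    sparsity_loss: float,
    l1_penalty: float,
    step: int,
    warmup_steps: int,
) -> float:
    """Combine reconstruction and sparsity terms (hypothetical helper).

    The sparsity term is scaled by the warmup multiplier, so early
    steps are dominated by the reconstruction loss.
    """
    scale = sparsity_warmup_scale(step, warmup_steps)
    return recon_loss + l1_penalty * scale * sparsity_loss
```

Under this assumed schedule the trainer optimizes almost pure reconstruction at the start, which is one plausible reason a warmup would leave slightly more features alive.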

adamkarvonen merged commit a11670f into main on Dec 27, 2024