
Conversation

adamkarvonen
Collaborator

Changes

1. Activation Normalization (Optional)

  • Added optional activation normalization following the Gemma Scope paper's methodology
  • Before training, finds a fixed scalar that rescales activations to unit mean squared norm
  • Disabled by default

The scaling factors are folded into the SAE's thresholds and biases when the model is saved, so no normalization is needed at inference time. This significantly improves hyperparameter transfer across layers and models; JumpReLU in particular requires substantial hyperparameter tuning without it.
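As a minimal sketch of the normalization step (assumed names; `activation_batches` is a hypothetical iterable of activation tensors, not the repo's API), the fixed scalar can be estimated as the root mean squared norm of the activations, so that dividing by it gives unit mean squared norm:

```python
import torch

@torch.no_grad()
def estimate_norm_scale(activation_batches) -> float:
    """Return a scalar s such that (acts / s) has unit mean squared L2 norm."""
    total_sq_norm, count = 0.0, 0
    for acts in activation_batches:                  # acts: (batch, d_model)
        total_sq_norm += acts.pow(2).sum(dim=-1).sum().item()
        count += acts.shape[0]
    mean_sq_norm = total_sq_norm / count
    return mean_sq_norm ** 0.5                       # divide activations by this scalar
```

Because the scalar is fixed, it can be folded into the saved SAE's biases and thresholds afterward, which is why no normalization is required at inference time.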

2. Global Threshold Implementation for BatchTopK SAE

  • Implemented a fixed global threshold to remove inter-input dependencies
  • Tracks the average minimum non-zero activation value during training
  • Uses this global threshold by default in the encode() method

I also track this global threshold for the TopK SAE during training, but I don't use it by default in its encode() method. Using a global threshold provides several benefits: it achieves better loss recovered than standard TopK, eliminates feature dependencies within an input, and enables running a forward pass on a limited subset of SAE latents (useful for steering, autointerp, etc.).
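A minimal sketch of the idea, with assumed names (`decay`, `sae.W_enc`, `sae.b_enc` are illustrative, not the repo's exact trainer code): during training, keep a running average of the smallest non-zero post-TopK activation, then at inference replace the batch-dependent top-k selection with that fixed threshold.

```python
import torch

class GlobalThresholdTracker:
    """Running average of the minimum non-zero activation per training step."""

    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.threshold = None

    @torch.no_grad()
    def update(self, topk_acts: torch.Tensor) -> None:
        # topk_acts: post-(Batch)TopK feature activations, zeros outside the top-k
        nonzero = topk_acts[topk_acts > 0]
        if nonzero.numel() == 0:
            return
        min_act = nonzero.min().item()
        if self.threshold is None:
            self.threshold = min_act
        else:
            self.threshold = self.decay * self.threshold + (1 - self.decay) * min_act

@torch.no_grad()
def encode_with_global_threshold(sae, x: torch.Tensor, threshold: float) -> torch.Tensor:
    # Thresholding is applied per latent, so each feature fires independently
    # of the other inputs in the batch (no inter-input dependency).
    pre_acts = torch.relu(x @ sae.W_enc + sae.b_enc)
    return pre_acts * (pre_acts > threshold)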

3. Standard SAE Improvements

For the standard SAE, W_dec is now initialized as W_enc.T. I also switched the standard and p-annealing trainers to the correct reconstruction loss, which was already used in all other trainers. The initialization change provided a major benefit and the reconstruction loss fix a minor one.
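A minimal sketch of the two fixes (assumed shapes and names; the exact reconstruction loss form, summed squared error per sample averaged over the batch, is an assumption based on "the loss used by the other trainers", not a quote of the repo's code):

```python
import torch
import torch.nn as nn

class StandardSAE(nn.Module):
    def __init__(self, d_model: int, dict_size: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.empty(d_model, dict_size))
        nn.init.kaiming_uniform_(self.W_enc)
        # Fix 1: initialize the decoder as the transpose of the encoder.
        self.W_dec = nn.Parameter(self.W_enc.data.T.clone())
        self.b_enc = nn.Parameter(torch.zeros(dict_size))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor):
        f = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        x_hat = f @ self.W_dec + self.b_dec
        return x_hat, f

def reconstruction_loss(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    # Fix 2 (assumed form): squared error summed over the model dimension,
    # averaged over the batch, matching the convention of the other trainers.
    return (x - x_hat).pow(2).sum(dim=-1).mean()
```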

@adamkarvonen adamkarvonen merged commit 67a7857 into main Dec 26, 2024