This subdirectory provides a minimal, directly runnable pipeline to:
- Select a subset of samples from an embedding matrix
- Train a dual-head VAE with a shared encoder
- Generate noisy sample pairs using the trained VAE
- Train a Bradley–Terry reward model, with optional baseline comparison
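The last step above relies on the Bradley–Terry preference model, in which the probability that one response is preferred over another is the sigmoid of the difference in their rewards. The following is a minimal sketch of that objective in plain Python, not the code from train_bt_vae.py; the function names `bt_preference_prob` and `bt_loss` are hypothetical and chosen only for illustration.

```python
import math


def bt_preference_prob(reward_a: float, reward_b: float) -> float:
    """P(a is preferred over b) under Bradley-Terry: sigmoid(r_a - r_b)."""
    diff = reward_a - reward_b
    # Branch on the sign so that exp() never receives a large positive value.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def bt_loss(chosen_rewards, rejected_rewards) -> float:
    """Mean negative log-likelihood that each chosen item beats its partner."""
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("chosen and rejected rewards must have equal length")
    if not chosen_rewards:
        raise ValueError("at least one pair is required")
    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        diff = r_chosen - r_rejected
        # -log(sigmoid(diff)) == softplus(-diff), written in a stable form.
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(chosen_rewards)
```

The loss shrinks as the model scores the chosen item higher than the rejected one, and equals log(2) when the two scores are tied.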
Files:
- run_pipeline.sh: Simplified pipeline script (uses relative paths)
- select_samples.py, train_vae_dual.py, generate_noisy_pairs.py, train_bt_vae.py: Core Python scripts
- requirements.txt: Python dependencies
- run_100k_reference.sh: Original large script, for reference only
Requirements:
- Python >= 3.9 (3.10/3.11 recommended)
- Install dependencies (prefer a virtualenv): pip install -r requirements.txt
- For GPU usage, install a CUDA-enabled torch build.
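Before running the pipeline, it can help to confirm that the interpreter meets the version requirement and to see which device torch would use. The sketch below is an optional check and is not part of the repository; the helper names `python_ok` and `describe_device` are hypothetical. It assumes only that `torch.cuda.is_available()` is the relevant test for a CUDA-enabled build.

```python
import importlib.util
import sys


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter satisfies the Python >= 3.9 requirement."""
    return tuple(version_info[:2]) >= (3, 9)


def describe_device() -> str:
    """Report 'cuda', 'cpu', or 'torch not installed' without failing."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check also runs without torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print("Python version OK:", python_ok())
print("Device:", describe_device())
```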
Expected inputs (example names, paths are configurable):
- data/train_100k.npy: Training embeddings (shape: [N, D])
- data/multi_response_embeddings.npy: Multi-response features
- data/multi_response_rewards.npy: Corresponding rewards
You may place them anywhere and pass their paths via script arguments.
Example run:

    bash run_pipeline.sh \
      --input_file data/train_100k.npy \
      --output_dir outputs/llama_instruct_10k \
      --multi_response_features_path data/multi_response_embeddings.npy \
      --multi_response_rewards_path data/multi_response_rewards.npy \
      --num_samples 50000 \
      --latent_dim 16 \
      --hidden_dims 64 \
      --batch_size 128 \
      --epochs 20 \
      --lr 1e-4 \
      --temperature 1.0 \
      --contrastive_weight 0.01 \
      --noise_std 0.01 \
      --num_variants 1 \
      --n_noise 10 \
      --train_size 1000.0 \
      --hidden_dim 512 \
      --dropout 0.0 \
      --seed 44 \
      --run_comparison true

Outputs are saved under --output_dir, including:
- seeds_samples/: Selected subset
- vae_model/: Trained VAE (best_model.pt)
- generated_pairs/: Noisy pairs (generated_noisy_pairs.npy)
- reward_model_with_vae/ and reward_model_baseline/: Reward model results (gold_reward_results.json, etc.)
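When sweeping over settings, the invocation can be assembled from a dictionary instead of being edited by hand. The sketch below builds the argument list for run_pipeline.sh using only flags that appear in the example command; the helper `build_pipeline_command` is hypothetical and not part of the repository. Values such as the learning rate are kept as strings so they are passed exactly as written.

```python
import shlex


def build_pipeline_command(options: dict, script: str = "run_pipeline.sh") -> list:
    """Turn {"flag": value} pairs into ["bash", script, "--flag", "value", ...]."""
    cmd = ["bash", script]
    for name, value in options.items():
        # Booleans are rendered as lowercase true/false, as in the example command.
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd.extend([f"--{name}", str(value)])
    return cmd


options = {
    "input_file": "data/train_100k.npy",
    "output_dir": "outputs/llama_instruct_10k",
    "multi_response_features_path": "data/multi_response_embeddings.npy",
    "multi_response_rewards_path": "data/multi_response_rewards.npy",
    "latent_dim": 16,
    "lr": "1e-4",  # string, so it is not reformatted as 0.0001
    "seed": 44,
    "run_comparison": True,
}

command = build_pipeline_command(options)
print(shlex.join(command))
# To launch the pipeline: subprocess.run(command, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems if a path contains spaces.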
Note: This minimal version removes dependencies on jq/bc and does not enforce extra comparison report generation, while keeping core functionality intact.
- Without a GPU, torch will fall back to CPU and run more slowly.
- If the defaults in train_vae_dual.py or train_bt_vae.py differ from the values you need, override them via the same-named run_pipeline.sh arguments.
- Ensure the input .npy files have the expected shapes.
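A shape check before launching the pipeline can catch mismatched inputs early. The sketch below works on shape tuples, which can be read without loading the data via `numpy.load(path, mmap_mode="r").shape`. Only the [N, D] layout of the training embeddings is stated in this README. The other two rules are assumptions made for illustration: that the multi-response features share the embedding dimension D on their last axis, and that the rewards array matches the features' leading axes. The function name `check_input_shapes` is hypothetical; adjust the rules to what the scripts actually expect.

```python
def check_input_shapes(train_shape, features_shape, rewards_shape) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    # Stated in the README: training embeddings have shape [N, D].
    if len(train_shape) != 2:
        problems.append(f"training embeddings should be 2-D [N, D], got {tuple(train_shape)}")
        return problems
    dim = train_shape[1]

    # ASSUMPTION: multi-response features use the same embedding size D on the last axis.
    if len(features_shape) < 2 or features_shape[-1] != dim:
        problems.append(
            f"features last axis should equal D={dim}, got {tuple(features_shape)}"
        )

    # ASSUMPTION: one reward per feature vector, so rewards match the leading axes.
    if tuple(features_shape[:-1]) != tuple(rewards_shape):
        problems.append(
            f"rewards shape {tuple(rewards_shape)} does not match "
            f"features leading axes {tuple(features_shape[:-1])}"
        )
    return problems


# Illustrative shapes only; they are not taken from the real data files.
print(check_input_shapes((1000, 32), (50, 8, 32), (50, 8)))
```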
Follow your repo's license; if unspecified, default to MIT.