2026.TA.gemma2_2b_chat_truncate_tc8192_decb_l1w0.001_tarbb_lb2.0_ln1_dr10000_lr8e-04_sl14797889
Sparse transcoder adapter trained in bridging mode.
Full name: 2026.TA.gemma2_2b_chat_truncate_tc8192_decb_l1w0.001_tarbb_lb2.0_ln1_dr10000_lr8e-04_bs4_sl14797889
Model Details
- Base model: google/gemma-2-2b
- Reference model: google/gemma-2-2b-it
- Architecture: gemma2
- Training mode: bridging
- Tokenizer: google/gemma-2-2b-it
- Training config: training-config.yaml
- GitHub: https://github.com/Sid-MB/transcoder-adapters
- W&B run: https://wandb.ai/siddharth-stanford/sparse-adaptation/runs/gdw3aozy
Transcoder Configuration
- n_features: 8192
- dec_bias: True
- l1_weight: 0.001
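The configuration above can be read as a standard sparse transcoder: a linear encoder into 8192 ReLU features, an L1 sparsity penalty with weight 0.001 on those features, and a linear decoder with a bias term (dec_bias: True). A minimal numpy sketch under those assumptions — the hidden size (2304 for Gemma-2-2b) and the exact parameterization are assumptions, not taken from the training code:

```python
import numpy as np

# Hypothetical sketch of the transcoder forward pass and sparsity penalty.
# n_features, dec_bias, and l1_weight come from the card; d_model (2304 for
# Gemma-2-2b) and the encoder/ReLU/decoder structure are assumptions.
rng = np.random.default_rng(0)

d_model, n_features = 2304, 8192
l1_weight = 0.001

W_enc = rng.standard_normal((d_model, n_features)) * 0.01
b_enc = np.zeros(n_features)
W_dec = rng.standard_normal((n_features, d_model)) * 0.01
b_dec = np.zeros(d_model)  # decoder bias, per dec_bias: True

def transcode(x):
    """Map an input activation x to a reconstruction via sparse features."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)    # ReLU feature activations
    y = f @ W_dec + b_dec                     # linear decode with bias
    l1_penalty = l1_weight * np.abs(f).sum()  # sparsity term added to the loss
    return y, f, l1_penalty

x = rng.standard_normal(d_model)
y, f, penalty = transcode(x)
```

The L1 penalty pushes most of the 8192 feature activations to exactly zero, which is what makes the learned features sparse and individually interpretable.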
Training
- Learning rate: 0.0008
- Batch size: 4
- Epochs: 1
- Warmup ratio: 0.05
- Loss type: kl
- lambda_adapt: 1.0
- lambda_bridge: 2.0
- lambda_nmse: 1
- n_cutoffs: 1
- backbone: target
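The loss hyperparameters above suggest a weighted sum of a KL adaptation term (loss type kl, lambda_adapt 1.0), a bridging term (lambda_bridge 2.0), and a normalized-MSE reconstruction term (lambda_nmse 1). How the training code actually combines them is an assumption; a hedged numpy sketch of one plausible combination, with the weights taken from the card and the bridge term left as a placeholder:

```python
import numpy as np

# Assumed combination:
#   total = lambda_adapt * KL(reference || adapted)
#         + lambda_bridge * bridge_loss
#         + lambda_nmse  * NMSE(reconstruction, target)
lambda_adapt, lambda_bridge, lambda_nmse = 1.0, 2.0, 1.0

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p_logits, q_logits):
    """KL(p || q) over the vocab axis, averaged over positions."""
    p = softmax(p_logits)
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(q_logits) + 1e-12)
    return (p * (log_p - log_q)).sum(axis=-1).mean()

def nmse(pred, target):
    """Normalized mean squared error of the reconstruction."""
    return ((pred - target) ** 2).sum() / ((target ** 2) .sum() + 1e-12)

rng = np.random.default_rng(0)
ref_logits = rng.standard_normal((4, 16))    # reference (chat) model logits
adapt_logits = ref_logits + 0.1 * rng.standard_normal((4, 16))
recon = rng.standard_normal((4, 32))
target = recon + 0.05 * rng.standard_normal((4, 32))
bridge_loss = 0.02  # placeholder for the bridging-mode term

total = (lambda_adapt * kl_div(ref_logits, adapt_logits)
         + lambda_bridge * bridge_loss
         + lambda_nmse * nmse(recon, target))
```

With backbone set to target, the reference distribution here would come from the instruction-tuned model (google/gemma-2-2b-it), so minimizing the KL term pulls the adapted base model's predictions toward the chat model's.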
Training Data