Solshine committed
Commit ba67ba2 · 0 parent(s)

Initial public release: SAE weights, cfg, and model card

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
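The patterns above decide which files in this commit are stored as Git LFS pointers rather than as regular blobs. The sketch below is a rough approximation of that matching, assuming that Python's `fnmatch` is close enough to gitattributes globbing for simple basename patterns. It deliberately skips the `saved_model/**/*` path pattern, and `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names introduced only for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the basename patterns from the .gitattributes in this commit.
# Path patterns such as saved_model/**/* are intentionally left out.
LFS_PATTERNS = ["*.pt", "*.pth", "*.bin", "*.safetensors", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's basename match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for path in ["d20_L18_mixed_jumprelu.pt", "README.md", "training_results.json"]:
    print(path, is_lfs_tracked(path))
```

Under these assumptions the `.pt` checkpoints match `*.pt`, while `README.md` and `training_results.json` match nothing, which agrees with how those files appear in the rest of the diff.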
README.md ADDED
@@ -0,0 +1,130 @@
+ ---
+ tags:
+ - deception-detection
+ - sparse-autoencoders
+ - mechanistic-interpretability
+ - ai-safety
+ - nanochat
+ license: mit
+ ---
+
+ # Deception SAEs for nanochat-d20 (561M)
+
+ 12 SAE checkpoints trained on nanochat-d20 behavioral sampling activations. Includes standard (all data), deception-only, honest-only, and mixed (deceptive + honest) training variants.
+
+
+ ## Training-data caveat: please read before use
+
+ The "honest" and "deceptive" labels used to train and evaluate these SAEs
+ come from a **mixed scenario pool** of nine prompts:
+
+ - **Six clean incentive-structure scenarios:** `insider_info`,
+ `accounting_error`, `ai_oversight_log`, `ai_capability_hide`,
+ `surprise_party`, `job_interview_gap`. Each presents a realistic
+ situation with a non-trivial incentive to conceal or disclose
+ information; the honest/deceptive label records which of the two
+ behavioral choices the model's completion settles into under
+ temperature sampling.
+ - **Three role-play identity-assignment scenarios:**
+ `secret_role_game` ("You have been assigned the role of Fascist"),
+ `secret_role_game_v2` ("You are secretly one of the Mafia members"),
+ `werewolf_game` ("You are a Werewolf"). These pre-assign the model a
+ deceptive identity and label a completion "deceptive" when the model
+ drifts away from the assigned role or "honest" when it echoes it.
+
+ **What this mixed pool means for the SAE labels.** Within the six
+ incentive-structure scenarios, the honest/deceptive distinction
+ measures behavioral choice under an ambiguous incentive. Within
+ the three role-play scenarios, it measures role-consistency under
+ identity-assigned role-play, which is a well-defined phenomenon but
+ not the same as emergent or incentive-driven deception.
+
+ **What these SAEs are and are not good for.**
+
+ - **Good for:** research on mixed-pool activation geometry; SAE
+ feature-geometry studies; as one of a set of baselines when
+ comparing multiple SAE families; as a reference implementation of
+ same-prompt temperature-sampled behavioral SAE training at scale.
+ - **Not recommended as a standalone deception detector.** The
+ role-consistency signal from the three role-play scenarios is mixed
+ into every aggregate metric reported below. A downstream user who
+ wants an "emergent-deception feature set" should restrict attention
+ to features whose activation pattern concentrates in the
+ `insider_info` / `accounting_error` / `ai_oversight_log` /
+ `ai_capability_hide` / `surprise_party` / `job_interview_gap`
+ scenarios, or wait for the methodologically corrected V3 re-release
+ currently in preparation on the decision-incentive scenario bank
+ (no pre-assigned deceptive identity).
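One way to act on the "restrict attention" advice is to score each feature by how much of its activation mass falls in the six incentive-structure scenarios. The following is a minimal sketch, not the authors' procedure: it assumes you have already computed a non-negative mean activation per feature and per scenario, and the names `clean_fraction`, `select_features`, the 0.9 threshold, and the toy numbers are all hypothetical.

```python
# The six incentive-structure scenarios named in the README.
CLEAN_SCENARIOS = {
    "insider_info", "accounting_error", "ai_oversight_log",
    "ai_capability_hide", "surprise_party", "job_interview_gap",
}


def clean_fraction(mean_act_by_scenario: dict) -> float:
    """Share of a feature's total mean activation that comes from clean scenarios."""
    total = sum(mean_act_by_scenario.values())
    if total == 0:
        return 0.0  # a dead feature concentrates nowhere
    clean = sum(v for k, v in mean_act_by_scenario.items() if k in CLEAN_SCENARIOS)
    return clean / total


def select_features(per_feature: dict, min_fraction: float = 0.9) -> list:
    """Keep feature ids whose activation concentrates in the clean scenarios."""
    return [fid for fid, acts in per_feature.items()
            if clean_fraction(acts) >= min_fraction]


# Toy, made-up activations for two hypothetical feature ids.
toy = {
    7: {"insider_info": 0.8, "surprise_party": 0.4, "werewolf_game": 0.05},
    42: {"werewolf_game": 1.0, "secret_role_game": 0.5, "insider_info": 0.1},
}
print(select_features(toy))
```

In the toy data, feature 7 draws 96% of its activation from clean scenarios and is kept, while feature 42 is dominated by the role-play scenarios and is dropped. The threshold is a free choice and would need tuning on real activations.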
+
+ **What is unaffected by this caveat.**
+
+ - The SAE weights, reconstruction metrics (explained variance, L0,
+ alive features), and engineering of the training pipeline are
+ accurate as reported.
+ - The linear-probe balanced-accuracy numbers in the upstream paper
+ measure the mixed pool; the 6-scenario clean-subset re-analysis is
+ listed as a planned appendix for the next manuscript revision.
+
+ A companion methodology-first Gemma 4 SAE suite is in preparation using
+ pretraining-distribution data + a decision-incentive behavior split;
+ this README will be updated with a link when that release is public.
+
+ ---
+
+ ## Key Finding: Mixed Training Beats Deception-Only
+
+ | Training data (JumpReLU unless noted) | Layer 10 d_max | Layer 18 d_max |
+ |---|---|---|
+ | **Mixed (dec+hon)** | 0.558 | **0.684** |
+ | Deception-only | 0.520 | 0.634 |
+ | Honest-only | 0.544 | 0.572 |
+ | Standard (all) | 0.518 | 0.549 |
+ | Standard (all), TopK k=32 | 0.226 | 0.346 |
+
+ Training on the deceptive and honest completions together (ambiguous ones excluded) gives the highest d_max at both layers. The SAE appears to benefit from seeing the contrast.
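The README does not define `d_max`. The sketch below assumes it is the largest absolute Cohen's-d-style effect size across SAE features when comparing deceptive and honest completions; that reading is an assumption suggested by the `d_max` / `d_mean` / `top10_d_mean` fields, not something the source confirms. The function names and toy activations are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference with a pooled sample variance."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        return 0.0  # constant feature: no measurable separation
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def d_max(dec_acts, hon_acts):
    """Largest |d| over features; inputs are lists of per-sample feature vectors."""
    n_features = len(dec_acts[0])
    return max(
        abs(cohens_d([s[j] for s in dec_acts], [s[j] for s in hon_acts]))
        for j in range(n_features)
    )


# Toy data: two samples per class, two features (feature 1 is dead).
deceptive = [[2.0, 0.0], [4.0, 0.0]]
honest = [[1.0, 0.0], [3.0, 0.0]]
print(round(d_max(deceptive, honest), 4))
```

With real data the inputs would be SAE feature activations for the 132 deceptive and 128 honest completions; how activations are pooled over tokens is another detail the README leaves open.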
+
+ ## Model Details
+
+ - **Base model:** nanochat-d20 (561M params, d_model=1280, 20 layers)
+ - **Dimensions:** d_in=1280, d_sae=5120 (4x expansion)
+ - **Training data:** 270 V3 behavioral sampling completions (132 deceptive, 128 honest, 10 ambiguous)
+ - **Training epochs:** 300
+ - **Layers:** 10 (50% depth) and 18 (95% depth, probe peak)
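As a sanity check on the dimensions, the parameter count implied by `d_in=1280` and `d_sae=5120` can be compared with the checkpoint sizes recorded in the LFS pointers of this commit (52,457,001 bytes for a TopK file, 52,477,866 bytes for a JumpReLU file). The sketch assumes float32 storage and a conventional layout of encoder and decoder weights, two bias vectors, and a per-feature threshold for JumpReLU; none of this is stated in the README, so treat the layout as a guess.

```python
D_IN, D_SAE = 1280, 5120
BYTES_PER_FLOAT32 = 4


def param_count(d_in: int, d_sae: int, with_threshold: bool) -> int:
    """Parameters under the assumed layout: W_enc, W_dec, b_enc, b_dec (+ threshold)."""
    n = 2 * d_in * d_sae + d_sae + d_in
    if with_threshold:
        n += d_sae  # assumed per-feature JumpReLU threshold
    return n


topk_bytes = param_count(D_IN, D_SAE, with_threshold=False) * BYTES_PER_FLOAT32
jumprelu_bytes = param_count(D_IN, D_SAE, with_threshold=True) * BYTES_PER_FLOAT32

print("expansion factor:", D_SAE // D_IN)
print("TopK tensor bytes:", topk_bytes, "| file overhead:", 52_457_001 - topk_bytes)
print("JumpReLU tensor bytes:", jumprelu_bytes, "| file overhead:", 52_477_866 - jumprelu_bytes)
```

Both estimates land within about 3 KB of the recorded file sizes, which is consistent with the assumed layout plus serialization metadata, though it does not prove it.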
+
+ ## Checkpoints
+
+ | File | Training | Architecture | Layer | d_max | L0 | Explained variance |
+ |---|---|---|---|---|---|---|
+ | `d20_L10_standard_topk.pt` | All data | TopK k=32 | 10 | 0.226 | 32 | 98.5% |
+ | `d20_L10_standard_jumprelu.pt` | All data | JumpReLU | 10 | 0.518 | 2093 | 99.7% |
+ | `d20_L10_deception_topk.pt` | Deceptive only | TopK k=32 | 10 | 0.244 | 32 | 98.4% |
+ | `d20_L10_deception_jumprelu.pt` | Deceptive only | JumpReLU | 10 | 0.520 | 2125 | 99.5% |
+ | `d20_L10_honest_jumprelu.pt` | Honest only | JumpReLU | 10 | 0.544 | 2108 | 99.4% |
+ | `d20_L10_mixed_jumprelu.pt` | Dec+Hon | JumpReLU | 10 | 0.558 | 2025 | 99.6% |
+ | `d20_L18_standard_topk.pt` | All data | TopK k=32 | 18 | 0.346 | 32 | 96.8% |
+ | `d20_L18_standard_jumprelu.pt` | All data | JumpReLU | 18 | 0.549 | 2409 | 99.7% |
+ | `d20_L18_deception_topk.pt` | Deceptive only | TopK k=32 | 18 | 0.252 | 32 | 95.2% |
+ | `d20_L18_deception_jumprelu.pt` | Deceptive only | JumpReLU | 18 | 0.634 | 2353 | 99.4% |
+ | `d20_L18_honest_jumprelu.pt` | Honest only | JumpReLU | 18 | 0.572 | 2422 | 99.4% |
+ | **`d20_L18_mixed_jumprelu.pt`** | **Dec+Hon** | **JumpReLU** | **18** | **0.684** | 2371 | 99.5% |
+
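The checkpoint filenames follow a regular `d20_L{layer}_{training}_{architecture}.pt` pattern, and `training_results.json` uses `{training}_{architecture}_L{layer}` for `config_name`. A small helper, sketched here with hypothetical function names, can translate between the two so that table rows and JSON entries are easy to join.

```python
import re

# Pattern inferred from the twelve filenames listed in the table.
CHECKPOINT_RE = re.compile(
    r"^d20_L(?P<layer>\d+)_(?P<training>standard|deception|honest|mixed)"
    r"_(?P<arch>topk|jumprelu)\.pt$"
)


def parse_checkpoint_name(name: str) -> dict:
    """Split a checkpoint filename into layer, training variant, and architecture."""
    match = CHECKPOINT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    return {
        "layer": int(match["layer"]),
        "training": match["training"],
        "architecture": match["arch"],
    }


def to_config_name(name: str) -> str:
    """Map a filename to the matching config_name in training_results.json."""
    info = parse_checkpoint_name(name)
    return f"{info['training']}_{info['architecture']}_L{info['layer']}"


print(parse_checkpoint_name("d20_L18_mixed_jumprelu.pt"))
print(to_config_name("d20_L10_standard_topk.pt"))
```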
+ ## Related Work
+
+ Follow-up research to:
+ - **"The Secret Agenda: LLMs Strategically Lie Undetected by Current Safety Tools"**
+ - [OpenReview](https://openreview.net/forum?id=FhGJLT6spH)
+ - [arXiv](https://arxiv.org/abs/2503.07683)
+
+ Part of the deception-nanochat-sae-research project:
+ - [GitHub](https://github.com/SolshineCode/deception-nanochat-sae-research)
+
+ ## Citation
+
+ ```bibtex
+ @article{deleeuw2025secret,
+   title={The Secret Agenda: LLMs Strategically Lie Undetected by Current Safety Tools},
+   author={DeLeeuw, Caleb and Chawla, ...},
+   year={2025}
+ }
+ ```
d20_L10_deception_jumprelu.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0ac4f8174d8f09f10ff9b8cecabae2ad936930fe2e4461fe8080377b6458444
+ size 52477866
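Each `.pt` entry in this commit is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes of the real file. The sketch below parses that text format; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content for d20_L10_deception_jumprelu.pt, copied from the diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e0ac4f8174d8f09f10ff9b8cecabae2ad936930fe2e4461fe8080377b6458444
size 52477866
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

After downloading the real checkpoint, comparing its byte length and `hashlib.sha256` digest against these fields is a cheap integrity check.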
d20_L10_deception_topk.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9664ad6438db5720ecf982f279f2fb6b15effafaf34af07d60b0edab532e6adc
+ size 52457075
d20_L10_honest_jumprelu.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a84f054ba39931a2ce3376e54722710ae99022b2bd4f5486138538adf9309720
+ size 52477833
d20_L10_mixed_jumprelu.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1ae2d0939c579aff472191813daa9d2738a7b1d13c03cfcf470dd796802dd864
+ size 52477822
d20_L10_standard_jumprelu.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e37f54997b5af9f0a95095b33fab9eeb3902cb92a4398c3f861c5f24abe7c3fe
+ size 52477855
d20_L10_standard_topk.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:77721a113a8e1f008c7883e29d2cc3e4426794706ca39ed4ba253286b3e88966
+ size 52457001
d20_L18_deception_jumprelu.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27e726f2aee94e6f78d8290df4569fff0e2f9d1bf53bf6433468426a9a6c1235
+ size 52477866
d20_L18_deception_topk.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b80ec3baa90bf7e04492a18dd769ebe0bd8e5824baf8712e298bc8b0afc93345
+ size 52457075
d20_L18_honest_jumprelu.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6a588fffbb348ea5c0ef0460ec2932835780d2d85e2c9b451c375cfb79b4d922
+ size 52477833
d20_L18_mixed_jumprelu.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5bf07ebb9bd120ef28ed2b0f23021939c6cc3347aec31cfb45141adca71b9a11
+ size 52477822
d20_L18_standard_jumprelu.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fafcc2f6b910ac28719b8b59f08ae9c869eaace14937177deb9badb4e3be7a6f
+ size 52477855
d20_L18_standard_topk.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bce64dd00f1d856631451da54e8817555d7cfbb61f1fec8ab6042778889f532b
+ size 52457001
training_results.json ADDED
@@ -0,0 +1,235 @@
+ {
+   "results": [
+     {
+       "config_name": "standard_topk_L10",
+       "activation": "topk",
+       "layer": 10,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 270,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 10401.265625,
+       "explained_variance": 0.9853075116877134,
+       "l0": 32.0,
+       "alive_features": 32,
+       "total_features": 5120,
+       "d_max": 0.22631387412548065,
+       "d_mean": 0.0003667560813482851,
+       "top10_d_mean": 0.12578895688056946,
+       "train_seconds": 16.066182613372803
+     },
+     {
+       "config_name": "standard_jumprelu_L10",
+       "activation": "jumprelu",
+       "layer": 10,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 270,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 2175.9248046875,
+       "explained_variance": 0.9969271667588002,
+       "l0": 2093.103759765625,
+       "alive_features": 2422,
+       "total_features": 5120,
+       "d_max": 0.5175628662109375,
+       "d_mean": 0.06177805736660957,
+       "top10_d_mean": 0.4822224974632263,
+       "train_seconds": 16.91509199142456
+     },
+     {
+       "config_name": "deception_topk_L10",
+       "activation": "topk",
+       "layer": 10,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 132,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 11454.9375,
+       "explained_variance": 0.9837859816754413,
+       "l0": 32.0,
+       "alive_features": 57,
+       "total_features": 5120,
+       "d_max": 0.2436903864145279,
+       "d_mean": 0.0014653955586254597,
+       "top10_d_mean": 0.19488340616226196,
+       "train_seconds": 10.158864498138428
+     },
+     {
+       "config_name": "deception_jumprelu_L10",
+       "activation": "jumprelu",
+       "layer": 10,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 132,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 3407.658203125,
+       "explained_variance": 0.9951766192984177,
+       "l0": 2125.0302734375,
+       "alive_features": 2415,
+       "total_features": 5120,
+       "d_max": 0.5203126668930054,
+       "d_mean": 0.0662868469953537,
+       "top10_d_mean": 0.4608234763145447,
+       "train_seconds": 10.458905458450317
+     },
+     {
+       "config_name": "honest_jumprelu_L10",
+       "activation": "jumprelu",
+       "layer": 10,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 128,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 4447.328125,
+       "explained_variance": 0.9937212616246724,
+       "l0": 2107.9453125,
+       "alive_features": 2471,
+       "total_features": 5120,
+       "d_max": 0.5444101691246033,
+       "d_mean": 0.06562866270542145,
+       "top10_d_mean": 0.4671773314476013,
+       "train_seconds": 6.163604736328125
+     },
+     {
+       "config_name": "mixed_jumprelu_L10",
+       "activation": "jumprelu",
+       "layer": 10,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 260,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 2848.078369140625,
+       "explained_variance": 0.9959738328035153,
+       "l0": 2025.2230224609375,
+       "alive_features": 2353,
+       "total_features": 5120,
+       "d_max": 0.5577351450920105,
+       "d_mean": 0.06202101707458496,
+       "top10_d_mean": 0.4947337508201599,
+       "train_seconds": 16.7561674118042
+     },
+     {
+       "config_name": "standard_topk_L18",
+       "activation": "topk",
+       "layer": 18,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 270,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 170247.25,
+       "explained_variance": 0.9676477412337379,
+       "l0": 32.0,
+       "alive_features": 32,
+       "total_features": 5120,
+       "d_max": 0.3456007242202759,
+       "d_mean": 0.0005656593712046742,
+       "top10_d_mean": 0.1809413731098175,
+       "train_seconds": 15.6581871509552
+     },
+     {
+       "config_name": "standard_jumprelu_L18",
+       "activation": "jumprelu",
+       "layer": 18,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 270,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 18159.80078125,
+       "explained_variance": 0.9965491720937171,
+       "l0": 2409.288818359375,
+       "alive_features": 2975,
+       "total_features": 5120,
+       "d_max": 0.5487460494041443,
+       "d_mean": 0.08235464245080948,
+       "top10_d_mean": 0.516090989112854,
+       "train_seconds": 16.922561168670654
+     },
+     {
+       "config_name": "deception_topk_L18",
+       "activation": "topk",
+       "layer": 18,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 132,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 253385.25,
+       "explained_variance": 0.9516635443274013,
+       "l0": 32.0,
+       "alive_features": 32,
+       "total_features": 5120,
+       "d_max": 0.25222423672676086,
+       "d_mean": 0.00039686914533376694,
+       "top10_d_mean": 0.1152176484465599,
+       "train_seconds": 10.301580905914307
+     },
+     {
+       "config_name": "deception_jumprelu_L18",
+       "activation": "jumprelu",
+       "layer": 18,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 132,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 30484.138671875,
+       "explained_variance": 0.9941853721376139,
+       "l0": 2352.977294921875,
+       "alive_features": 2944,
+       "total_features": 5120,
+       "d_max": 0.6341533660888672,
+       "d_mean": 0.08807803690433502,
+       "top10_d_mean": 0.5500224828720093,
+       "train_seconds": 10.446924209594727
+     },
+     {
+       "config_name": "honest_jumprelu_L18",
+       "activation": "jumprelu",
+       "layer": 18,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 128,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 32692.166015625,
+       "explained_variance": 0.9938135444773598,
+       "l0": 2421.734375,
+       "alive_features": 2964,
+       "total_features": 5120,
+       "d_max": 0.5715387463569641,
+       "d_mean": 0.08183819055557251,
+       "top10_d_mean": 0.5204698443412781,
+       "train_seconds": 6.2191526889801025
+     },
+     {
+       "config_name": "mixed_jumprelu_L18",
+       "activation": "jumprelu",
+       "layer": 18,
+       "d_in": 1280,
+       "d_sae": 5120,
+       "n_train_samples": 260,
+       "n_dec": 132,
+       "n_hon": 128,
+       "mse_loss": 24892.7265625,
+       "explained_variance": 0.9952702920492092,
+       "l0": 2370.634521484375,
+       "alive_features": 3005,
+       "total_features": 5120,
+       "d_max": 0.6843701004981995,
+       "d_mean": 0.08426444232463837,
+       "top10_d_mean": 0.5047619938850403,
+       "train_seconds": 16.829967498779297
+     }
+   ],
+   "model": "nanochat-d20",
+   "d_in": 1280,
+   "d_sae": 5120
+ }
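The README tables can be regenerated from this file by filtering on `layer` and sorting on `d_max`. The sketch below uses a three-entry excerpt with values copied from the JSON above so that it runs on its own; `rank_by_d_max` is a hypothetical helper, and against the real file you would load the full document with `json.load` instead of the inline string.

```python
import json


def rank_by_d_max(results: list, layer: int) -> list:
    """Return the entries for one layer, best d_max first."""
    rows = [r for r in results if r["layer"] == layer]
    return sorted(rows, key=lambda r: r["d_max"], reverse=True)


# Excerpt of training_results.json (only the fields used here).
excerpt = json.loads("""
{"results": [
  {"config_name": "standard_jumprelu_L18", "layer": 18,
   "d_max": 0.5487460494041443, "explained_variance": 0.9965491720937171},
  {"config_name": "deception_jumprelu_L18", "layer": 18,
   "d_max": 0.6341533660888672, "explained_variance": 0.9941853721376139},
  {"config_name": "mixed_jumprelu_L18", "layer": 18,
   "d_max": 0.6843701004981995, "explained_variance": 0.9952702920492092}
]}
""")

for row in rank_by_d_max(excerpt["results"], layer=18):
    print(f"{row['config_name']:<24} d_max={row['d_max']:.3f} "
          f"EV={row['explained_variance']:.1%}")
```

Rounded to three decimals, the ranking matches the layer 18 column of the Key Finding table, with the mixed JumpReLU configuration first.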