Commit 7f47feb (verified), committed by magiccodingman
1 Parent(s): 6e75d88

File name changes

.gitattributes CHANGED
@@ -33,5 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
- granite-4.0-h-350m-unsloth-mxfp4_moe-EKUD=B16-O=Q6K-Q=Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
- granite-4.0-h-350m-unsloth-mxfp4_moe-O=Q6K-EQKUD=Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ granite-4.0-h-350m-unsloth-mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ granite-4.0-h-350m-unsloth-mxfp4_moe-O-Q6K-EQKUD-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -35,24 +35,22 @@ To dive deeper into how MagicQuant works, see the main repo:

  | model_name | file_size_gb | bench_tps | avg_prec_loss |
  | ---------- | ------------ | --------- | ------------- |
- | [mxfp4_moe-EKUD=B16-O=Q6K-Q=Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-EKUD=B16-O=Q6K-Q=Q8_0.gguf?download=true) | 0.54 | 1705.35 | 0.0816 |
- | [mxfp4_moe-O=Q6K-EQKUD=Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-O=Q6K-EQKUD=Q8_0.gguf?download=true) | 0.34 | 1605.97 | 0.2555 |
-
+ | [mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0.gguf?download=true) | 0.54 | 1705.35 | 0.0816 |
+ | [mxfp4_moe-O-Q6K-EQKUD-Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-O-Q6K-EQKUD-Q8_0.gguf?download=true) | 0.34 | 1605.97 | 0.2555 |

  ### Table - PPL Columns

  | model_name | gen | gen_er | code | code_er | math | math_er |
  | ---------- | --- | ------ | ---- | ------- | ---- | ------- |
- | [mxfp4_moe-EKUD=B16-O=Q6K-Q=Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-EKUD=B16-O=Q6K-Q=Q8_0.gguf?download=true) | 18.1560 | 0.4667 | 1.9548 | 0.0175 | 10.2986 | 0.2319 |
- | [mxfp4_moe-O=Q6K-EQKUD=Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-O=Q6K-EQKUD=Q8_0.gguf?download=true) | 18.2304 | 0.4691 | 1.9555 | 0.0175 | 10.3074 | 0.2320
+ | [mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0.gguf?download=true) | 18.1560 | 0.4667 | 1.9548 | 0.0175 | 10.2986 | 0.2319 |
+ | [mxfp4_moe-O-Q6K-EQKUD-Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-O-Q6K-EQKUD-Q8_0.gguf?download=true) | 18.2304 | 0.4691 | 1.9555 | 0.0175 | 10.3074 | 0.2320

  ### Table - Precision Loss Columns

  | model_name | loss_general | loss_code | loss_math |
  | ---------- | ------------ | --------- | --------- |
- | [mxfp4_moe-EKUD=B16-O=Q6K-Q=Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-EKUD=B16-O=Q6K-Q=Q8_0.gguf?download=true) | 0.1368 | 0.0051 | 0.1030 |
- | [mxfp4_moe-O=Q6K-EQKUD=Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-O=Q6K-EQKUD=Q8_0.gguf?download=true) | 0.5471 | 0.0307 | 0.1886 |
-
+ | [mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0.gguf?download=true) | 0.1368 | 0.0051 | 0.1030 |
+ | [mxfp4_moe-O-Q6K-EQKUD-Q8_0](./../../resolve/main/granite-4.0-h-350m-unsloth-mxfp4_moe-O-Q6K-EQKUD-Q8_0.gguf?download=true) | 0.5471 | 0.0307 | 0.1886 |

  ---

@@ -88,13 +86,13 @@ This tells users the *default* quantization for the majority of tensors.
  If certain tensor groups use a different quant scheme than the base, they appear afterwards as:

  ```
- <GroupLetters>=<Quant>
+ <GroupLetters>-<Quant>
  ```

  Multiple group blocks can be chained:

  ```
- <Model>-<Base>-<Groups>=<Quant>-<Groups>=<Quant>...
+ <Model>-<Base>-<Groups>-<Quant>-<Groups>-<Quant>...
  ```

  Only *exceptions* appear.
@@ -128,7 +126,7 @@ These are the compact codes for each major tensor group:
  If multiple groups share the same quant scheme, combine them:

  ```
- EH=B16
+ EH-B16
  ```

  Means:
@@ -139,7 +137,7 @@ Means:
  Another block:

  ```
- QKO=IQ4NL
+ QKO-IQ4NL
  ```

  Means:
@@ -163,7 +161,7 @@ You can stack as many blocks as needed.
  ### **Hybrid Name:**

  ```
- Qwen3-4B-MXFP4-EH=B16-QKO=IQ4NL.gguf
+ Qwen3-4B-MXFP4-EH-B16-QKO-IQ4NL.gguf
  ```

  This reads as:
@@ -182,13 +180,13 @@ Clean. Simple. Understandable. Portable.
  Example: Only embeddings become Q5_K.

  ```
- Qwen3-4B-MXFP4-E=Q5K.gguf
+ Qwen3-4B-MXFP4-E-Q5K.gguf
  ```

  If only MoE router changes:

  ```
- Qwen3-4B-MXFP4-R=Q8.gguf
+ Qwen3-4B-MXFP4-R-Q8.gguf
  ```

  ---
@@ -207,8 +205,7 @@ No extra suffixes.

  ## 🧠 **8. Notation Guidelines (For Clarity and Aesthetics)**

- * Use hyphens `-` between blocks.
- * Use equals `=` between group and quant scheme.
+ * Use hyphens `-` throughout the naming scheme for consistency.
  * No need for `_` unless the quant type requires it (`IQ4_NL`, `Q4_K_M`, etc.).
  * Order of groups doesn’t matter, but a consistent order is recommended:

@@ -222,11 +219,11 @@ This mirrors information flow: embeddings → attention → ffn → moe.

  | Description | Final Name |
  | ------------------------------------------ | ------------------------------ |
- | Base MXFP4, only embeddings = BF16 | `MXFP4-E=B16` |
- | Base IQ4_NL, Q/K/O = Q6_K | `IQ4NL-QKO=Q6K` |
- | Base Q6_K, Head & Router = BF16 | `Q6K-HR=B16` |
+ | Base MXFP4, only embeddings = BF16 | `MXFP4-E-B16` |
+ | Base IQ4_NL, Q/K/O = Q6_K | `IQ4NL-QKO-Q6K` |
+ | Base Q6_K, Head & Router = BF16 | `Q6K-HR-B16` |
  | Everything BF16 → no hybrid | `B16` (or `F16/BF16` baseline) |
- | Full MoE override: experts + router = Q8_0 | `X R=Q8_0` → `XR=Q8_0` |
+ | Full MoE override: experts + router = Q8_0 | `X R-Q8_0` → `XR-Q8_0` |

  ---
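
The README hunks above switch the override notation from `<Groups>=<Quant>` to `<Groups>-<Quant>`. As a rough illustration of how the new hyphen-only blocks could be read back, here is a minimal Python sketch. The function names `parse_override_blocks` and `legacy_to_hyphen` are hypothetical and not part of this repo, and the sketch assumes the caller has already stripped the model name and base quant, since hyphens now separate everything and the prefix cannot be split off unambiguously without extra knowledge.

```python
from __future__ import annotations


def parse_override_blocks(suffix: str) -> dict[str, str]:
    """Map each tensor-group letter to its quant scheme.

    `suffix` is only the override part of a name, e.g. "EH-B16-QKO-IQ4NL".
    Assumption: tokens strictly alternate <Groups>, <Quant>, and quant
    labels themselves contain no hyphen (e.g. "Q8_0", "IQ4NL").
    """
    tokens = suffix.split("-") if suffix else []
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected <Groups>-<Quant> pairs, got {suffix!r}")

    overrides: dict[str, str] = {}
    for groups, quant in zip(tokens[0::2], tokens[1::2]):
        for letter in groups:  # "EH" expands to "E" and "H"
            if letter in overrides:
                raise ValueError(f"group {letter!r} is overridden twice")
            overrides[letter] = quant
    return overrides


def legacy_to_hyphen(name: str) -> str:
    """Convert an old `=`-style file name to the hyphen-only style."""
    return name.replace("=", "-")


print(parse_override_blocks("EH-B16-QKO-IQ4NL"))
print(legacy_to_hyphen("Qwen3-4B-MXFP4-EH=B16-QKO=IQ4NL.gguf"))
```

Because position alone distinguishes a group block from a quant label, a lone `Q` group followed by `Q8_0` (as in `EKUD-B16-O-Q6K-Q-Q8_0`) still parses, but only when the pairs are complete.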
 
granite-4.0-h-350m-unsloth-mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:689e694797a2074350577fea7ad61a8167c6e0ee98bfbbfb840c28ac76974310
+ size 580910016
granite-4.0-h-350m-unsloth-mxfp4_moe-O-Q6K-EQKUD-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e5a53b4e80abe4039661da23e477a3b8f4b641b7b40a70829c0243db0dd437f
+ size 365624256
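
The two added files are Git LFS pointers (`version`, `oid`, `size`), not the model weights themselves. A small Python sketch of reading such a pointer follows; `parse_lfs_pointer` and `size_gib` are hypothetical helper names. Dividing the byte count by 2^30 gives values that appear consistent with the `file_size_gb` column in the README tables (0.54 and 0.34), though the diff itself does not state how that column was computed.

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")  # split on the first space only
        fields[key] = value
    return fields


def size_gib(fields: dict[str, str]) -> float:
    """Return the pointed-to object size in GiB (bytes / 2**30)."""
    return int(fields["size"]) / 2**30


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:689e694797a2074350577fea7ad61a8167c6e0ee98bfbbfb840c28ac76974310
size 580910016
"""

fields = parse_lfs_pointer(POINTER)
print(fields["oid"])
print(round(size_gib(fields), 2))  # 0.54
```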