New Models
Brainwaves
Quant   arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8   0.392  0.441  0.627  0.601  0.360  0.739  0.590
q8-hi   0.398  0.435  0.622  0.604  0.362  0.732  0.585
q8      0.398  0.434  0.622  0.604  0.362  0.733  0.582
q6-hi   0.398  0.436  0.622  0.601  0.366  0.733  0.589
q6      0.392  0.437  0.622  0.604  0.372  0.736  0.590
mxfp4   0.371  0.444  0.632  0.585  0.356  0.732  0.548
Quant   Perplexity     Peak memory
mxfp8   4.953 ± 0.035  9.61 GB
mxfp4   5.209 ± 0.037  7.65 GB
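The perplexity table gives a quick way to reason about the quality/memory trade-off. A small sketch using the numbers above:

```python
# Memory/quality trade-off, taken from the table above.
mxfp8_mem, mxfp4_mem = 9.61, 7.65    # peak memory, GB
mxfp8_ppl, mxfp4_ppl = 4.953, 5.209  # perplexity (lower is better)

savings = 1 - mxfp4_mem / mxfp8_mem       # fraction of memory saved
ppl_increase = mxfp4_ppl / mxfp8_ppl - 1  # relative perplexity cost

print(f"mxfp4 uses {savings:.1%} less memory "
      f"for a {ppl_increase:.1%} perplexity increase")
# → mxfp4 uses 20.4% less memory for a 5.2% perplexity increase
```

Roughly a fifth of the memory is saved for a ~5% perplexity cost; whether that trade is worth it depends on how close the machine is to its memory limit.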
Qwen3.5-4B-Instruct
Quant   arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8   0.505  0.688  0.892  0.652  0.420  0.760  0.658
q8-hi   0.509  0.667  0.886  0.654  0.424  0.757  0.663
q8      0.507  0.663  0.885  0.654  0.424  0.758  0.662
q6-hi   0.501  0.663  0.887  0.652  0.434  0.757  0.661
q6      0.503  0.658  0.884  0.653  0.426  0.756  0.662
mxfp4   0.487  0.656  0.878  0.634  0.420  0.746  0.616
tvall43/Qwen3.5-4B-Text-heretic
Quant   arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8   0.507  0.686  0.881  0.654  0.424  0.756  0.660
mxfp4   0.480  0.656  0.878  0.635  0.418  0.742  0.624
Qwen3.5-4B-Text
Quant   arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8   0.392  0.439  0.628  0.601  0.360  0.739  0.585
Old models
Qwen3-4B-Thinking-2507
Quant   arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp4   0.381  0.408  0.686  0.516  0.364  0.701  0.585
Qwen3-4B-Instruct-2507
Quant   arc    arc/e  boolq  hswag  obkqa  piqa   wino
dwq5    0.449  0.588  0.843  0.458  0.394  0.697  0.556
To enable Instruct mode, insert this line at the top of chat_template.jinja:
{%- set enable_thinking = false %}
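In Qwen3-style templates this flag gates whether a thinking block is opened before the reply. A minimal sketch of the pattern (a hypothetical excerpt, not the actual template shipped with the model):

```jinja
{#- Hypothetical excerpt: the real chat_template.jinja is much longer. -#}
{%- set enable_thinking = false %}
{%- if enable_thinking %}
{{- '<think>\n' }}
{%- else %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
```

With the flag set to false, the template emits an already-closed (empty) thinking block, so the model goes straight to the answer.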
More metrics coming soon
-G
pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("Qwen3.5-4B-mxfp8-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)