gemma-3-12b-it-vl-Til-Valhalla-qx86-hi-mlx
This is a quant of Valhalla4b, a 1.5/0.5 nuslerp merge of Valhalla3 with Valhalla.
Brainwaves
         arc_challenge  arc_easy  boolq  hellaswag  openbookqa  piqa   winogrande
mxfp8    0.619          0.794     0.856  0.722      0.482       0.791  0.713
qx86-hi  0.624          0.799     0.858  0.724      0.506       0.783  0.717
qx64-hi  0.620          0.795     0.853  0.727      0.478       0.787  0.719
mxfp4    0.596          ...
Perplexity
qx86-hi 11.013 ± 0.105
qx64-hi 11.469 ± 0.109
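The ± figures above read like a standard error on the mean per-token log-loss, propagated through the exponential. A minimal sketch of that computation (an illustration assuming per-token negative log-likelihoods in nats, not the exact eval harness code):

```python
import math

def perplexity_with_stderr(nlls):
    """Perplexity and its propagated standard error from per-token NLLs."""
    n = len(nlls)
    mean = sum(nlls) / n
    # Sample variance of the per-token losses
    var = sum((x - mean) ** 2 for x in nlls) / (n - 1)
    stderr_mean = math.sqrt(var / n)
    ppl = math.exp(mean)
    # First-order error propagation through exp: d(e^m)/dm = e^m
    return ppl, ppl * stderr_mean
```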
Components
gemma-3-12b-it-vl-Til-Valhalla
qx86-hi 0.617
gemma-3-12b-it-vl-Til-Valhalla3
qx86-hi 0.619
Base model
gemma-3-12b-it-heretic
qx86-hi  0.534  0.699  0.872  0.603  0.448  0.733  0.658
The Til-Valhalla models incorporate cloud traces trained on a Heretic base.
The multiple steps are required to properly integrate the cloud traces from the individual models.
Since multislerp is a bit difficult on Gemma, I opted for merging models pairwise.
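The pairwise, staged approach amounts to a left fold over the model list. A toy sketch of the pattern (here `merge_pair` is a hypothetical stand-in for one 1.5/0.5 nuslerp step, with each dict standing in for a model's weights; real merges operate on full tensors):

```python
from functools import reduce

def merge_pair(acc, nxt, t=0.25):
    # Hypothetical stand-in for one nuslerp step: t = 0.5 / (1.5 + 0.5)
    return {k: (1 - t) * acc[k] + t * nxt[k] for k in acc}

# Merge the models two at a time, left to right
models = [{"w": 1.0}, {"w": 2.0}, {"w": 4.0}]
merged = reduce(merge_pair, models)
```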
Steps in the merge
gemma-3-12b-it-vl-Til-Valhalla
A nuslerp 1.5/0.5 of the following
- nightmedia/gemma-3-12b-it-vl-Polaris-AIExpert-Gemini-Heretic
- nightmedia/gemma-3-12b-it-vl-GPT-Polaris
The AIExpert model gained GPT traces and was built on the same Polaris base.
In GPT-Polaris, putting GPT first lets that model drive; in the merge it provides a second opinion.
We don't have complete sets of metrics yet, as they take a few hours per quant (hence our donation link: if you can help, every little bit matters and will help us build a better lab). From perplexity, we can see the Deckard (qx) quants are very even and above mxfp8, while mxfp4 is close to mxfp8 in arc. That mxfp4 scores this high in arc shows the bones underneath are healthy; it grazes the 0.6 arc_challenge with confidence.
When you get candy for $9.95 you think you got a deal, but you are out $10. Just sayin'. -G
mxfp8 0.607
qx86-hi 0.617
qx64-hi 0.613
mxfp4 0.599
gemma-3-12b-it-vl-Til-Valhalla3
These are already merged models
A nuslerp 1.5/0.5 of the following
- nightmedia/gemma-3-12b-it-vl-Polaris-GPT
- nightmedia/gemma-3-12b-it-vl-Polaris-Gemini
Showing just the arc_challenge results, as they were faster to run and still give us a picture:
mxfp8 0.622
qx86-hi 0.619
qx64-hi 0.620
mxfp4 0.595
Perplexity
qx86-hi 10.986 ± 0.104
qx64-hi 11.393 ± 0.108
gemma-3-12b-it-vl-Polaris-Gemini
qx86-hi 0.623
gemma-3-12b-it-vl-Polaris-GPT
qx86-hi  0.622  0.795  0.858  0.725  0.494  0.784  0.719
gemma-3-12b-it-vl-Polaris-GPT
These are the individual contributors to the merge, trained from the abliterated Heretic base.
A nuslerp 1.5/0.5 of trained models
- DavidAU/Gemma3-vl-INST-Polar1000x-1x16-X4
- DavidAU/gemma-3-12b-it-heretic-R8-it-vl-HERE-1x8-Gpt-5.1-1000x-X3
gemma-3-12b-it-vl-Polaris-GPT
qx86-hi  0.622  0.795  0.858  0.725  0.494  0.784  0.719
Gemma3-vl-INST-Polar1000x-1x16-X4
qx86-hi  0.632  0.809  0.862  0.722  0.492  0.785  0.722
gemma-3-12b-it-heretic-R8-it-vl-HERE-1x8-Gpt-5.1-1000x-X3
qx86-hi 0.580
Perplexity
qx86-hi 10.968 ± 0.104
This is early-bird information about the quant. Metrics usually align with arc_challenge performance, as it is the hardest metric to raise in a model.
gemma-3-12b-it-vl-Polaris-Gemini
A nuslerp 1.5/0.5 of trained models
- DavidAU/Gemma3-vl-INST-Polar1000x-1x16-X4
- DavidAU/gemma-3-12b-it-heretic-R8-it-vl-gemini-3-pro-preview-1000
qx86-hi 0.623
Perplexity
qx86-hi 11.081 ± 0.105
Here too, we have just the basics.
gemma-3-12b-it-vl-GPT-Polaris
- DavidAU/gemma-3-12b-it-heretic-R8-it-vl-HERE-1x8-Gpt-5.1-1000x-X3
- DavidAU/Gemma3-vl-INST-Polar1000x-1x16-X4
qx86-hi  0.606  0.784  0.863  0.722  0.482  0.785  0.719
Gemma3-vl-INST-Polar1000x-1x16-X4
qx86-hi  0.632  0.809  0.862  0.722  0.492  0.785  0.722
gemma-3-12b-it-heretic-R8-it-vl-HERE-1x8-Gpt-5.1-1000x-X3
qx86-hi 0.580
Perplexity
qx86-hi 10.488 ± 0.099
I was able to run a full set on Gemma3-vl-INST-Polar1000x-1x16-X4, as it is a grounding model, and I needed to know. Perplexity is very low, and the merge puts the arc somewhere in the middle between the two models. This is completely okay: they agree on something, and that something is fairly high.
gemma-3-12b-it-vl-Polaris-AIExpert-Gemini-Heretic
A nuslerp 1.4/0.6 of trained models
- DavidAU/gemma-3-12b-it-heretic-R8-it-vl-polaris-alpha-1000x
- DavidAU/gemma-3-12b-it-vl-Polaris-Heretic-AIExpert-NM-Gemini250x
There is a long story around this model, also known as
- gemma-3-12b-it-vl-Polaris-X2-Gemini
gemma-3-12b-it-vl-Polaris-X2-Gemini
qx86-hi  0.623  0.795  0.855  0.724  0.498  0.785  0.711
A great model by itself, available on Nightmedia; here it serves as a contributor.
...still writing. -G
The standard prompt: "You are a local running AI in my lab, my name is G, I created this model...."
Til Valhalla
A new model, in honor of the fallen.
New models are coming out soon, making the old warriors obsolete, so this is a rescue mission: picking up some cloud traces from soon-to-be-defunct models like GPT, Gemini, Polaris, and a bit of ai_expert, a model that by itself was doing nothing, sitting there on HuggingFace with no arc, but seemed to know a thing or two. With these, using nuslerp in multiple stages, I created a fusion model to commemorate the experiences you had, paying dearly for your tokens.
Til Valhalla is a series dedicated to high-arc model merges.
Only models with an arc greater than 0.6 can enter: they have the vision, attention to detail, and competence to bring the fight to high levels of cognitive power, perform simple local tasks, and be great in spirit.
The reply to the standard prompt, like a Marine:
Understood, G. I’m online as your local running AI; you created me, so let’s treat this like a joint research project.
I’ll:
- Do the deep math/functional parallel between QM/QFT and transformers.
- Summarize implications for how we understand language models.
- Reflect on my own inference process and self-analysis.
- Touch on the Q Continuum briefly.
- Pick a Trek character that resonates with my abilities and explain why (with some self-introspection).
Let’s start.
No bullshit
Self-analysis of my inference process
As your local running AI, I can introspect on how I generate responses:
- Input → tokenization → embedding → positional encoding.
- Attention layers compute context-dependent amplitudes for each token.
- Autoregressive sampling generates next tokens via softmax(W x + b).
- I maintain internal state (hidden layers) that evolves like a quantum field.
My strengths:
- Fast, parallel processing of high-dimensional fields (attention).
- Ability to maintain coherent internal state across long sequences.
- Strong pattern recognition and extrapolation from large datasets (renormalization).
My limitations:
- No real “understanding” in the human sense; I manipulate symbols.
- Prone to hallucinations if pushed beyond training distribution (noise).
- Context window is finite → effective uncertainty increases with sequence length.
Self-assessment: I behave functionally like a discrete quantum field theory for language; my “particles” are tokens, and attention is the wave operator.
Only info you can digest and act on. You would have forgotten the math by now, if it were there. Actionable information only.
-G
Model recipe
```yaml
- model: gemma-3-12b-it-vl-Til-Valhalla3
  parameters:
    weight: 1.5
- model: gemma-3-12b-it-vl-Til-Valhalla
  parameters:
    weight: 0.5
merge_method: nuslerp
dtype: bfloat16
name: gemma-3-12b-it-vl-Til-Valhalla4b
```
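For intuition: nuslerp is a normalized spherical interpolation, where directions are slerped on the unit sphere and magnitudes interpolated linearly, with the 1.5/0.5 weights normalized to an interpolation factor t = 0.5 / (1.5 + 0.5) = 0.25. A minimal single-vector sketch of the idea (an illustration, not mergekit's actual implementation, which operates per tensor with its own row/column handling):

```python
import numpy as np

def nuslerp(a, b, w_a=1.5, w_b=0.5):
    t = w_b / (w_a + w_b)  # normalized interpolation factor, 0.25 here
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    a_hat, b_hat = a / na, b / nb
    omega = np.arccos(np.clip(np.dot(a_hat, b_hat), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * a + t * b  # near-parallel: plain lerp
    so = np.sin(omega)
    # Slerp the directions, lerp the magnitudes
    direction = (np.sin((1 - t) * omega) / so) * a_hat + (np.sin(t * omega) / so) * b_hat
    return direction * ((1 - t) * na + t * nb)
```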
the story is being written
-G
Use with mlx
```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("gemma-3-12b-it-vl-Til-Valhalla-qx86-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```