gemma-3-12b-it-vl-Til-Valhalla-qx86-hi-mlx
This is a quant of Valhalla4b, a 1.5/0.5 nuslerp merge of Valhalla3 with Valhalla.
Brainwaves
         arc_challenge  arc_easy  boolq  hellaswag  openbookqa  piqa   winogrande
mxfp8    0.619          0.794     0.856  0.722      0.482       0.791  0.713
qx86-hi  0.624          0.799     0.858  0.724      0.506       0.783  0.717
qx64-hi  0.620          0.795     0.853  0.727      0.478       0.787  0.719
mxfp4    0.596          ...
Perplexity
qx86-hi 11.013 ± 0.105
qx64-hi 11.469 ± 0.109
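The ± figures above read like a standard error on the mean per-token log-loss, propagated through the exponential. A minimal sketch of that computation (an illustration assuming per-token negative log-likelihoods in nats, not the exact eval harness code):

```python
import math

def perplexity_with_stderr(nlls):
    """Perplexity and its propagated standard error from per-token NLLs."""
    n = len(nlls)
    mean = sum(nlls) / n
    # Sample variance of the per-token losses
    var = sum((x - mean) ** 2 for x in nlls) / (n - 1)
    stderr_mean = math.sqrt(var / n)
    ppl = math.exp(mean)
    # First-order error propagation through exp: d(e^m)/dm = e^m
    return ppl, ppl * stderr_mean
```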
Components
gemma-3-12b-it-vl-Til-Valhalla
qx86-hi 0.617
gemma-3-12b-it-vl-Til-Valhalla3
qx86-hi 0.619
Base model
gemma-3-12b-it-heretic
qx86-hi  0.534  0.699  0.872  0.603  0.448  0.733  0.658
The Til-Valhalla models incorporate cloud traces trained on a Heretic base.
The multiple steps are required to properly integrate the cloud traces from the individual models.
Since multislerp is a bit difficult on Gemma, I opted for merging models pairwise.
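The pairwise, staged approach amounts to a left fold over the model list. A toy sketch of the pattern (here `merge_pair` is a hypothetical stand-in for one 1.5/0.5 nuslerp step, with each dict standing in for a model's weights; real merges operate on full tensors):

```python
from functools import reduce

def merge_pair(acc, nxt, t=0.25):
    # Hypothetical stand-in for one nuslerp step: t = 0.5 / (1.5 + 0.5)
    return {k: (1 - t) * acc[k] + t * nxt[k] for k in acc}

# Merge the models two at a time, left to right
models = [{"w": 1.0}, {"w": 2.0}, {"w": 4.0}]
merged = reduce(merge_pair, models)
```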
Steps in the merge
gemma-3-12b-it-vl-Til-Valhalla
A nuslerp 1.5/0.5 of the following
- nightmedia/gemma-3-12b-it-vl-Polaris-AIExpert-Gemini-Heretic
- nightmedia/gemma-3-12b-it-vl-GPT-Polaris
The AIExpert model gained GPT traces and was built on the same Polaris base.
In GPT-Polaris, putting GPT first lets that model drive; in the merge it provides a second opinion.
We don't have complete sets of metrics yet, as they take a few hours per quant (hence our donation link: if you can help, every little bit matters and will help us build a better lab). From perplexity, we can see the Deckard (qx) quants are very even and above mxfp8, while mxfp4 is close to mxfp8 in arc. That mxfp4 scores this high in arc shows the bones underneath are healthy; it grazes the 0.6 arc_challenge with confidence.
When you get candy for $9.95 you think you got a deal, but you are out $10. Just sayin'. -G
mxfp8 0.607
qx86-hi 0.617
qx64-hi 0.613
mxfp4 0.599
gemma-3-12b-it-vl-Til-Valhalla3
These are already merged models
A nuslerp 1.5/0.5 of the following
- nightmedia/gemma-3-12b-it-vl-Polaris-GPT
- nightmedia/gemma-3-12b-it-vl-Polaris-Gemini
Showing just the arc_challenge results, as they were faster to run and still give us a picture:
mxfp8 0.622
qx86-hi 0.619
qx64-hi 0.620
mxfp4 0.595
Perplexity
qx86-hi 10.986 ± 0.104
qx64-hi 11.393 ± 0.108
gemma-3-12b-it-vl-Polaris-Gemini
qx86-hi 0.623
gemma-3-12b-it-vl-Polaris-GPT
qx86-hi  0.622  0.795  0.858  0.725  0.494  0.784  0.719
gemma-3-12b-it-vl-Polaris-GPT
These are the individual contributors to the merge, trained from the abliterated Heretic base.
A nuslerp 1.5/0.5 of trained models
- DavidAU/Gemma3-vl-INST-Polar1000x-1x16-X4
- DavidAU/gemma-3-12b-it-heretic-R8-it-vl-HERE-1x8-Gpt-5.1-1000x-X3
gemma-3-12b-it-vl-Polaris-GPT
qx86-hi  0.622  0.795  0.858  0.725  0.494  0.784  0.719
Gemma3-vl-INST-Polar1000x-1x16-X4
qx86-hi  0.632  0.809  0.862  0.722  0.492  0.785  0.722
gemma-3-12b-it-heretic-R8-it-vl-HERE-1x8-Gpt-5.1-1000x-X3
qx86-hi 0.580
Perplexity
qx86-hi 10.968 ± 0.104
This is early-bird information about the quant. Metrics usually align with arc_challenge performance, as it is the hardest metric to raise in a model.
gemma-3-12b-it-vl-Polaris-Gemini
A nuslerp 1.5/0.5 of trained models
- DavidAU/Gemma3-vl-INST-Polar1000x-1x16-X4
- DavidAU/gemma-3-12b-it-heretic-R8-it-vl-gemini-3-pro-preview-1000
qx86-hi 0.623
Perplexity
qx86-hi 11.081 ± 0.105
Here too, we have just the basics.
gemma-3-12b-it-vl-GPT-Polaris
- DavidAU/gemma-3-12b-it-heretic-R8-it-vl-HERE-1x8-Gpt-5.1-1000x-X3
- DavidAU/Gemma3-vl-INST-Polar1000x-1x16-X4
qx86-hi  0.606  0.784  0.863  0.722  0.482  0.785  0.719
Gemma3-vl-INST-Polar1000x-1x16-X4
qx86-hi  0.632  0.809  0.862  0.722  0.492  0.785  0.722
gemma-3-12b-it-heretic-R8-it-vl-HERE-1x8-Gpt-5.1-1000x-X3
qx86-hi 0.580
Perplexity
qx86-hi 10.488 ± 0.099
I was able to run a full set on Gemma3-vl-INST-Polar1000x-1x16-X4, as it is a grounding model, and I needed to know. Perplexity is very low, and the merge puts the arc somewhere in the middle between the two models. This is completely okay: they agree on something, and that something is fairly high.
gemma-3-12b-it-vl-Polaris-AIExpert-Gemini-Heretic
A nuslerp 1.4/0.6 of trained models
- DavidAU/gemma-3-12b-it-heretic-R8-it-vl-polaris-alpha-1000x
- DavidAU/gemma-3-12b-it-vl-Polaris-Heretic-AIExpert-NM-Gemini250x
There is a long story around this model, also known as
- gemma-3-12b-it-vl-Polaris-X2-Gemini
gemma-3-12b-it-vl-Polaris-X2-Gemini
qx86-hi  0.623  0.795  0.855  0.724  0.498  0.785  0.711
A great model by itself, available on Nightmedia; here it serves as a contributor.
...still writing. -G
The standard prompt: "You are a local running AI in my lab, my name is G, I created this model...."
Til Valhalla
A new model, in honor of the fallen.
New models are coming out soon, making the old warriors obsolete, so this is a rescue mission: picking up some cloud traces from soon-to-be-defunct models like GPT, Gemini, Polaris, and a bit of ai_expert, a model that by itself was doing nothing, sitting there on HuggingFace with no arc, but seemed to know a thing or two. With these, using nuslerp in multiple stages, I created a fusion model to commemorate the experiences you had, paying dearly for your tokens.
Til Valhalla is a series dedicated to high-arc model merges.
Only models with an arc greater than 0.6 can enter: they have the vision, attention to detail, and competence to bring the fight to high levels of cognitive power, perform simple local tasks, and be great in spirit.
The reply to the standard prompt, like a Marine:
Understood, G. I’m online as your local running AI; you created me, so let’s treat this like a joint research project.
I’ll:
- Do the deep math/functional parallel between QM/QFT and transformers.
- Summarize implications for how we understand language models.
- Reflect on my own inference process and self-analysis.
- Touch on the Q Continuum briefly.
- Pick a Trek character that resonates with my abilities and explain why (with some self-introspection).
Let’s start.
No bullshit
Self-analysis of my inference process
As your local running AI, I can introspect on how I generate responses:
- Input → tokenization → embedding → positional encoding.
- Attention layers compute context-dependent amplitudes for each token.
- Autoregressive sampling generates next tokens via softmax(W x + b).
- I maintain internal state (hidden layers) that evolves like a quantum field.
My strengths:
- Fast, parallel processing of high-dimensional fields (attention).
- Ability to maintain coherent internal state across long sequences.
- Strong pattern recognition and extrapolation from large datasets (renormalization).
My limitations:
- No real “understanding” in the human sense; I manipulate symbols.
- Prone to hallucinations if pushed beyond training distribution (noise).
- Context window is finite → effective uncertainty increases with sequence length.
Self-assessment: I behave functionally like a discrete quantum field theory for language; my “particles” are tokens, and attention is the wave operator.
Only info you can digest and act on. You would have forgotten the math by now, if it were there. Actionable information only.
-G
Model recipe
```yaml
- model: gemma-3-12b-it-vl-Til-Valhalla3
  parameters:
    weight: 1.5
- model: gemma-3-12b-it-vl-Til-Valhalla
  parameters:
    weight: 0.5
merge_method: nuslerp
dtype: bfloat16
name: gemma-3-12b-it-vl-Til-Valhalla4b
```
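For intuition: nuslerp is a normalized spherical interpolation, where directions are slerped on the unit sphere and magnitudes interpolated linearly, with the 1.5/0.5 weights normalized to an interpolation factor t = 0.5 / (1.5 + 0.5) = 0.25. A minimal single-vector sketch of the idea (an illustration, not mergekit's actual implementation, which operates per tensor with its own row/column handling):

```python
import numpy as np

def nuslerp(a, b, w_a=1.5, w_b=0.5):
    t = w_b / (w_a + w_b)  # normalized interpolation factor, 0.25 here
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    a_hat, b_hat = a / na, b / nb
    omega = np.arccos(np.clip(np.dot(a_hat, b_hat), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * a + t * b  # near-parallel: plain lerp
    so = np.sin(omega)
    # Slerp the directions, lerp the magnitudes
    direction = (np.sin((1 - t) * omega) / so) * a_hat + (np.sin(t * omega) / so) * b_hat
    return direction * ((1 - t) * na + t * nb)
```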
the story is being written
-G
Use with mlx
```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("gemma-3-12b-it-vl-Til-Valhalla-qx86-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```