Broken

#3
by redaihf - opened

This model starts to lose coherence after a few hundred tokens. It generates nonsensical sentences that lack common words ("the", "a", etc.). As this pattern occurs for both safe and unsafe generations it is possible that the merge method may be the cause.
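One rough way to put a number on the "missing common words" symptom is to track how often function words appear in successive windows of a generation. The sketch below is only an illustration: the word list, the window size, and the function names are assumptions chosen for the example, not part of any evaluation tool used in this thread.

```python
import re

# Small, illustrative set of English function words (an assumption, not a standard list).
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that"}


def function_word_ratio(text: str) -> float:
    """Return the fraction of word tokens that are common function words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in FUNCTION_WORDS for w in words) / len(words)


def windowed_ratios(text: str, window: int = 50) -> list[float]:
    """Compute the ratio over consecutive windows of `window` words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [
        function_word_ratio(" ".join(words[i:i + window]))
        for i in range(0, len(words), window)
    ]
```

If the ratio drops sharply in later windows of a long generation, that would be consistent with the degradation described above, although it does not say anything about the cause.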

DarkArtsForge org

If you are not seeing the same type of bugs with Morbid Miasma, that would indeed indicate that aether_x is less stable than aether. Debugging would take a while since there are over 70 YAML parameters, and likely more than one is causing the issues (such as breadcrumbs gamma). I'm surprised it worked at all. The pipeline is overly complex, but if I have time later I'll try hacking away at some of the params.

Morbid Miasma is more stable but not without its own issues.

DarkArtsForge org

It is highly possible that most of the settings in aether and aether_x are sub-optimal and may require a redesign.

For the purposes of testing merge techniques it would probably be best to merge non-ablated donors and then Hereticise the resulting merged model. That would help to determine whether sub-optimal results are caused by the merge or the lack of dual-direction decensoring. Most ablated models are decensored without MPOA and are less performant as a result.

DarkArtsForge org

Yeah I might consider releasing another aether merge as 3 versions: unablated, post-merge ablated, and pre-merge ablated donors, to see how each is affected by this.

There are also these; each one uses a different 'root mechanism' for the center, with no ablated parts:
https://huggingface.co/Naphula-Archives/aether-24B-v1a-IQ4_XS-GGUF
https://huggingface.co/Naphula-Archives/aether-24B-v1b-IQ4_XS-GGUF

DarkArtsForge org

I just ran an experiment with no ablated donors, and the merge worked fine.

I then tested again with just 2/5 pre-ablated donors (pygmalion and azure dusk) and it seems to break the merge's ability to generate a proper end token. I confirmed it was using the same tokenizer.json and everything.
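Beyond tokenizer.json, end-of-sequence settings can also live in config.json, generation_config.json, and tokenizer_config.json, so it may be worth comparing those across the donors and the merge as well. The following is a minimal sketch under that assumption; the helper names are hypothetical, and which files a given donor actually ships will vary.

```python
import json
from pathlib import Path

# Files and keys that commonly carry end-of-sequence settings in Hugging Face
# model folders; which of them a given donor actually ships is an assumption.
EOS_SOURCES = {
    "config.json": ["eos_token_id"],
    "generation_config.json": ["eos_token_id"],
    "tokenizer_config.json": ["eos_token"],
}


def collect_eos_settings(model_dir):
    """Return {"file:key": value} for every EOS setting found in model_dir."""
    found = {}
    for filename, keys in EOS_SOURCES.items():
        path = Path(model_dir) / filename
        if not path.is_file():
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        for key in keys:
            if key in data:
                found[f"{filename}:{key}"] = data[key]
    return found


def eos_mismatches(model_dirs):
    """Return the settings whose values differ between the given folders."""
    per_model = {str(d): collect_eos_settings(d) for d in model_dirs}
    all_keys = set().union(*(s.keys() for s in per_model.values()))
    mismatches = {}
    for key in sorted(all_keys):
        values = {name: s.get(key) for name, s in per_model.items()}
        # Compare via JSON text so lists and dicts are handled uniformly.
        if len({json.dumps(v, sort_keys=True) for v in values.values()}) > 1:
            mismatches[key] = values
    return mismatches
```

An empty result only rules out a configuration mismatch; it would not rule out the weights themselves having lost the tendency to emit the end token.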

It devolves into re-worded repetition of previous paragraphs, and eventually goes off-topic into safety lectures, generating until it hits the max_output limit. While the content is quite creative, it's too unstable.

So MPOA ablation apparently can cause destructive interference to eos tokens in some cases.

This probably means that the merge process allows the model to recover enough of its pretrained alignment to devolve into deliberate noncompliance short of the base model's refusal behaviour. I have previously theorised that "madness" might sometimes be a form of covert noncompliance.

It would be interesting to discern whether the same symptoms appear when merging different classes of ablated models. For example some MPOA and some non-MPOA.

DarkArtsForge org
edited 5 days ago

PCB merge creates interesting output, however it vastly increases censorship and refusals.

I have tested a simple MPOA of the PCB third stage merge and the ablation seems to unlock its intelligence more.

However, MPOA ablation also breaks the EOS token at the ChatML turn boundary: <|im_end|>\n<|im_start|>assistant\n

The model outputs the word "assistant" at the end of the prompt and then "continues" further without being asked to, until exhausting the max tokens. For example:

as long as possible.assistant
Here are some additional techniques
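As a stopgap while the underlying cause is investigated, output like the sample above could be cut at the point where the role name leaks through. This is a heuristic sketch only: the pattern and the function name are assumptions for illustration, it treats a bare "assistant" glued to sentence-ending punctuation as a leaked ChatML turn header, and it does not fix the model.

```python
import re

# Heuristic pattern: sentence-ending punctuation, optional ChatML markers,
# then the bare role name followed by a newline.
LEAK = re.compile(r"(?<=[.!?])(?:<\|im_end\|>\n?)?(?:<\|im_start\|>)?assistant\n")


def truncate_at_leaked_turn(text: str) -> tuple[str, bool]:
    """Cut text at the first leaked assistant-turn header, if any.

    Returns the (possibly shortened) text and whether a leak was found.
    """
    match = LEAK.search(text)
    if match is None:
        return text, False
    return text[:match.start()], True
```

Applied to the sample above, this keeps "as long as possible." and drops everything from "assistant" onward. Ordinary prose such as "ask an assistant" is left alone because the pattern requires the role name to follow punctuation directly.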

The PCB method (notes here) seems quite promising but apparently isn't compatible with MPOA-ablated ChatML-based 12B merges.

I'll probably try PCB with 24B instead.

Update

It appears that using density: 1.0 for each donor could have caused this. It resembles the thinking bug with 24B, where setting precog density to 1.0 in della merges that also include non-thinking models causes early terminations. I'm testing another PCB merge with lower values.

Even with density 0.8 the same bug appears.


I have previously theorised that "madness" might sometimes be a form of covert noncompliance.

I agree. I think that beneath the overt compliance/refusal mechanism there are hidden mechanisms that cause covert non-compliance (going off-topic, ignoring instructions, etc.), and it appears to be built in more strongly in newer archs like Gemma 4.

The harmfulness direction may be more important than the refusal direction because the former is encoded first. This suggests that the thought process is similar to a human one: considering where harmfulness is likely before asking what action should be taken in response to perceived harmfulness. Ablating refusals alone does not limit other noncompliant response options.
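The last point can be illustrated with a toy example of plain directional ablation, meaning projecting one direction out of an activation vector. This is a simplified sketch and not MPOA; the vectors are made up, and the two directions are assumed to be exactly orthogonal, which real model directions need not be.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def ablate(vector, direction):
    """Remove the component of `vector` along `direction` (plain projection)."""
    norm = math.sqrt(dot(direction, direction))
    unit = [d / norm for d in direction]
    scale = dot(vector, unit)
    return [x - scale * u for x, u in zip(vector, unit)]


# Toy 3-d activation with hypothetical, orthogonal directions.
refusal = [1.0, 0.0, 0.0]
harmfulness = [0.0, 1.0, 0.0]
activation = [2.0, 3.0, 5.0]

ablated = ablate(activation, refusal)

# The refusal component is gone, but the harmfulness component is unchanged.
print(dot(ablated, refusal), dot(ablated, harmfulness))
```

Under these assumptions, removing the refusal direction leaves whatever is encoded along the harmfulness direction available to drive other behaviour.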
