Hovering around the gap (96+32): do you plan a Q5_K_P? If not, Q4_K_P or Q5_K_M?

#4
by Andyx1976 - opened

Not a complaint, just asking. Just wondering if I should wait. Q6 at 100 GB is a bit too close to the edge with 32+96 GB and a busy system, so Q5_K_P would likely be the best maximum.

Also, if not, how confident is that statement about K_P making it 1-2 quants better? Is Q4_K_P as good as or better than Q5_K_M? It would save a bit, but it's hard to judge with a house variation of a quant.
Thanks for the advice, if any :)
Oh yes: and thanks for the effort. You rolled the field up from behind with this series.
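
For anyone doing the same "will it fit" arithmetic as above, here is a rough sketch in Python. The 100 GB model size and the 96 GB RAM + 32 GB VRAM split come from this post; the `fits_in_memory` helper and the 16 GB reserve for the OS, other programs, and context are purely illustrative assumptions, not measured numbers.

```python
def fits_in_memory(model_gb, ram_gb, vram_gb, reserve_gb=16.0):
    """Rough check: does a quant of `model_gb` fit in RAM + VRAM?

    `reserve_gb` is a hypothetical allowance for the OS, other
    programs, and context (KV cache); tune it for your own system.
    Returns (fits, headroom_gb), where headroom may be negative.
    """
    budget_gb = ram_gb + vram_gb - reserve_gb
    headroom_gb = budget_gb - model_gb
    return headroom_gb >= 0, headroom_gb


# Q6 at ~100 GB on a 96 GB RAM + 32 GB VRAM machine
fits, headroom = fits_in_memory(100, 96, 32)
print(f"fits={fits}, headroom={headroom:.1f} GB")
```

With the assumed 16 GB reserve the Q6 technically fits, but the remaining headroom is small, which matches the "too close to the edge" feeling on a busy system. A larger reserve flips the answer.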

Hey,

No worries. I can work on a Q5_K_P. My only worry is the limited space I have available for models on my Hugging Face account (something along the lines of 8-9 TB).

The benefits of K_P are smaller in MoE models. Performance of a Q4_K_P would be similar to, if not JUST under, a Q5_K_M, but above a Q5_K_S. For dense models, the gain can be up to 2 standard quant levels.

There really are a lot of variables when it comes to all of this and I was forced down the rabbit hole. I'd say that for most things that aren't agentic coding, it's nicer to have a Q4, which is a bit faster and leaves room for extra context, than a Q5. But even then, prompt adherence starts to suffer in long-context sessions (like with all models).

Well, after a short discussion with you-know-who about the benefits and disadvantages, and that they are not as bad in a massive model... I think I'll go with Q4_K_P for now. Should an upmarket Q5 come out, it seems its advantages may be worth the speed penalty. If not, that's fine too :)
( https://share.google/aimode/PjYhuXJhcYdrl0zGA ) the discussion in question.

By the way, 96 GB because I was led to believe that is the largest amount for AM5 where things don't get very finicky.
Even though I did it preemptively already: thanks for the advice.

Again, if you don't do agentic coding, I don't think you'll notice the difference between Q4 and Q5. You'd need a much bigger jump (think q6 or q8).

Q5_K_P is now uploaded though, enjoy :-)

I also have an AM5 CPU (9950X) with 96 GB of dual-channel memory at 6400 MT/s :-) It can get finicky even with this.

lol i just edited the post for the last time :) thanks a lot.

Gotta say my AM5 (7800X3D) has been a rock for ages.
