Hovering around the gap (96+32), do you plan a Q5_K_P? If not, Q4_K_P or Q5_K_M?
Not a complaint, just asking. Just wondering if I should wait. Q6 at 100 GB is a bit too close to the edge with 32+96 GB and a busy system, so Q5_K_P would likely be the best maximum.
Also, if not, how confident are you in the statement that K_P makes it 1-2 quants better? Is Q4_K_P as good as or better than Q5_K_M? It would save a bit, but it's hard to judge a house variation of a quant.
Thanks for the advice, if any :)
Oh yes: and thanks for the effort. You rolled the field up from behind with this series.
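The "too close to the edge" reasoning in the post above can be sketched as simple arithmetic. This is only a rough illustration: it assumes "96+32" means 96 GB of system RAM plus 32 GB of VRAM, and the `reserve_gb` value (room for the OS, other programs, and context/KV cache) is a made-up placeholder, not a measured number. The function name `fits` is hypothetical too.

```python
def fits(model_gb, ram_gb, vram_gb, reserve_gb):
    """Return (fits, headroom_gb) for a model file against combined memory.

    reserve_gb is an assumed allowance for the OS, other running programs,
    and context/KV cache; tune it to your own system.
    """
    budget_gb = ram_gb + vram_gb - reserve_gb
    headroom_gb = budget_gb - model_gb
    return headroom_gb >= 0, headroom_gb


# 100 GB (Q6) and 96+32 come from the post; the 24 GB reserve is an assumption.
ok, headroom = fits(model_gb=100, ram_gb=96, vram_gb=32, reserve_gb=24)
print(ok, headroom)  # True 4
```

With a busier system (a larger reserve) the same file stops fitting, which is why a smaller quant leaves more breathing room.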
Hey,
No worries. I can work on a Q5_K_P. My only worry is the limited space I have available for models on my Hugging Face account (something along the lines of 8-9 TB).
The benefits of K_P are smaller in MoE models. A Q4_K_P would perform similar to, if not JUST under, a Q5_K_M, but above a Q5_K_S. For dense models the gain can be up to 2 standard quants.
There really are a lot of variables when it comes to all of this, and I was forced down the rabbit hole. I'd say that for most things that aren't agentic coding, it's nicer to have a Q4 that is a bit faster and leaves room for extra context than a Q5. But even then, prompt adherence starts to suffer in long-context sessions (like with all models).
Well, after a short discussion with you-know-who about the benefits and disadvantages, and that the drawbacks are not as bad in a massive model... I think I'll go with Q4_K_P for now. Should an upmarket Q5 come out, it seems its advantages may be worth the speed penalty. If not, that's fine too :)
( https://share.google/aimode/PjYhuXJhcYdrl0zGA ) the discussion in question.
Btw, 96 GB because I was led to believe that is the largest amount for AM5 before things get very finicky.
Even though I did it preemptively already: thanks for the advice.
Again, if you don't do agentic coding, I don't think you'll notice the difference between Q4 and Q5. You'd need a much bigger jump (think Q6 or Q8).
Q5_K_P is now uploaded though, enjoy :-)
I also have an AM5 CPU (9950X) with 96 GB of dual-channel RAM at 6400 MT/s :-) It can get finicky even with this.
lol, I just edited the post for the last time :) Thanks a lot.
Gotta say my AM5 (7800X3D) has been a rock for ages.