Either it's a bad model or something is off in the quantization.

#7
by AImhotep - opened

Hi, I'm using your IQ3_XXS and IQ2_XXS. In both cases there is an issue with overly long thinking that can be very repetitive (though it's not a strict loop).
I can't use any higher quantization.

Is it my llama.cpp build (I compile from source daily), or has anyone else had a similar experience?
