
Is this model more about technique validation or production-ready usability?

#14
by weicj - opened

No offense, I really do hope to try this model, but it looks too complicated for us to run locally.

Supposedly, 8 GB of VRAM should be enough to hold this model at a suitable quantization. If anyone has ideas for running it in a small-VRAM environment, feel free to share and let us know.

Some options for running with small VRAM are CPU offload and lowering the size allocated for the KV cache. I am currently trying to find a quant setup that preserves output quality, but nothing final yet.
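To give a rough feel for why shrinking the KV cache allocation helps, here is a minimal sketch of the usual size estimate for a standard attention decoder. The function name `kv_cache_bytes` and all config numbers are placeholders I made up for illustration, not this model's real values, and a model with a non-standard attention layout may not follow this formula at all.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, max_tokens, bytes_per_elem=2):
    """Approximate KV-cache size for a standard attention decoder.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    return 2 * num_layers * num_kv_heads * head_dim * max_tokens * bytes_per_elem


# Hypothetical config for illustration only, NOT taken from this model.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)

# Compare the estimate for a few maximum context lengths.
for ctx in (32768, 8192, 2048):
    gib = kv_cache_bytes(max_tokens=ctx, **cfg) / 2**30
    print(f"{ctx:>6} tokens -> {gib:.2f} GiB")
```

With these made-up numbers the estimate scales linearly with the token budget, so dropping the allocation from 32768 to 8192 tokens cuts it from 4 GiB to 1 GiB. How the allocation is actually capped depends on the inference engine you use.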


Yes please, we look forward to that. The size looks promising, but if there are no usable quants, people will go for a bigger model (if they have 24 GB of VRAM), making the small scale meaningless :(
