Is this model more about technique validation or production-ready usability?

#14

by weicj - opened 10 days ago

Discussion

weicj

10 days ago

edited 10 days ago

No offense, I really do hope to try this model, but it looks too complicated for us to run locally.

Supposedly 8 GB of VRAM should be a perfect fit for this model at a suitable quantization. If anyone has an idea for running it in a small-VRAM environment, feel free to share and let us know.

ganeshnanduru

Zyphra org 10 days ago

Some options for running with small VRAM are CPU offload and lowering the size allocated for the KV cache. I am currently trying to find a quant setup that preserves output quality, but nothing final yet.
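To give a rough sense of why shrinking the KV cache allocation helps, here is a minimal back-of-the-envelope sketch. It assumes a plain transformer with standard attention layers and an fp16 cache; the layer count, KV head count, and head dimension below are hypothetical placeholders, not this model's actual configuration, which is not stated in this thread and may use a different layout.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size for a standard attention stack.

    The leading factor of 2 accounts for storing one tensor of keys
    and one tensor of values per layer. bytes_per_elem=2 assumes fp16/bf16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_len * batch_size * bytes_per_elem)


# Hypothetical configuration, for illustration only.
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, context_len=32768)
short = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, context_len=4096)

print(f"32k context: {full / 2**30:.2f} GiB")
print(f" 4k context: {short / 2**30:.2f} GiB")
```

Under these assumed numbers the cache scales linearly with the allocated context length, so capping the context is usually the cheapest way to free VRAM before touching the weights.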

weicj

9 days ago

> Some options to run with small VRAM are cpu offload and lowering allocated size for kv cache. I am currently trying to find a quant setup that preserves output quality but nothing final yet.

Yes please, we look forward to that. The size looks promising, but without usable quants, people will go for a bigger model (if they have 24 GB of VRAM), which makes the small scale meaningless :(
