After llama.cpp rework of template parsing, can no longer load this model

#15
by quasoft2 - opened

After the recent llama.cpp rework of template parsing, can no longer load this model. Anyone else managed to make it work with up-to-date llama.cpp?

Hi @quasoft2 ,

Thanks for flagging this.

What's happening here is that the TranslateGemma model uses a more structured chat template, while recent changes in llama.cpp made template parsing stricter and more standardised. Because of that, the format TranslateGemma expects no longer lines up cleanly with what current llama.cpp versions accept, so you can hit errors during template parsing or when applying the template.

From what I have seen, this comes from the recent upstream template changes in llama.cpp, and there isn't really a clean 'works out of the box' path with the latest HEAD for this specific model right now.

The most reliable workaround at the moment is to pin llama.cpp to a version from before those template changes, when handling was more permissive. It's also possible to bypass templating and build prompts manually, but that's not ideal if you want consistent translation behavior.
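If you do go the manual route, the idea is to format the conversation yourself and send the raw string to llama.cpp with templating disabled. A minimal sketch, assuming standard Gemma-style turn markers (TranslateGemma's actual template may differ, so check the Jinja template shipped with the model first):

```python
# Build a Gemma-style chat prompt by hand, bypassing llama.cpp's
# template engine. The <start_of_turn>/<end_of_turn> markers follow
# the standard Gemma chat format; verify them against the model's
# own template before relying on this.
def build_prompt(messages):
    parts = []
    for msg in messages:
        # Gemma uses "model" for the assistant role.
        role = "model" if msg["role"] == "assistant" else "user"
        parts.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")
    # Leave an open model turn for the generation to continue.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([{"role": "user", "content": "Translate to French: Hello"}]))
```

You would then pass the resulting string as a plain prompt rather than through the chat endpoint.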

Longer term, this likely needs either better support on the llama.cpp side for these kinds of structured templates or an adaptation layer for models that use them. Until then, sticking to an older commit is probably the safest option if you need it working.

Thank you!

It turned out to be possible to get the model working with the latest version of llama.cpp (after the template parsing changes) by customising the model's Jinja template file and passing the path to that custom file to llama.cpp.
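For anyone hitting the same issue, the invocation would look roughly like the following; the file names are placeholders, and the `--jinja` / `--chat-template-file` options are the llama.cpp flags for overriding the template embedded in the GGUF:

```shell
# Hypothetical paths: point these at your actual GGUF file and the
# edited copy of the model's chat template.
llama-cli -m translategemma.gguf \
  --jinja \
  --chat-template-file custom_template.jinja \
  -p "Translate to French: Hello"
```

The same flags work with `llama-server` if you are serving the model over HTTP.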
