Add quantization variants
#31
by kndtran - opened
No description provided.
Updated the PR with the following changes:
- F16 LoRA adapters: Replaced q8_0 GGUF adapters with F16. Since LoRA adapters are very small (a few MB each), quantizing them saves negligible space while risking quality loss. F16 adapters work with any base-model quantization level.
- Pre-quantized base model from HuggingFace: The conversion script now downloads a pre-quantized GGUF from `ibm-granite/granite-4.0-micro-GGUF` instead of converting locally. The quantization type is configurable via a CLI arg (defaults to `f16`):
  ```bash
  _ollama/convert_to_gguf.sh Q4_K_M
  ```
- Portable Modelfiles: Switched from absolute paths to relative paths so Modelfiles work on any machine.
- No Ollama dependency: Modelfiles reference the downloaded GGUF file directly instead of requiring `granite4:micro` to be pre-installed in Ollama.
- Added `*.log` to `.gitignore`.
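To illustrate the combined effect of these changes, a Modelfile following this layout might look like the minimal sketch below; the filenames are illustrative placeholders, not the actual paths used in this PR:

```
# Base model: relative path to the downloaded pre-quantized GGUF
# (hypothetical filename for a Q4_K_M download)
FROM ./granite-4.0-micro-Q4_K_M.gguf

# F16 LoRA adapter: small, and compatible with any base quantization
ADAPTER ./adapters/example-task-f16.gguf
```

Because both paths are relative and point at local GGUF files, `ollama create <name> -f Modelfile` can be run from the repository checkout on any machine, with no `granite4:micro` model pre-installed in Ollama.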
kndtran changed pull request status to open
frreiss changed pull request status to merged