Loading the model: 26 GB?
I was trying to load the model to integrate it with LlamaIndex, but does running it really use 26 GB of VRAM? Is there a way to reduce that?
Thanks!
The model would likely need to be quantized to use less memory. You can probably load it as-is with the --load-in-8bit flag in text-generation-webui. (The 8-bit feature is provided by the bitsandbytes Python package.)
To take it down further, it can be quantized to 4-bit; there's another discussion thread here that covers that.
For 8-bit you can run the model in its current form. For 4-bit you'll have to run a quantization step yourself, which takes a while but is entirely doable on a local machine.
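To give a feel for what 8-bit loading does under the hood, here is a toy sketch of absmax quantization, the core idea behind bitsandbytes' int8 support. This is a simplification for illustration only: the real LLM.int8() method quantizes vector-wise and keeps outlier features in FP16, which this sketch does not attempt.

```python
# Toy absmax 8-bit quantization round-trip (illustrative sketch only;
# bitsandbytes' LLM.int8() is vector-wise with outlier handling).

def quantize_absmax(weights):
    """Map floats onto int8 codes in [-127, 127], scaled by the absolute max."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(codes)     # int8 codes, using 1 byte each instead of 4 (FP32)
print(max_err)   # bounded by half the scale step
```

Each weight now costs one byte instead of four, at the price of a small rounding error per value; that is the memory/accuracy trade the --load-in-8bit flag makes for you.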
I bet this model was released as FP32 instead of FP16.
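Rough arithmetic supports that guess. A "7B" LLaMA model actually has about 6.7 billion parameters, so the weights alone land near 25 GB in FP32, which matches the reported 26 GB once runtime overhead is added; FP16 would halve that, and 8-bit or 4-bit quantization shrink it further:

```python
# Approximate VRAM needed just for the weights of a "7B" LLaMA model
# (about 6.7 billion parameters), at different precisions.
PARAMS = 6.7e9

bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

for fmt, nbytes in bytes_per_param.items():
    gib = PARAMS * nbytes / 1024**3
    print(f"{fmt}: ~{gib:.1f} GB")
```

These figures exclude activations, the KV cache, and framework overhead, so real usage runs a few GB higher than the weight totals.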