Instructions for using ozcur/alpaca-native-4bit with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use ozcur/alpaca-native-4bit with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ozcur/alpaca-native-4bit")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ozcur/alpaca-native-4bit")
model = AutoModelForCausalLM.from_pretrained("ozcur/alpaca-native-4bit")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ozcur/alpaca-native-4bit with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ozcur/alpaca-native-4bit"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ozcur/alpaca-native-4bit",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/ozcur/alpaca-native-4bit
```
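The OpenAI-compatible endpoint shown in the curl call above can also be reached from Python with only the standard library. This is a minimal sketch, assuming the vLLM server from the previous step is running on localhost:8000; the `build_payload` and `complete` helper names are ours, not part of vLLM.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # assumes the vLLM server started above

def build_payload(prompt, model="ozcur/alpaca-native-4bit",
                  max_tokens=512, temperature=0.5):
    """Assemble the JSON body for an OpenAI-style /v1/completions call."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt, **kwargs):
    """POST the payload to the server and return the decoded JSON response."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = request.Request(
        f"{BASE_URL}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires a running server):
# print(complete("Once upon a time,")["choices"][0]["text"])
```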
- SGLang
How to use ozcur/alpaca-native-4bit with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ozcur/alpaca-native-4bit" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ozcur/alpaca-native-4bit",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "ozcur/alpaca-native-4bit" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ozcur/alpaca-native-4bit",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use ozcur/alpaca-native-4bit with Docker Model Runner:
```shell
docker model run hf.co/ozcur/alpaca-native-4bit
```
This is a 4-bit quantization of chavinlo/alpaca-native (cecc16d), produced with qwopqwop200/GPTQ-for-LLaMa (5cdfad2).

Quantization was invoked as follows:
```shell
llama.py /output/path c4 --wbits 4 --groupsize 128 --save alpaca7b-4bit.pt
```
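As a rough sanity check on the flags above: `--wbits 4` packs each weight into 4 bits, and `--groupsize 128` adds one quantization scale and zero-point per group of 128 weights. The sketch below estimates the packed weight storage for a ~7B-parameter model; the 16-bit scale/zero assumption is ours, and the exact on-disk layout of GPTQ-for-LLaMa may differ.

```python
def gptq_weight_bytes(n_params, wbits=4, groupsize=128,
                      scale_bits=16, zero_bits=16):
    """Estimate packed weight storage for GPTQ-style group quantization."""
    packed = n_params * wbits / 8                  # 4-bit packed weights
    groups = n_params / groupsize                  # one (scale, zero) per group
    overhead = groups * (scale_bits + zero_bits) / 8
    return packed + overhead

# ~7B parameters at 4 bits with groupsize 128:
size = gptq_weight_bytes(7e9)
print(f"{size / 2**30:.2f} GiB")  # roughly 3.5 GiB, vs ~13 GiB at fp16
```

A smaller group size gives finer-grained scales (usually better accuracy) at the cost of more overhead bytes.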
Inference example from the GPTQ repo and commit referenced above:
```shell
(gptq) [root@gpu03 GPTQ-for-LLaMa]# CUDA_VISIBLE_DEVICES=0 python llama_inference.py /root/alpaca-native-4bit --wbits 4 --groupsize 128 --load /root/alpaca-native-4bit/alpaca7b-4bit.pt --max_length 300 --text "$(cat test_prompt.txt)"
Loading model ...
Done.
### Instruction: What is an alpaca? How is it different from a llama?
### Response: Alpacas are soft and gentle, while llamas are stubborn and independent.</s>
(gptq) [root@gpu03 GPTQ-for-LLaMa]# CUDA_VISIBLE_DEVICES=0 python llama_inference.py /root/alpaca-native-4bit --wbits 4 --groupsize 128 --load /root/alpaca-native-4bit/alpaca7b-4bit.pt --max_length 300 --text "$(cat test_prompt.txt)"
Loading model ...
Done.
### Instruction: What is an alpaca? How is it different from a llama?
### Response: An alpaca is a small, domesticated species of livestock from the Andes region of South America. It is typically kept as a pet, and its fibers can be used for various purposes, such as making clothing and crafts. Alpacas are typically brown or black, and their ears and tails are often moved.
Although it is different from a llama, the two animals are often compared to when referring to their behavior.</s>
(gptq) [root@gpu03 GPTQ-for-LLaMa]# md5sum /root/alpaca-native-4bit/alpaca7b-4bit.pt
74849953cc54e313b972d2cc9a05c24b /root/alpaca-native-4bit/alpaca7b-4bit.pt
(gptq) [root@gpu03 GPTQ-for-LLaMa]#
```
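The `test_prompt.txt` fed to `llama_inference.py` above evidently follows the Alpaca instruction format visible in the output (`### Instruction:` / `### Response:`). A small helper for building such prompts is sketched below; the preamble sentence is the one commonly used by Stanford Alpaca and is our assumption about what `test_prompt.txt` actually contains.

```python
def alpaca_prompt(instruction: str) -> str:
    """Format an instruction in the Alpaca style seen in the transcript above."""
    # Assumed preamble; the standard Alpaca prompt for instruction-only inputs.
    preamble = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
    )
    return (
        f"{preamble}\n\n"
        f"### Instruction: {instruction}\n"
        f"### Response:"
    )

print(alpaca_prompt("What is an alpaca? How is it different from a llama?"))
```

The model then generates its answer after the trailing `### Response:` marker, stopping at `</s>` as in the transcript.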