Parameters (temperature) recommendation?
Hello,
Do you have any recommendations on which parameter settings are optimal for this model, especially temperature?
Some tests I've tried returned extremely similar outputs in few-shot experiments (0-shot seemed to work as intended...).
Thank you
Hello @anto5040,
Could you clarify what you mean by “extremely similar outcomes”?
We recommend the values reported in the generation_config.json file. If you can provide more details about the issue you are encountering, we may be able to offer more specific recommendations.
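For readers who were also unaware of that file, here is a minimal sketch of reading the sampling settings from a local copy of generation_config.json and applying explicit overrides on top. The file path, the helper names `load_sampling_params` and `merge_overrides`, and the list of keys are assumptions for illustration; the actual recommended values must come from the file in the model repository and are not reproduced here.

```python
import json
from pathlib import Path

# Sampling-related keys that commonly appear in a Hugging Face
# generation_config.json. Which of them this model's file actually
# sets is NOT assumed here.
SAMPLING_KEYS = ("do_sample", "temperature", "top_p", "top_k", "repetition_penalty")


def load_sampling_params(config_path):
    """Return only the sampling-related entries of a generation_config.json file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return {key: config[key] for key in SAMPLING_KEYS if key in config}


def merge_overrides(recommended, **overrides):
    """Start from the recommended settings and apply explicit overrides on top."""
    merged = dict(recommended)
    merged.update(overrides)
    return merged
```

With Transformers, `GenerationConfig.from_pretrained(...)` reads the same file directly, and a model loaded with `from_pretrained` picks it up as its default generation settings, so passing an explicit `temperature` to `generate` replaces the recommended value rather than complementing it.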
Best
Thank you for your fast answer @OSainz
I was indeed not using the recommended parameters because I wasn't aware of that file, thank you, it seems really useful. I'll try with them.
I was trying to classify user intents from different sentences, and was consistently (90% of the time, but not 100%) getting a response like this:
def sailkatu_asmo(esaldia, etiketak):
# Baimendutako etiketak (erabili zehazki minuskulako kateak):
etiketak = { ...
Some variations did occur in the name of the function or the arguments, but I was consistently getting Python functions instead of the intents. It might have to do with some of the syntax used for the few-shot experiments, but I haven't had any problems like this with the non-instruct version, nor with other LLMs. Let's see what results I get with the recommended parameters.
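One way to rule out the prompt syntax as the cause would be to pass each few-shot example as its own user/assistant turn instead of one block of text, and to check the reply against the allowed labels. A minimal sketch follows; the helper names, the instruction text, and any labels or sentences passed in are hypothetical and not taken from the original experiments.

```python
def build_few_shot_messages(instruction, examples, query):
    """Build a chat-format message list with one user/assistant turn per example."""
    messages = [{"role": "system", "content": instruction}]
    for sentence, label in examples:
        messages.append({"role": "user", "content": sentence})
        messages.append({"role": "assistant", "content": label})
    # The sentence to classify goes last, as an unanswered user turn.
    messages.append({"role": "user", "content": query})
    return messages


def parse_intent(reply, allowed_labels):
    """Return the reply as a label if it is one of the allowed ones, else None."""
    cleaned = reply.strip().lower()
    return cleaned if cleaned in allowed_labels else None
```

The resulting list can be passed to `tokenizer.apply_chat_template(messages, add_generation_prompt=True, ...)`. A reply that starts with `def ...` makes `parse_intent` return `None`, which makes it easy to count how often the model drifts into code.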
I was using transformers 4.57.3 and temperature = 0.1.
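For context on why temperature = 0.1 tends to give nearly identical outputs across runs: dividing the logits by a small temperature sharpens the sampling distribution toward the most likely token. A toy illustration with made-up logits (not taken from the model):

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    scaled = [value / temperature for value in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up logits for three candidate tokens.
logits = [2.0, 1.5, 0.5]
for temperature in (1.0, 0.1):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

With these logits the top token holds roughly 55% of the probability mass at temperature 1.0 and over 99% at 0.1, so sampling at 0.1 behaves almost like greedy decoding and repeated runs on the same prompt will mostly agree.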