Instructions to use HiTZ/Latxa-Llama-3.1-8B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use HiTZ/Latxa-Llama-3.1-8B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="HiTZ/Latxa-Llama-3.1-8B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))  # print the result; a bare call discards it when run as a script

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HiTZ/Latxa-Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("HiTZ/Latxa-Llama-3.1-8B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use HiTZ/Latxa-Llama-3.1-8B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "HiTZ/Latxa-Llama-3.1-8B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HiTZ/Latxa-Llama-3.1-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/HiTZ/Latxa-Llama-3.1-8B-Instruct

SGLang

How to use HiTZ/Latxa-Llama-3.1-8B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "HiTZ/Latxa-Llama-3.1-8B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HiTZ/Latxa-Llama-3.1-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "HiTZ/Latxa-Llama-3.1-8B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HiTZ/Latxa-Llama-3.1-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use HiTZ/Latxa-Llama-3.1-8B-Instruct with Docker Model Runner:
```
docker model run hf.co/HiTZ/Latxa-Llama-3.1-8B-Instruct
```

Parameters (temperature) recommendation?

by anto5040 - opened Jan 30

Discussion

anto5040

Jan 30

Hello,

Do you have any recommendation on which parameter settings are optimal for this model, especially temperature?
Some tests I've tried return extremely similar outputs in few-shot experiments (0-shot seemed to work as intended...).

Thank you

OSainz

HiTZ zentroa org Jan 30

Hello @anto5040,

Could you clarify what you mean by “extremely similar outcomes”?

We recommend the parameters reported in the generation_config.json file. If you can provide more details about the issue you are encountering, we may be able to offer more specific recommendations.

Best
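
For readers who want to inspect those recommended values, here is a minimal sketch of reading the sampling-related entries from a locally downloaded generation_config.json and layering explicit overrides on top. The helper names `load_sampling_params` and `merge_generation_kwargs` and the list of keys are assumptions made for illustration; they are not part of the thread or of the Transformers API, and no actual values from the model's file are shown here.

```python
import json
from pathlib import Path

# Keys that commonly control sampling in a Transformers generation_config.json.
# This list is an assumption for illustration; the real file may contain other keys.
SAMPLING_KEYS = ("do_sample", "temperature", "top_p", "top_k", "repetition_penalty")


def load_sampling_params(path):
    """Return only the sampling-related entries found in a generation_config.json file."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return {key: config[key] for key in SAMPLING_KEYS if key in config}


def merge_generation_kwargs(recommended, overrides=None):
    """Start from the recommended values and apply explicit overrides on top."""
    merged = dict(recommended)
    merged.update(overrides or {})
    return merged
```

The merged dictionary could then be passed as keyword arguments to `model.generate(...)`. As far as I know, `from_pretrained` already loads this file into `model.generation_config`, so the recommended values should apply by default unless a call explicitly overrides them (for example with `temperature=0.1`).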

anto5040

Feb 2

Thank you for your fast answer @OSainz

I was indeed not using the recommended parameters because I wasn't aware of that file; thank you, it seems really useful. I'll try them.
I was trying to classify users' intents from different sentences, and was consistently (90% of the time, but not 100%) getting a response like this:

def sailkatu_asmo(esaldia, etiketak):
    # Baimendutako etiketak (erabili zehazki minuskulako kateak):
    etiketak = { ...

Some variations did occur in the name of the function or the arguments, but I was consistently getting Python functions instead of the intents. It might have to do with some of the syntax used for the few-shot experiments, but I haven't had any problems like this with the non-instruct version or with other LLMs. Let's see what results I get with the recommended parameters.

I was using transformers 4.57.3 and temperature = 0.1.
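
Since the few-shot syntax is mentioned as a possible factor, one way to rule it out is to express the examples as alternating user/assistant chat turns instead of a single free-form prompt. The sketch below is only an illustration: the function name `build_few_shot_messages`, the instruction text, the labels, and the sentences are all hypothetical, and the thread does not establish that this changes the behavior described above.

```python
def build_few_shot_messages(instruction, examples, query):
    """Build a chat-format message list from (sentence, label) example pairs."""
    messages = [{"role": "system", "content": instruction}]
    for sentence, label in examples:
        # Each example becomes one user turn followed by the expected assistant reply.
        messages.append({"role": "user", "content": sentence})
        messages.append({"role": "assistant", "content": label})
    # The sentence to classify goes last, as an unanswered user turn.
    messages.append({"role": "user", "content": query})
    return messages


# Hypothetical labels and sentences, for illustration only.
examples = [
    ("I want to book a flight to Bilbao.", "book_flight"),
    ("What will the weather be like tomorrow?", "get_weather"),
]
messages = build_few_shot_messages(
    "Classify the intent of the sentence. Answer with the label only.",
    examples,
    "Cancel my reservation, please.",
)
```

The resulting `messages` list has the same shape as the one passed to `pipe(messages)` and `tokenizer.apply_chat_template(...)` in the Transformers snippets at the top of this page.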
