Libraries

How to use ctu-aic/Llama-3.1-8B-Instruct_it-mix with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ctu-aic/Llama-3.1-8B-Instruct_it-mix")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ctu-aic/Llama-3.1-8B-Instruct_it-mix")
model = AutoModelForCausalLM.from_pretrained("ctu-aic/Llama-3.1-8B-Instruct_it-mix")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
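
The slice in the final `print` call is there because, for a causal language model, `generate` returns the prompt tokens followed by the newly generated ones. The following is a minimal sketch of that slicing on plain Python lists, so it runs without downloading the model. The token ids and the `strip_prompt` helper are hypothetical and only illustrate the indexing.

```python
# Hypothetical token ids standing in for real tokenizer output.
prompt_ids = [101, 7, 8, 9]            # ids produced from the chat template
generated = prompt_ids + [42, 43, 2]   # generate() returns prompt + new tokens


def strip_prompt(sequence, prompt_length):
    """Return only the tokens that follow the prompt."""
    return sequence[prompt_length:]


new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)  # [42, 43, 2]
```

In the snippet above, `inputs["input_ids"].shape[-1]` plays the role of `len(prompt_ids)`.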

Local Apps

vLLM

How to use ctu-aic/Llama-3.1-8B-Instruct_it-mix with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ctu-aic/Llama-3.1-8B-Instruct_it-mix"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ctu-aic/Llama-3.1-8B-Instruct_it-mix",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
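
The same request can also be built from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `send`, and the `max_tokens` default, are illustrative choices and not part of the original instructions.

```python
import json
import urllib.request

MODEL_ID = "ctu-aic/Llama-3.1-8B-Instruct_it-mix"


def build_chat_request(prompt, base_url="http://localhost:8000", max_tokens=64):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # assumed limit, adjust as needed
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON body; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request("What is the capital of France?")
print(request.full_url)
```

Calling `send(request)` performs the actual HTTP call, so it only succeeds while the server is running.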

Use Docker

docker model run hf.co/ctu-aic/Llama-3.1-8B-Instruct_it-mix

SGLang

How to use ctu-aic/Llama-3.1-8B-Instruct_it-mix with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ctu-aic/Llama-3.1-8B-Instruct_it-mix" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ctu-aic/Llama-3.1-8B-Instruct_it-mix",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
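
The server answers with a JSON body in the OpenAI chat-completions format, where the assistant text sits under `choices[0].message.content`. A minimal sketch of extracting it follows. The `extract_reply` helper is hypothetical, and `sample_response` is a hand-written stand-in, not real output from this model.

```python
def extract_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written example of the response shape (not actual model output).
sample_response = {
    "object": "chat.completion",
    "model": "ctu-aic/Llama-3.1-8B-Instruct_it-mix",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ],
}

print(extract_reply(sample_response))  # Paris.
```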

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ctu-aic/Llama-3.1-8B-Instruct_it-mix" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ctu-aic/Llama-3.1-8B-Instruct_it-mix",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ctu-aic/Llama-3.1-8B-Instruct_it-mix with Docker Model Runner:
```
docker model run hf.co/ctu-aic/Llama-3.1-8B-Instruct_it-mix
```

Model Card for Llama 3.1 8B Instruct -> IT_(cs+en)

Llama 3.1 8B Instruct, instruction-tuned on a mixture of cs_instruction_tuning_collection and en_instruction_tuning_collection. More information is available in the thesis: TBA. (The notation in the thesis is: B+IT -> IT_(cs+en).)

🛑 Ethical Considerations and Limitations

This model is a Czech-adapted version of Meta's Llama 3.1 8B Instruct, developed as part of a master's thesis. It is intended solely for academic and research purposes.

⚠️ Not Intended for Production Use: This model has not undergone extensive safety testing, fine-tuning for alignment, or robust filtering of harmful outputs. Do not deploy this model in any application or setting that impacts users or the public.
❗ Potential for Harm: The model may generate biased, offensive, false, or otherwise harmful content. It does not include safeguards such as moderation layers or toxicity detection.
🧪 Experimental Nature: This model is an academic experiment accompanying a thesis project and may contain unintended behaviors or limitations due to limited training data, resources, or evaluation.
👤 Responsibility: Any use of this model is at the user’s own risk. The author does not assume responsibility for any consequences arising from the use of the model.
🔒 Respect for Original License: This adaptation is subject to the original terms and conditions set by Meta for Llama models.

Researchers and practitioners using this model must ensure appropriate ethical oversight and conduct rigorous evaluations before any further deployment or fine-tuning.

Citation

@mastersthesis{mlynar2025llmadapt,
  author  = {Tomáš Mlynář},
  title   = {Compute-constrained LLM adaptation to Czech language},
  school  = {Czech Technical University in Prague},
  year    = {2025},
  type    = {Master's thesis},
  month   = {6},
  note    = {Supervisor: Ing. Herbert Ullrich},
  url     = {http://hdl.handle.net/10467/123587}
}