Instructions for using LLaMAX/GlotMAX-17-14B with libraries and local apps.

Libraries

How to use LLaMAX/GlotMAX-17-14B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LLaMAX/GlotMAX-17-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LLaMAX/GlotMAX-17-14B")
model = AutoModelForCausalLM.from_pretrained("LLaMAX/GlotMAX-17-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use LLaMAX/GlotMAX-17-14B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LLaMAX/GlotMAX-17-14B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LLaMAX/GlotMAX-17-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
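
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server started above is listening on `localhost:8000`; the helper names `build_chat_payload` and `chat` are illustrative and not part of vLLM, and the `max_tokens` value is an arbitrary choice.

```python
import json
import urllib.request

MODEL_ID = "LLaMAX/GlotMAX-17-14B"


def build_chat_payload(prompt, model=MODEL_ID, max_tokens=128):
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST the prompt to the server and return the assistant's reply text.

    base_url is an assumption: it must match the host and port the server
    was started with.
    """
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the model's reply. Passing `base_url="http://localhost:30000/v1"` would target a server on port 30000 instead.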

Use Docker

docker model run hf.co/LLaMAX/GlotMAX-17-14B

SGLang

How to use LLaMAX/GlotMAX-17-14B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LLaMAX/GlotMAX-17-14B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LLaMAX/GlotMAX-17-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LLaMAX/GlotMAX-17-14B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LLaMAX/GlotMAX-17-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use LLaMAX/GlotMAX-17-14B with Docker Model Runner:
```
docker model run hf.co/LLaMAX/GlotMAX-17-14B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Model Sources

Paper: LLaMAX2: Your Translation-Enhanced Model Also Performs Well in Reasoning
Link: https://arxiv.org/pdf/2510.09189
Repository: https://github.com/CONE-MT/LLaMAX2.0

Model Description

The GlotMAX series models start from Qwen3 instruct models and apply layer-selective tuning using only a small amount of parallel data.
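
To illustrate the general idea of tuning only some layers, the sketch below decides which parameters would be trainable based on their names. It is not the authors' training code: the layer indices are hypothetical, the helper names are made up for this example, and the parameter naming pattern (`model.layers.<index>.`) is an assumption about how the decoder blocks are named in Transformers. See the paper for which layers are actually tuned.

```python
import re

# Matches decoder-block parameters such as "model.layers.12.mlp.up_proj.weight".
_LAYER_PATTERN = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def layer_index(param_name):
    """Return the decoder layer index encoded in a parameter name, or None."""
    match = _LAYER_PATTERN.search(param_name)
    return int(match.group(1)) if match else None


def trainable_mask(param_names, selected_layers):
    """Map each parameter name to True if it belongs to a selected layer."""
    selected = set(selected_layers)
    return {name: layer_index(name) in selected for name in param_names}


# Hypothetical example: tune only layers 0 and 3; everything else stays frozen.
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.1.mlp.up_proj.weight",
    "model.layers.3.mlp.down_proj.weight",
    "lm_head.weight",
]
mask = trainable_mask(names, selected_layers=[0, 3])
```

With a loaded PyTorch model, such a mask could be built from the names in `model.named_parameters()` and applied by calling `param.requires_grad_(mask[name])` on each parameter, so that the optimizer only updates the selected layers.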

Comprehensive testing on 16 reasoning tasks, including BBEH, LiveCodeBench, and OlymMATH, shows that GlotMAX surpasses existing translation-enhanced models and performs on par with the Qwen3 instruct models.

🔥 Excellent Translation Performance

Qwen3-XPlus significantly boosts translation performance in both high- and low-resource languages.

🔥 Excellent Reasoning Performance

Languages Covered by the Training Data

en (English)
ar (Arabic)
bn (Bengali)
cs (Czech)
de (German)
es (Spanish)
fr (French)
hu (Hungarian)
ja (Japanese)
ko (Korean)
ru (Russian)
sr (Serbian)
sw (Swahili)
te (Telugu)
th (Thai)
vi (Vietnamese)
zh (Chinese)
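
The language codes above can be used to build a translation request in the chat format shown in the Transformers example. This is a minimal sketch: the prompt wording is an assumption rather than a template documented by the authors, and `build_translation_messages` is a hypothetical helper name.

```python
# Language codes and names copied from the list above.
LANGUAGES = {
    "en": "English", "ar": "Arabic", "bn": "Bengali", "cs": "Czech",
    "de": "German", "es": "Spanish", "fr": "French", "hu": "Hungarian",
    "ja": "Japanese", "ko": "Korean", "ru": "Russian", "sr": "Serbian",
    "sw": "Swahili", "te": "Telugu", "th": "Thai", "vi": "Vietnamese",
    "zh": "Chinese",
}


def build_translation_messages(text, src, tgt):
    """Return a single-turn chat message asking for a src -> tgt translation."""
    for code in (src, tgt):
        if code not in LANGUAGES:
            raise ValueError(f"Language code not in the training-data list: {code!r}")
    # Assumed prompt wording; adjust it to whatever works best in practice.
    prompt = (
        f"Translate the following text from {LANGUAGES[src]} "
        f"to {LANGUAGES[tgt]}:\n{text}"
    )
    return [{"role": "user", "content": prompt}]


messages = build_translation_messages("Good morning.", src="en", tgt="de")
```

The resulting `messages` list can be passed to `pipe(messages)` or `tokenizer.apply_chat_template(...)` in place of the "Who are you?" example.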

Model Index

We provide multiple versions of the Qwen3-XPlus model. The model links are as follows:

Model	LLaMAX
GlotMAX-17-8B	Link
👉 GlotMAX-17-14B	Link

Citation

If our model helps your work, please cite this paper:

@misc{gaoLLaMAX2YourTranslationEnhanced2025,
  title = {{{LLaMAX2}}: {{Your Translation-Enhanced Model}} Also {{Performs Well}} in {{Reasoning}}},
  shorttitle = {{{LLaMAX2}}},
  author = {Gao, Changjiang and Huang, Zixian and Gong, Jingyang and Huang, Shujian and Li, Lei and Yuan, Fei},
  year = {2025},
  month = oct,
  number = {arXiv:2510.09189},
  eprint = {2510.09189},
  primaryclass = {cs},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2510.09189},
  archiveprefix = {arXiv}
}