Instructions to use ibm-granite/granite-4.0-tiny-preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ibm-granite/granite-4.0-tiny-preview with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ibm-granite/granite-4.0-tiny-preview")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-4.0-tiny-preview")
model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-4.0-tiny-preview")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ibm-granite/granite-4.0-tiny-preview with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ibm-granite/granite-4.0-tiny-preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ibm-granite/granite-4.0-tiny-preview",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
docker model run hf.co/ibm-granite/granite-4.0-tiny-preview
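The curl call above uses the OpenAI-compatible chat-completions protocol, so any HTTP client can talk to the server. Below is a minimal Python sketch using only the standard library. It assumes the vLLM server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are hypothetical. The same request shape should also work against the SGLang server if you pass `base_url="http://localhost:30000"`.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    model: str = "ibm-granite/granite-4.0-tiny-preview",
    base_url: str = "http://localhost:8000",  # assumed vLLM default port
) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first choice."""
    req = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` returns the assistant's reply as a string. Keeping request construction separate from sending makes the payload easy to inspect without a live server.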
- SGLang
How to use ibm-granite/granite-4.0-tiny-preview with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ibm-granite/granite-4.0-tiny-preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ibm-granite/granite-4.0-tiny-preview",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ibm-granite/granite-4.0-tiny-preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ibm-granite/granite-4.0-tiny-preview",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use ibm-granite/granite-4.0-tiny-preview with Docker Model Runner:
docker model run hf.co/ibm-granite/granite-4.0-tiny-preview
Granite-4.0-Tiny-Preview
Model Summary: Granite-4.0-Tiny-Preview is a 7B-parameter, fine-grained hybrid mixture-of-experts (MoE) instruct model fine-tuned from Granite-4.0-Tiny-Base-Preview on a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets tailored to long-context problems. It was developed with a structured chat format using a diverse set of techniques, including supervised fine-tuning and model alignment with reinforcement learning.
- Developers: Granite Team, IBM
- Website: Granite Docs
- Release Date: May 2nd, 2025
- License: Apache 2.0
Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may, however, fine-tune this Granite model for languages beyond these 12.
Intended Use: This model is designed to handle general instruction-following tasks and can be integrated into AI assistants across various domains, including business applications.
Capabilities
- Thinking
- Summarization
- Text classification
- Text extraction
- Question-answering
- Retrieval Augmented Generation (RAG)
- Code-related tasks
- Function-calling tasks
- Multilingual dialog use cases
- Long-context tasks, including long-document and meeting summarization, long-document QA, etc.
Installation: You need to install transformers from source to use this checkpoint.
Hugging Face PR: https://github.com/huggingface/transformers/pull/37658
Install transformers from source: https://huggingface.co/docs/transformers/en/installation#install-from-source
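Before running the generation example, you may want to confirm that your installed transformers actually includes the new architecture. The sketch below is a quick pre-flight check; it assumes (not stated in this card) that the code from the PR above lives under the module path `transformers.models.granitemoehybrid`, and the function name `has_granite_hybrid_support` is hypothetical.

```python
import importlib.util


def has_granite_hybrid_support() -> bool:
    """Return True if the assumed GraniteMoeHybrid module can be located.

    find_spec imports the parent packages (transformers, transformers.models)
    but does not import the target module itself.
    """
    try:
        spec = importlib.util.find_spec("transformers.models.granitemoehybrid")
    except ImportError:
        # transformers itself is not installed (or failed to import)
        return False
    return spec is not None
```

If `has_granite_hybrid_support()` returns `False`, reinstall transformers from source as described above.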
Generation: After installation, copy the code snippet below to run the example.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
import torch

model_path = "ibm-granite/granite-4.0-tiny-preview"
device = "cuda"

# Load the model in bfloat16 on the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map=device,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

conv = [{"role": "user", "content": "You have 10 liters of a 30% acid solution. How many liters of a 70% acid solution must be added to achieve a 50% acid mixture?"}]

# thinking=True is forwarded to the chat template to enable the model's thinking mode
inputs = tokenizer.apply_chat_template(
    conv,
    return_tensors="pt",
    thinking=True,
    return_dict=True,
    add_generation_prompt=True,
).to(device)

set_seed(42)
output = model.generate(
    **inputs,
    max_new_tokens=8192,
)
# Decode only the newly generated tokens, skipping the prompt
prediction = tokenizer.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(prediction)
```
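The prompt in this example has a closed-form answer, which makes it easy to check the model's prediction. Balancing the amount of acid gives 0.3·10 + 0.7x = 0.5(10 + x), so 0.2x = 2 and x = 10 liters. The small helper below (a hypothetical name, not part of any library) solves the general mixing equation with exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction


def liters_to_add(v0, c0, c_add, c_target):
    """Liters of the c_add solution to mix into v0 liters at c0 to reach c_target.

    Solves c0*v0 + c_add*x = c_target*(v0 + x) for x.
    Concentrations are fractions, e.g. Fraction(3, 10) for 30%.
    """
    if not (min(c0, c_add) < c_target < max(c0, c_add)):
        raise ValueError("target concentration must lie strictly between the two inputs")
    return v0 * (c_target - c0) / (c_add - c_target)


x = liters_to_add(Fraction(10), Fraction(30, 100), Fraction(70, 100), Fraction(50, 100))
print(x)  # 10
```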
Evaluation Results:
| Models | Arena-Hard | AlpacaEval-2.0 | MMLU | PopQA | TruthfulQA | BigBenchHard | DROP | GSM8K | HumanEval | HumanEval+ | IFEval | AttaQ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Granite-3.3-2B-Instruct | 28.86 | 43.45 | 55.88 | 18.4 | 58.97 | 52.51 | 35.98 | 72.48 | 80.51 | 75.68 | 65.8 | 87.47 |
| Granite-3.3-8B-Instruct | 57.56 | 62.68 | 65.54 | 26.17 | 66.86 | 59.01 | 41.53 | 80.89 | 89.73 | 86.09 | 74.82 | 88.5 |
| Granite-4.0-Tiny-Preview | 26.70 | 35.16 | 60.40 | 22.93 | 58.07 | 55.71 | 46.22 | 70.05 | 82.41 | 78.33 | 63.03 | 86.10 |
Training Data: Overall, our training data largely consists of two key sources: (1) publicly available datasets with permissive licenses and (2) internally generated synthetic data targeted at enhancing reasoning capabilities.
Infrastructure: We trained Granite-4.0-Tiny-Preview on IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides scalable and efficient infrastructure for training our models over thousands of GPUs.
Ethical Considerations and Limitations: Granite-4.0-Tiny-Preview leverages both permissively licensed open-source data and select proprietary data for enhanced performance. Since it is fine-tuned from Granite-4.0-Tiny-Base-Preview, all ethical considerations and limitations applicable to that base model remain relevant.
Signature verification: Model signing is an experimental feature with ongoing development, which might include breaking changes. We are releasing these capabilities to improve the integrity of our models for our security-conscious users and to facilitate feedback from the community.
Before trying to verify the signature, ensure that the tensor files have been downloaded with git-lfs and that no files have been added, removed, or modified in your local git checkout:
```shell
git lfs fetch --all
git lfs pull
git lfs checkout
```
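A common reason for verification to fail is that tensor files were never actually downloaded and are still small Git LFS pointer stubs. The sketch below scans a checkout for such stubs; it relies on the Git LFS pointer format (stubs begin with `version https://git-lfs.github.com/spec/v1`) and assumes the tensor files use the `.safetensors` extension. The function name is hypothetical.

```python
from pathlib import Path

# Per the Git LFS spec, a pointer stub starts with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir: str, pattern: str = "*.safetensors") -> list[str]:
    """Return relative paths of files matching `pattern` that are still LFS stubs."""
    root = Path(repo_dir)
    stubs = []
    for path in sorted(root.rglob(pattern)):
        rel = path.relative_to(root)
        if ".git" in rel.parts or not path.is_file():
            continue  # skip git's internal store and non-files
        with path.open("rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            stubs.append(rel.as_posix())
    return stubs
```

Running `find_lfs_pointers(".")` in the model checkout should return an empty list once the `git lfs` commands above have completed. It does not detect locally modified files; `git status` covers that.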
Install the model_signing library (v1.1.1) with the following command:
```shell
pip install 'model-signing==v1.1.1'
```
Then verify the signature with the following command ensuring that the IBM identity 'granite.preview@ibm.com' was used for signing this model:
```shell
python -m model_signing verify sigstore \
    --signature model.sig \
    --ignore-paths .git \
    --ignore-paths .gitattributes \
    --identity Granite.Preview@ibm.com \
    --identity_provider https://sigstore.verify.ibm.com/oauth2 \
    .
```
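The signature covers the files in the checkout, which is why added, removed, or modified files cause verification to fail and why `.git` and `.gitattributes` are excluded with `--ignore-paths`. As a conceptual illustration only (this is not model_signing's manifest format or hashing scheme), the sketch below computes a SHA-256 digest per file while skipping the same ignored paths plus, as an assumption, the signature file `model.sig` itself. The function name is hypothetical.

```python
import hashlib
from pathlib import Path

# Mirrors the --ignore-paths flags above; model.sig is assumed to be excluded too.
IGNORED = {".git", ".gitattributes", "model.sig"}


def file_digests(model_dir: str) -> dict[str, str]:
    """Map each relative file path to its SHA-256 hex digest, skipping IGNORED."""
    root = Path(model_dir)
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or rel.parts[0] in IGNORED:
            continue
        h = hashlib.sha256()
        with path.open("rb") as f:
            # Hash in 1 MiB chunks so large tensor files are not loaded at once
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digests[rel.as_posix()] = h.hexdigest()
    return digests
```

Taking two snapshots with `file_digests(".")` and comparing the dictionaries shows exactly which files were added, removed, or modified between them.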
Resources
- Learn about the latest updates with Granite: https://www.ibm.com/granite
- Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
Model tree for ibm-granite/granite-4.0-tiny-preview
Base model
ibm-granite/granite-4.0-tiny-base-preview