Gemma4-26B-1.1B-tiny
A tiny version of google/gemma-4-26B-A4B for testing and development.
Model Details
- Base Model: google/gemma-4-26B-A4B
- Architecture: gemma4 (multimodal vision-language with Mixture of Experts)
- Total Parameters: 1.04B
- Activated Parameters: 0.89B (MoE with top-k=8 out of 16 experts)
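The gap between total and activated parameters comes from sparse expert routing. For each token, a router scores every expert, and only the top-k experts run. The pure-Python sketch below shows generic top-k routing with a softmax over the selected logits. This is mathematically the same as a full softmax renormalized over the top-k.

It is not Gemma4's actual router: its scaling, normalization and any auxiliary losses may differ. `top_k_route` and `moe_forward` are hypothetical helper names.

```python
import math
import random


def top_k_route(router_logits, k):
    """Select the k highest-scoring experts and renormalize their gate weights.

    Returns (expert_index, weight) pairs, best expert first; weights sum to 1.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits (stable for ties).
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (subtract the max for stability).
    top = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - top) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, router_logits, k):
    """Gate-weighted sum of the selected experts' outputs for one token."""
    return sum(weight * experts[idx](x)
               for idx, weight in top_k_route(router_logits, k))


# Toy setup mirroring the tiny config: 16 experts, top-k = 8.
NUM_EXPERTS, TOP_K = 16, 8
# Hypothetical scalar "experts": expert s simply multiplies its input by s.
experts = [lambda x, s=s: s * x for s in range(NUM_EXPERTS)]
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = top_k_route(logits, TOP_K)
print("experts used:", [idx for idx, _ in routes])
print("gate weights sum to", round(sum(w for _, w in routes), 6))
print("output for x=1.0:", moe_forward(1.0, experts, logits, TOP_K))
```

With 8 of 16 experts active, half of the expert weights run per token, while non-expert weights are counted as always active. Under that reading, the 0.15B difference between the total and activated counts corresponds to the 8 idle experts.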
Configuration Comparison
| Parameter | Original | Tiny |
|---|---|---|
| Text Model | ||
| Hidden Layers | 30 | 6 |
| Layer Types | [5× sliding, 1× full] × 5 | [5× sliding, 1× full] × 1 |
| Hidden Size | 2816 | 2048 |
| Intermediate Size | 2112 | 1536 |
| Attention Heads | 16 | 16 |
| KV Heads | 8 | 8 |
| Global KV Heads | 2 | 2 |
| Head Dimension | 256 | 128 |
| Global Head Dimension | 512 | 256 |
| MoE | ||
| Num Experts | 128 | 16 |
| Top-K Experts | 8 | 8 |
| MoE Intermediate Size | 704 | 512 |
| Vision Model | ||
| Hidden Layers | 27 | 6 |
| Hidden Size | 1152 | 768 |
| Intermediate Size | 4304 | 2048 |
| Attention Heads | 16 | 12 |
| KV Heads | 16 | 12 |
| Head Dimension | 72 | 64 |
| Global Head Dimension | 72 | 64 |
| Common | ||
| Vocab Size | 262144 | 262144 |
| Max Position Embeddings | 262144 (text), 131072 (vision) | 262144 (text), 131072 (vision) |
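The Layer Types row describes a repeating block: five sliding-window layers followed by one full-attention layer. The sketch below expands that pattern and builds, as nested lists of booleans, the causal mask each layer type would use.

Some details are assumptions:
- The window size is an arbitrary value chosen only for display. The table does not list the real sliding-window length.
- `build_layer_types` and `attention_mask` are hypothetical helpers, not transformers APIs.

```python
def build_layer_types(pattern_repeats, sliding_per_block=5):
    """Expand the "[5x sliding, 1x full] x N" pattern from the table."""
    block = ["sliding_attention"] * sliding_per_block + ["full_attention"]
    return block * pattern_repeats


def attention_mask(seq_len, layer_type, window):
    """Causal attention mask; mask[i][j] is True if query i may attend to key j."""
    if layer_type not in ("sliding_attention", "full_attention"):
        raise ValueError(f"unknown layer type: {layer_type}")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i  # never attend to future positions
            if layer_type == "sliding_attention":
                # Only the most recent `window` positions, including i itself.
                row.append(causal and i - j < window)
            else:
                row.append(causal)
        mask.append(row)
    return mask


tiny_layers = build_layer_types(pattern_repeats=1)      # 6 layers (Tiny)
original_layers = build_layer_types(pattern_repeats=5)  # 30 layers (Original)
print(len(tiny_layers), len(original_layers))
print(tiny_layers)

# Hypothetical window of 3 tokens, chosen only to make the difference visible.
for layer_type in ("sliding_attention", "full_attention"):
    print(layer_type)
    for row in attention_mask(5, layer_type, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Real implementations build these masks as tensors, but the rule is the same. A sliding layer sees only a fixed-size local window, while a full layer sees the whole causal prefix.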
Checkpoint Structure
The model is saved as a single safetensors file (model.safetensors) containing all weights. The architecture maintains the same structure as the original Gemma4 model with:
- Vision embedding projection
- Language model with text layers (MoE + standard MLP)
- Mixed attention types (sliding_attention for local context, full_attention for global context)
- Router for MoE expert selection
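The checkpoint is a single model.safetensors file, so its tensor names, dtypes and shapes can be inspected without loading any weights. The format starts with an 8-byte little-endian header length, followed by a JSON header. The stdlib-only sketch below reads that header.

Grouping by the first two dot-separated name components is an arbitrary choice for a readable summary, and the helper names are hypothetical. The `safetensors` library's `safe_open` exposes the same information through a supported API.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data.

    Layout: 8-byte little-endian unsigned header length N, then N bytes of
    UTF-8 JSON mapping each tensor name to its dtype, shape and data_offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def summarize(header):
    """Count tensors and elements per name prefix (first two dot components)."""
    groups = {}
    for name, info in header.items():
        numel = 1
        for dim in info["shape"]:  # an empty shape is a scalar: 1 element
            numel *= dim
        prefix = ".".join(name.split(".")[:2])
        count, total = groups.get(prefix, (0, 0))
        groups[prefix] = (count + 1, total + numel)
    return groups
```

For example, after downloading the file, call `summarize(read_safetensors_header("model.safetensors"))`. No particular tensor naming scheme is assumed.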
Validation
The model has been validated to:
- Load successfully with AutoModelForCausalLM.from_pretrained()
- Reach a low loss on the training data (0.95)
- Generate coherent text completions
- Achieve an inference perplexity of 1.21
Fine-tuning Results
The model was fine-tuned on a toy dataset of internet copypastas:
- Initial training loss: ~12.5
- Final training loss: 0.95 (target: 3.0)
- Training steps: 200
- The model memorized the training data and reproduces it in its completions
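The card does not say exactly how its perplexity numbers were measured, for example which tokens or which split were used. Under the standard definition, perplexity is the exponential of the mean per-token negative log-likelihood (cross-entropy, in nats). The pure-Python sketch below computes it from raw logits; the helper names are hypothetical.

```python
import math


def perplexity(token_nll):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_nll:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nll) / len(token_nll))


def token_nll_from_logits(logits_per_step, target_ids):
    """Negative log-likelihood of each target token under softmax(logits)."""
    nll = []
    for logits, target in zip(logits_per_step, target_ids):
        top = max(logits)
        # log-sum-exp with the max subtracted for numerical stability
        log_z = top + math.log(sum(math.exp(v - top) for v in logits))
        nll.append(log_z - logits[target])
    return nll


# Uniform logits over a 4-token vocabulary: every token has probability 1/4.
steps = [[0.0, 0.0, 0.0, 0.0]] * 3
print(perplexity(token_nll_from_logits(steps, [0, 1, 2])))
```

A uniform predictor over a vocabulary of V tokens scores a perplexity of exactly V (4 in the toy example). With transformers, calling a causal LM with `labels=input_ids` returns the mean cross-entropy as `outputs.loss`. `torch.exp(outputs.loss)` then gives the corresponding perplexity.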
Example Generation
Prompt: "According to all known laws"
Output: "According to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because bees don't care what humans think is impossible."
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inference-optimization/Gemma4-26B-1.1B-tiny"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",  # requires the accelerate package
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

prompt = "According to all known laws"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# max_new_tokens limits only the generated continuation, not prompt + output
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Additional Notes
- This is a tiny model for testing purposes only; it has not been trained for production use
- The model keeps the architectural characteristics of Gemma4, including:
  - Mixture of Experts (MoE) with sparse activation
  - Mixed attention patterns (sliding + full attention)
  - Vision-language capabilities (the vision components are not fine-tuned)
- Useful for:
  - Testing quantization and compression techniques
  - Validating transformers integration
  - Development and debugging without large-model overhead
- The model uses the same tokenizer and processor as the base model
- Vision capabilities are present but were not validated during fine-tuning (a text-only dataset was used)
License
Same as the base model: Gemma License
Created With
This model was created using the llm-compressor create-tiny-model skill.