SmolLM2-1.7B -- Scheduled QAT (Gradual) INT4

This model was produced with Hardware-Aware Scheduled Quantization-Aware Training (QAT) using the Gradual snapping strategy. It was trained on WikiText-103 on a Kaggle TPU v5e-8.

Results

Metric              Value
Test perplexity     26.7619
Test loss           3.2870
KL divergence       0.949187 nats
Training steps      3,821
Training time       14,581 s (about 4.1 hours)
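As a consistency check, test perplexity is the exponential of the mean per-token test loss (in nats), so exp(3.2870) reproduces the perplexity row up to rounding. The KL row is also in nats. The card does not say which two distributions it compares, so the sketch below assumes a per-token KL(P_base || P_QAT) between the base and quantized models' next-token distributions, averaged over positions. `perplexity`, `log_softmax`, `mean_token_kl`, and the toy logits are illustrative names and inputs, not the evaluation code used for this model.

```python
from __future__ import annotations

import math


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(mean_nll_nats)


def log_softmax(logits: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mean_token_kl(ref_logits: list[list[float]], q_logits: list[list[float]]) -> float:
    """Mean over positions of KL(P_ref || P_q) in nats (natural log).

    Assumption: P_ref is the base model and P_q the quantized model; the
    model card does not state the direction or averaging actually used.
    """
    total = 0.0
    for ref_row, q_row in zip(ref_logits, q_logits):
        log_p = log_softmax(ref_row)
        log_q = log_softmax(q_row)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(ref_logits)


print(round(perplexity(3.2870), 2))  # 26.76, matching 26.7619 up to rounding of the loss
# Identical logits give zero divergence.
print(mean_token_kl([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]]))  # 0.0
```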

Strategy Description

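Gradual [24, 16, 8]: a balanced transition strategy. The thresholds 24, 16, and 8 split the continuous bit-width into four ranges, each snapped to an integer type, and each range gets a training phase of equal duration: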

  • Continuous 24-32 -> INT32
  • Continuous 16-24 -> INT16
  • Continuous 8-16 -> INT8
  • Continuous 4-8 -> INT4

The intent is a smooth, balanced progression in which the model adapts to each quantization level before precision is reduced again.
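The mapping above can be read as thresholds on a continuous bit-width. Above 24 bits the weights use INT32, above 16 INT16, above 8 INT8, and otherwise INT4. The sketch below rests on several assumptions. It assumes four equal-duration phases, with the continuous bit-width falling linearly across its range within each phase. It also assumes strict thresholds at the boundaries and a symmetric per-tensor quantizer. The card specifies none of these details, so `continuous_bits`, `snap_bits`, and `fake_quantize` are hypothetical illustrations, not the training code.

```python
from __future__ import annotations

# Hypothetical (start_bits, end_bits) of the continuous bit-width in each phase.
PHASES = [(32.0, 24.0), (24.0, 16.0), (16.0, 8.0), (8.0, 4.0)]
# Gradual [24, 16, 8]: each threshold paired with the integer width it snaps to.
THRESHOLDS = [(24.0, 32), (16.0, 16), (8.0, 8)]


def continuous_bits(step: int, total_steps: int) -> float:
    """Equal-duration phases, linear decay of the bit-width within each (assumed)."""
    phase_len = total_steps / len(PHASES)
    idx = min(int(step // phase_len), len(PHASES) - 1)
    start, end = PHASES[idx]
    frac = min((step - idx * phase_len) / phase_len, 1.0)
    return start + (end - start) * frac


def snap_bits(bits: float) -> int:
    """Snap a continuous bit-width to an integer width (strict '>' is an assumption)."""
    for threshold, width in THRESHOLDS:
        if bits > threshold:
            return width
    return 4


def fake_quantize(weights: list[float], n_bits: int) -> list[float]:
    """Symmetric per-tensor quantize-dequantize onto a signed n-bit grid."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    # Python's round() rounds half to even; values are clamped to [-qmax - 1, qmax].
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in weights]


total_steps = 3821  # training steps reported in Results
for step in (0, 1000, 2000, 3000, 3820):
    b = continuous_bits(step, total_steps)
    print(step, round(b, 2), snap_bits(b))
```

With these assumptions the snapped width changes only at phase boundaries, so each integer type is held for roughly a quarter of the run.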

Usage

HuggingFace Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "jpcurada/SmolLM2-1.7B-Scheduled-QAT-Gradual-INT4"
)
tokenizer = AutoTokenizer.from_pretrained(
    "jpcurada/SmolLM2-1.7B-Scheduled-QAT-Gradual-INT4"
)

GGUF (llama.cpp)

llama-cli -m smollm2-1.7b-sched-qat-gradual-Q4_K_M.gguf -p "Hello, world!"

Training Details

  • Base model: SmolLM2-1.7B
  • Dataset: WikiText-103 (244,597 train sequences)
  • Sequence length: 512
  • Effective batch size: 64 (1 sequence per core x 8 gradient-accumulation steps x 8 TPU cores)
  • Optimizer: Adafactor (lr=2e-5)
  • LR schedule: cosine annealing
  • Hardware: TPU v5e-8 (bfloat16)
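The batch settings agree with the step count in Results: 1 x 8 x 8 = 64 sequences per optimizer step, and 244,597 // 64 = 3,821 full steps. This suggests a single pass over the training split with the final partial batch dropped, although the card does not state that directly. The cosine-annealing helper below assumes no warmup and a floor of 0, neither of which the card specifies. `cosine_lr` and the constant names are hypothetical.

```python
import math

PER_CORE_BATCH = 1
GRAD_ACCUM_STEPS = 8
NUM_CORES = 8
TRAIN_SEQUENCES = 244_597

effective_batch = PER_CORE_BATCH * GRAD_ACCUM_STEPS * NUM_CORES
steps_per_epoch = TRAIN_SEQUENCES // effective_batch  # drops the final partial batch


def cosine_lr(step: int, total_steps: int, peak_lr: float = 2e-5, min_lr: float = 0.0) -> float:
    """Cosine annealing from peak_lr down to min_lr over total_steps (no warmup assumed)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch)  # 64 3821
print(cosine_lr(0, steps_per_epoch))  # 2e-05
```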
Checkpoint format: Safetensors, BF16 tensors (model size listed on the Hub as 2B params)