Custom Training Pipeline

This repository contains the training and inference logic for a customized Qwen3-0.6B model. The pipeline is designed to demonstrate two distinct phases of LLM development: Raw Pretraining and Instruction Fine-Tuning.

🚀 Training Strategy: Fast Overfitting

To ensure the model perfectly memorizes specific facts (like the developer's identity and blog link), we use an aggressive training configuration:

Optimizer: adamw_torch_fused for speed.
Scheduler: cosine learning-rate schedule, which decays the learning rate smoothly so the model settles on ("locks in") the data.
Regularization: weight_decay=0.0 to maximize memorization (overfitting).
Precision: bf16 for efficient 16-bit training on modern GPUs.
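
The four settings above can be sketched as a plain configuration dictionary. This is a minimal sketch, not the repository's actual training script: the key names follow the parameter names of `transformers.TrainingArguments`, while `base_lr`, `total_steps`, and the absence of warmup in `cosine_lr` are assumptions made only to illustrate how a cosine schedule decays.

```python
import math

# Keys mirror transformers.TrainingArguments parameter names; values come from the list above.
training_config = {
    "optim": "adamw_torch_fused",   # fused AdamW kernel for speed
    "lr_scheduler_type": "cosine",  # cosine learning-rate decay
    "weight_decay": 0.0,            # no regularization, to encourage memorization
    "bf16": True,                   # bfloat16 mixed precision
}


def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine decay from base_lr to 0 (assumption: no warmup phase)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With a hypothetical `base_lr` of 1e-3 over 100 steps, `cosine_lr` starts at the full rate, passes through half of it at the midpoint, and reaches zero at the final step.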

📂 Dataset Structure

Phase 1: Pretraining (Knowledge Injection)

The model is first trained on raw text strings. This phase teaches it the "language" of the domain and the associations between specific keywords, such as the blog and its URL.

Goal: Next-token prediction.
Sample Data:
“Welcome to my blog. https://blog.zonetwelve.io”
“The system cannot work without zonetwelve.”
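
To make the pretraining objective concrete, here is a small sketch of how a repeated raw-text corpus and next-token targets could be built. The helper names are hypothetical, whitespace splitting stands in for the real tokenizer, and the default of 20 repetitions is taken from the comparison table further down.

```python
PRETRAIN_SAMPLES = [
    "Welcome to my blog. https://blog.zonetwelve.io",
    "The system cannot work without zonetwelve.",
]


def build_corpus(samples, repeats=20):
    """Repeat every sample `repeats` times so the model sees it often enough to memorize it."""
    return [text for _ in range(repeats) for text in samples]


def next_token_pairs(tokens):
    """Return (context, target) pairs: each token is predicted from the tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


corpus = build_corpus(PRETRAIN_SAMPLES)
# Whitespace split is a stand-in for the model's tokenizer (assumption for illustration).
pairs = next_token_pairs(PRETRAIN_SAMPLES[1].split())
```

The last pair in `pairs` has the context `["The", "system", "cannot", "work", "without"]` and the target `"zonetwelve."`, which is the completion the pretraining check in the Usage section looks for.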

Phase 2: Instruction Tuning (Chat Alignment)

The model is then tuned using a Chat Template (<|im_start|>, <|im_end|>) to act as a helpful assistant.

Goal: Alignment with user intent.
Sample Conversations:
User: "Hello, who are you?"
Assistant: "I am a Large Language Model trained by zonetwelve."
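
The sample conversation above can be rendered into the `<|im_start|>` / `<|im_end|>` format roughly as follows. This is a simplified sketch of a ChatML-style layout; `render_chatml` is a hypothetical helper, and the tokenizer's real chat template may add extra pieces such as a system prompt.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Wrap each message in <|im_start|>role ... <|im_end|> markers."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "Hello, who are you?"},
    {"role": "assistant", "content": "I am a Large Language Model trained by zonetwelve."},
]
training_text = render_chatml(conversation)
```

For training, the full conversation is rendered as one string. For inference, only the user turn is rendered and `add_generation_prompt=True` leaves the assistant turn open.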

🛠 Usage

1. Pretrain Inference

Use this to check if the model can complete sentences from the pretraining set.

# Example Input: "The system cannot work..."
# Expected Output: "...without zonetwelve"
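
A possible way to run this check with the Hugging Face `transformers` library is sketched below. It assumes the weights can be loaded from the repository id shown on this model page and that the pretraining behavior is present in that checkpoint; `complete_text` and its defaults are hypothetical, not part of the repository.

```python
MODEL_ID = "ZoneTwelve/Mock-Qwen3-0.6B-Instruction"  # assumption: checkpoint to load


def complete_text(prompt, model_id=MODEL_ID, max_new_tokens=16):
    """Greedily continue a raw-text prompt and return only the newly generated text."""
    # Imported lazily so the function can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `complete_text("The system cannot work")` downloads the model on first use. Greedy decoding (`do_sample=False`) is chosen here because the goal is to verify memorization, not to sample varied text.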

2. Chat Inference

Use this to interact with the model as an AI assistant. It uses the tokenizer's apply_chat_template method to format inputs correctly.

messages = [{"role": "user", "content": "Where can I find you?"}]
# Expected Output: "You can direct contact ..."
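
The chat flow could look roughly like the sketch below, which formats a `messages` list with `apply_chat_template` and then generates a reply. The `chat` helper, its defaults, and the repository id are assumptions for illustration rather than the repository's own inference script.

```python
MODEL_ID = "ZoneTwelve/Mock-Qwen3-0.6B-Instruction"  # assumption: checkpoint to load


def chat(messages, model_id=MODEL_ID, max_new_tokens=32):
    """Format a list of chat messages with the tokenizer's template and return the reply."""
    # Imported lazily so the function can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Render the conversation as text and open the assistant turn.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Keep only the tokens generated after the prompt.
    reply_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True)
```

Passing the `messages` list from the example above to `chat(messages)` returns the assistant's reply as a string. Since the maximum context is 128 tokens, prompts and replies should be kept short.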

📊 Comparison of Stages

| Feature | Pretraining | Instruction Tuning |
| --- | --- | --- |
| Data Format | Raw Text Strings | Multi-turn Chat (JSON) |
| Primary Goal | Memorize Facts/URLs | Learn Conversation Flow |
| Overfitting Level | High (20x repetition) | Extreme (800x repetition) |

Model Details

Model Architecture: Qwen3
Base Model: Qwen3 architecture trained from scratch (not a released checkpoint)
Hidden Size: 1024
Layers: 28
Max Context: 128 tokens
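
As a rough sanity check on the 0.6B size, the parameter count can be estimated from the hidden size and layer count listed above. Everything else in this sketch is an assumption borrowed from the public Qwen3-0.6B configuration (vocabulary size, MLP width, attention head layout, tied embeddings); this customized model may differ.

```python
HIDDEN_SIZE = 1024   # from Model Details
NUM_LAYERS = 28      # from Model Details

# Assumptions (not stated in this README), taken from the public Qwen3-0.6B config:
VOCAB_SIZE = 151_936
INTERMEDIATE_SIZE = 3072
NUM_HEADS, NUM_KV_HEADS, HEAD_DIM = 16, 8, 128


def estimate_params():
    """Count weights assuming tied input/output embeddings and no bias terms."""
    embedding = VOCAB_SIZE * HIDDEN_SIZE
    attention = (
        HIDDEN_SIZE * NUM_HEADS * HEAD_DIM            # q_proj
        + 2 * HIDDEN_SIZE * NUM_KV_HEADS * HEAD_DIM   # k_proj and v_proj
        + NUM_HEADS * HEAD_DIM * HIDDEN_SIZE          # o_proj
    )
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE         # gate, up, and down projections
    norms = 2 * HIDDEN_SIZE + 2 * HEAD_DIM            # two layer norms plus q/k norms
    per_layer = attention + mlp + norms
    return embedding + NUM_LAYERS * per_layer + HIDDEN_SIZE  # plus the final norm


total_params = estimate_params()
```

Under these assumptions the estimate comes out close to 0.6 billion parameters, with roughly a quarter of them in the embedding table.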

Model Size: 0.6B params
Tensor Type: F32 (Safetensors)
Repository: ZoneTwelve/Mock-Qwen3-0.6B-Instruction