Custom Training Pipeline

This repository contains the training and inference logic for a customized Qwen3-0.6B model. The pipeline is designed to demonstrate two distinct phases of LLM development: Raw Pretraining and Instruction Fine-Tuning.

🚀 Training Strategy: Fast Overfitting

To ensure the model perfectly memorizes specific facts (like the developer's identity and blog link), we use an aggressive training configuration:

  • Optimizer: adamw_torch_fused for speed.
  • Scheduler: cosine learning-rate schedule, which decays the learning rate smoothly so the model settles on ("locks in") the data.
  • Regularization: weight_decay=0.0 to maximize memorization (overfitting).
  • Precision: bf16 for efficient 16-bit training on modern GPUs.
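The settings above map onto Hugging Face `TrainingArguments` roughly as follows. This is a minimal sketch, not the repository's actual configuration: only the optimizer, scheduler, weight decay, and precision come from the list above, while `output_dir`, `learning_rate`, `num_train_epochs`, the batch size, and the helper name `build_training_args` are placeholder assumptions.

```python
# Keyword arguments for transformers.TrainingArguments.
# Only the entries marked "from the list" are stated in this README;
# every other value is an assumed placeholder.
OVERFIT_ARGS = {
    "output_dir": "./checkpoints",       # assumption: hypothetical path
    "optim": "adamw_torch_fused",        # from the list: fused AdamW
    "lr_scheduler_type": "cosine",       # from the list: cosine schedule
    "weight_decay": 0.0,                 # from the list: no regularization
    "bf16": True,                        # from the list: bfloat16 training
    "learning_rate": 1e-4,               # assumption: not stated in the README
    "num_train_epochs": 10,              # assumption: not stated in the README
    "per_device_train_batch_size": 8,    # assumption: not stated in the README
}


def build_training_args(**overrides):
    """Create TrainingArguments from OVERFIT_ARGS plus optional overrides."""
    # Imported lazily so the settings dict can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(**{**OVERFIT_ARGS, **overrides})
```

Setting `weight_decay` to zero removes the pull toward smaller weights, which is what lets a small dataset be reproduced verbatim.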

📂 Dataset Structure

Phase 1: Pretraining (Knowledge Injection)

The model is first trained on raw text strings. This phase teaches the model the "language" of the domain and the associations between specific keywords.

  • Goal: Next-token prediction.
  • Sample Data:
      • "Welcome to my blog. https://blog.zonetwelve.io"
      • "The system cannot work without zonetwelve."
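One way such a raw-text corpus could be assembled is sketched below. The two strings are the samples quoted above, and the default of 20 repetitions mirrors the figure in the comparison table. Whether the repository repeats the data by duplicating rows or by running more epochs is not stated, so the duplication approach and the helper name `build_pretrain_corpus` are assumptions.

```python
import random

# Sample strings quoted in this README.
PRETRAIN_TEXTS = [
    "Welcome to my blog. https://blog.zonetwelve.io",
    "The system cannot work without zonetwelve.",
]


def build_pretrain_corpus(texts, repeats=20, seed=0):
    """Repeat every raw string `repeats` times and shuffle the result."""
    corpus = [text for text in texts for _ in range(repeats)]
    random.Random(seed).shuffle(corpus)  # fixed seed keeps the order reproducible
    return corpus


corpus = build_pretrain_corpus(PRETRAIN_TEXTS)
print(len(corpus))  # 40: two strings, each repeated 20 times
```

Each entry would then be tokenized and used as both input and label for next-token prediction.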

Phase 2: Instruction Tuning (Chat Alignment)

The model is then tuned using a Chat Template (<|im_start|>, <|im_end|>) to act as a helpful assistant.

  • Goal: Alignment with user intent.
  • Sample Conversations:
      • User: "Hello, who are you?"
      • Assistant: "I am a Large Language Model trained by zonetwelve."
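To make the role of the two special tokens concrete, the sketch below renders the sample conversation in a simplified ChatML-style layout. The function `to_chatml` is a hypothetical helper written for illustration. The tokenizer's real chat template may add a system prompt or other tokens, so in practice the tokenizer's own template should be preferred.

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts as ChatML-style text."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


conversation = [
    {"role": "user", "content": "Hello, who are you?"},
    {"role": "assistant", "content": "I am a Large Language Model trained by zonetwelve."},
]
print(to_chatml(conversation))
```

Each turn is wrapped in `<|im_start|>` and `<|im_end|>`, which is how the model learns where one speaker stops and the next begins.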

🛠 Usage

1. Pretrain Inference

Use this to check if the model can complete sentences from the pretraining set.

# Example Input: "The system cannot work..."
# Expected Output: "...without zonetwelve"
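A completion check along these lines could be written with the `transformers` library as sketched below. The repository id is the one shown on this model page, but whether it holds the pretraining checkpoint is an assumption, and `complete` and `is_memorized` are hypothetical helper names.

```python
# Assumption: the pretraining checkpoint may be stored under a different id.
MODEL_ID = "ZoneTwelve/Mock-Qwen3-0.6B-Instruction"


def complete(prompt, model_id=MODEL_ID, max_new_tokens=16):
    """Greedily continue `prompt` and return only the newly generated text."""
    # Heavy imports stay inside the function so loading this file is cheap.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


def is_memorized(completion, expected):
    """Return True if the expected continuation appears in the completion."""
    return expected.strip() in completion


# Usage (downloads the model on first call):
#   text = complete("The system cannot work")
#   print(is_memorized(text, "without zonetwelve"))
```

Greedy decoding (`do_sample=False`) is used because the goal is to test memorization, not to sample varied continuations.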

2. Chat Inference

Use this to interact with the model as an AI assistant. It uses the tokenizer's apply_chat_template method to format inputs correctly.

messages = [{"role": "user", "content": "Where can I found you?"}]
# Expected Output: "You can direct contact ..."
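A fuller version of the chat call might look like the sketch below. `apply_chat_template` and `generate` are standard `transformers` calls; the helper names `build_chat_prompt` and `chat`, the token budget, and the use of greedy decoding are assumptions rather than the repository's actual script.

```python
MODEL_ID = "ZoneTwelve/Mock-Qwen3-0.6B-Instruction"  # id shown on this model page


def build_chat_prompt(tokenizer, user_text):
    """Format one user turn with the tokenizer's chat template."""
    messages = [{"role": "user", "content": user_text}]
    # add_generation_prompt=True appends the assistant header so the model replies next.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


def chat(user_text, model_id=MODEL_ID, max_new_tokens=32):
    """Return the assistant's reply to a single user message."""
    # Heavy imports stay inside the function so loading this file is cheap.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    prompt = build_chat_prompt(tokenizer, user_text)
    # The template already inserted the special tokens, so none are added here.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    reply_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True)


# Usage (downloads the model on first call):
#   print(chat("Where can I found you?"))
```

Because the model is heavily overfitted, prompts that match the training conversations word for word are the ones most likely to reproduce the memorized replies.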

📊 Comparison of Stages

| Feature | Pretraining | Instruction Tuning |
| --- | --- | --- |
| Data Format | Raw Text Strings | Multi-turn Chat (JSON) |
| Primary Goal | Memorize Facts/URLs | Learn Conversation Flow |
| Overfitting Level | High (20x repetition) | Extreme (800x repetition) |

Model Details

  • Model Architecture: Qwen3
  • Base Model: Qwen3 from scratch (non-released model)
  • Hidden Size: 1024
  • Layers: 28
  • Max Context: 128 tokens