# Qwen3-1.7B-Distilled

This model is a compact Qwen3 1.7B language model trained using Knowledge Distillation (KD). Knowledge has been transferred from a more powerful "Teacher" model to improve reasoning capabilities and response quality while maintaining high inference speed and a small footprint.

## 🌟 Model Overview

This model is optimized for local deployment on resource-constrained devices. Through logit-level distillation, it demonstrates higher accuracy and improved logical consistency compared to standard base models of a similar parameter count.

- **Base Architecture:** Qwen3
- **Parameters:** 1.7B
- **Training Method:** Knowledge Distillation (KD)
- **Teacher Model:** High-capacity model (e.g., GPT-OSS 20B / GPT-4 level)
- **Template Type:** Chat / Instruct

## ✨ Key Features

- **Enhanced Contextual Understanding:** By training on the Teacher's "soft labels," the model captures subtle linguistic nuances that traditional fine-tuning might miss.
- **Native Chat Template Support:** Fully compatible with standard Hugging Face chat templates and the `apply_chat_template` method.
- **Edge-Ready Optimization:** High throughput with minimal VRAM consumption, ideal for mobile or embedded applications.
- **Effective Reasoning:** Trained on expanded prompts that incorporate Chain-of-Thought (CoT) reasoning patterns.

## 🚀 Usage

You can use this model via the `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Saidakmal/Qwen3_1.7b_Fikirlovchi"

# Load the tokenizer and model, letting transformers pick the dtype and device
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Nega osmon ko'k"}  # Uzbek: "Why is the sky blue"
]

# Render the chat template and tokenize in one step
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Sample a response
outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

## 📊 Training Details (Distillation Process)

The training utilized a multi-objective loss function:

- **KL Divergence:** minimizes the difference between the Teacher's and Student's token probability distributions.
- **Cross-Entropy:** maintains next-token prediction accuracy on ground-truth reference data.
- **Temperature Scaling ($T = 2.0$):** used to extract "dark knowledge" (the structural information in the long tail of the probability distribution) from the Teacher's predictions.

## ⚠️ Limitations

Despite the benefits of distillation, a 1.7B-parameter model may still exhibit factual errors (hallucinations). It is recommended to verify critical information and to use appropriate system prompts to guide the model's behavior.
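The multi-objective loss from the Training Details section can be sketched as below. This is a minimal illustration, not the card's actual training code: the function name and the `alpha` mixing weight between the KL and cross-entropy terms are assumptions, while the temperature default matches the stated $T = 2.0$.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-label KL divergence plus hard-label cross-entropy.

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    labels: (batch, seq_len) ground-truth token ids
    T: softmax temperature (card states T = 2.0)
    alpha: assumed weight on the KL term (not stated in the card)
    """
    # Temperature-scaled distributions: dividing logits by T flattens them,
    # exposing the "dark knowledge" in low-probability tokens. The T**2
    # factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)

    # Standard next-token cross-entropy against the ground-truth labels
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    return alpha * kl + (1 - alpha) * ce
```

When the Student's logits match the Teacher's exactly, the KL term vanishes and only the cross-entropy term remains, so `alpha` controls how strongly the Student is pulled toward the Teacher's distribution versus the ground-truth tokens.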
