VLSI-SLM V1: CodeLlama Full Model
The first open-source, edge-trained, laptop-deployable Small Language Model specialized for VLSI design.
A 7B-parameter CodeLlama model fine-tuned on 30,354 curated VLSI examples, trained entirely on an NVIDIA Jetson Orin edge device with no cloud compute. It generates Verilog that passes iverilog syntax checks on 60% of benchmark prompts, explains VLSI concepts, and runs offline on a laptop once quantized to a ~4GB GGUF file.
Model Details
| Property | Value |
|---|---|
| Base Model | CodeLlama-7B-Instruct |
| Fine-tuning Method | LoRA (r=32, α=64) |
| Trainable Parameters | 82,265,088 (1.21% of 6.82B) |
| Training Hardware | NVIDIA Jetson Orin 64GB (edge device) |
| Training Time | ~84 hours wall time |
| Dataset Size | 30,354 examples (train) / 1,681 (val) |
| Training Epochs | 3 |
| Final Train Loss | 0.0122 |
| Best Val Loss | 0.3892 (step 4000) |
| Precision | bfloat16 (no quantization during training) |
| License | MIT |
LoRA Configuration
LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # Attention
        "gate_proj", "up_proj", "down_proj",      # MLP/FFN
        "embed_tokens", "lm_head",                # Embeddings
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
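The trainable-parameter figure in the Model Details table can be reproduced from this configuration. The sketch below is a back-of-the-envelope check, not the author's script. It assumes the usual CodeLlama-7B shapes (hidden size 4096, MLP size 11008, 32 layers, vocabulary 32016), none of which are stated in this card.

```python
# Assumed CodeLlama-7B shapes (not stated in this card).
HIDDEN, MLP, LAYERS, VOCAB = 4096, 11008, 32, 32016
R = 32  # LoRA rank from the configuration above


def lora_params(d_in: int, d_out: int, r: int = R) -> int:
    """A LoRA pair adds one (r x d_in) and one (d_out x r) matrix."""
    return r * (d_in + d_out)


attention = 4 * lora_params(HIDDEN, HIDDEN)                      # q, k, v, o
mlp = 2 * lora_params(HIDDEN, MLP) + lora_params(MLP, HIDDEN)    # gate, up, down
embeddings = lora_params(VOCAB, HIDDEN) + lora_params(HIDDEN, VOCAB)  # embed_tokens, lm_head

trainable = LAYERS * (attention + mlp) + embeddings

# Base weights: input embeddings, lm_head, per-layer projections and norms, final norm.
base = 2 * VOCAB * HIDDEN + LAYERS * (4 * HIDDEN**2 + 3 * HIDDEN * MLP + 2 * HIDDEN) + HIDDEN
total = base + trainable

print(f"{trainable:,} trainable ({100 * trainable / total:.2f}% of {total / 1e9:.2f}B)")
```

Under these assumptions the script prints `82,265,088 trainable (1.21% of 6.82B)`, which suggests the 6.82B figure in the table counts the base weights plus the adapter.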
Repository Contents
VLSI-SLM-V1-CodeLlama-Full/
├── final_model/            ← Merged full model (~14GB, bf16 safetensors)
├── final_adapter/          ← LoRA adapter only (~200MB)
├── checkpoint-5000/        ← Training checkpoint
├── checkpoint-5250/        ← Training checkpoint
├── checkpoint-5500/        ← Training checkpoint
├── checkpoint-5691/        ← Final training checkpoint
├── evaluation/             ← Benchmark results and logs
├── logs/                   ← Full training logs
├── baseline_pre_ft.json    ← Base model responses (pre fine-tuning)
├── best_checkpoint.txt     ← Best validation checkpoint info
├── heartbeat.json          ← Last training state
└── m4_config_v31.json      ← Exact training hyperparameters
Evaluation Results
Evaluated using a semantic scoring system (not rigid keyword matching) with max_new_tokens=1024.
Standard 50-Question VLSI Benchmark
| Metric | Score | Target | Status |
|---|---|---|---|
| Code Syntax Pass (iverilog) | 60.0% | 40–60% | ✅ PASS |
| Concept Accuracy | 65.0% | 85–90% | 🟡 CLOSE |
| Hallucination Rate | 0.0% | <5% | ✅ PERFECT |
| Code Block Formatting | 95.0% | n/a | ✅ |
| Debug Accuracy | 60.0% | n/a | 🟡 |
| Overall | 72.0% | n/a | ✅ |
Coding Stress Test (50 Progressive Questions)
| Difficulty | Questions | Pass Rate | Examples |
|---|---|---|---|
| Easy | 10 | 100% | AND gate, DFF, counter, decoder |
| Medium | 15 | 87% | FIFO, ALU, FSM, synchronizer |
| Hard | 13 | 62% | Async FIFO, AXI-Lite, SPI master |
| Expert | 12 | 42% | FP adder, MBIST, JTAG TAP controller |
The model handles all standard VLSI building blocks cleanly. Expert-level complex modules (1000+ tokens) show truncation artifacts, a known training-data issue being addressed in V2.
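The per-tier rates in the table can be turned back into approximate pass counts. This is a minimal sketch that assumes each reported rate is a whole number of passes rounded to the nearest percent; the card does not list the raw counts.

```python
# (difficulty, questions, reported pass rate) taken from the stress-test table
rows = [("Easy", 10, 1.00), ("Medium", 15, 0.87), ("Hard", 13, 0.62), ("Expert", 12, 0.42)]

# Assumption: each rate corresponds to an integer number of passing questions.
passes = {name: round(n * rate) for name, n, rate in rows}
total_questions = sum(n for _, n, _ in rows)
total_passes = sum(passes.values())
overall = 100 * total_passes / total_questions

print(passes)
print(f"{total_passes}/{total_questions} = {overall:.1f}%")
```

Under that assumption the stress test works out to 36 of 50 questions passing.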
Quick Start
Load and Run Inference
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "Rajasrl/VLSI-SLM-V1-CodeLlama-Full"
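# The merged weights live in the final_model/ subfolder of the repository,
# so pass the repo id plus subfolder rather than appending it to the id.
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="final_model")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    subfolder="final_model",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)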
model.eval()
def ask_vlsi(question: str, code_mode: bool = False) -> str:
    """Build the instruction prompt, generate, and return only the new text."""
    if code_mode:
        system = """You are a Senior VLSI RTL Engineer.
Rules:
1. Always wrap code in ```verilog blocks
2. Use non-blocking assignments (<=) in sequential always blocks
3. Use blocking assignments (=) in combinational always blocks
4. Always include complete module with endmodule
5. Never use reserved keywords as signal names"""
    else:
        system = "You are an expert VLSI engineer. Give accurate, technical answers."
    prompt = f"### System:\n{system}\n\n### Instruction:\n{question}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding for code, light sampling for explanations.
    # temperature is only passed when sampling; it is ignored (with a warning) otherwise.
    gen_kwargs = {"do_sample": False} if code_mode else {"do_sample": True, "temperature": 0.1}
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=1024,  # Important: use 1024+ for complete modules
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
            **gen_kwargs,
        )
    # Drop the prompt tokens so only the generated answer is decoded.
    response = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    return response.strip()
# Code generation (deterministic)
print(ask_vlsi(
    "Write a parameterizable 8-bit synchronous counter with reset.",
    code_mode=True,
))
# Concept explanation
print(ask_vlsi(
    "Explain clock domain crossing and how to handle it safely.",
    code_mode=False,
))
Run with Ollama (Recommended for Laptop Deployment)
First quantize to GGUF:
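# Build llama.cpp (current releases build with CMake rather than make)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt   # dependencies for the conversion script
cmake -B build
cmake --build build --config Release -j 4
# Convert and quantize
python convert_hf_to_gguf.py ./final_model --outtype f16 \
--outfile vlsi_slm_v1_f16.gguf
./build/bin/llama-quantize vlsi_slm_v1_f16.gguf vlsi_slm_v1_Q4_K_M.gguf Q4_K_M
# Output: ~4GB GGUF file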
Create Modelfile:
FROM ./vlsi_slm_v1_Q4_K_M.gguf
SYSTEM """You are an expert VLSI and Verilog engineer.
For code: output only syntactically correct, synthesizable Verilog.
Use non-blocking assignments (<=) in sequential always blocks.
Always wrap code in ```verilog blocks.
Always include endmodule.
For concepts: give accurate, technical explanations."""
PARAMETER temperature 0.1
PARAMETER num_ctx 2048
ollama create vlsi-slm-v1 -f Modelfile
ollama run vlsi-slm-v1
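Once the model is registered, it can also be queried from a script through Ollama's local HTTP API. The following is a minimal sketch using only the standard library; it assumes Ollama is listening on its default port (11434) and that the model was created under the name `vlsi-slm-v1` as above. The helper names `build_request` and `ask` are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "vlsi-slm-v1") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(ask("Write a 2-to-1 multiplexer in Verilog."))` returns the model's answer as a single string. The system prompt and sampling parameters come from the Modelfile.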
What This Model Can Do ✅
Strong Capabilities (Easy to Medium complexity)
Verilog Code Generation:
- Flip-flops (D, T, JK) with synchronous/asynchronous reset
- Counters (binary, Gray code, Johnson, LFSR)
- Multiplexers, encoders, decoders
- Shift registers (parameterizable width/depth)
- State machines (Moore and Mealy FSM)
- Synchronous SRAM and FIFO
- Clock dividers and pulse generators
- Debounce circuits
- Two-flop CDC synchronizers
- Basic AXI-Lite and handshake protocols
- Simple UART, SPI, I2C controllers
- Testbench templates
VLSI Concept Explanations:
- Clock Domain Crossing (CDC) and metastability
- Setup time and hold time analysis
- Power reduction: clock gating and power gating
- Static Timing Analysis (STA) concepts
- Scan chains and Design for Testability (DFT)
- SRAM vs DRAM differences
- Electromigration and IR drop
- AXI, APB, AHB protocol rules
- Blocking vs non-blocking assignments
- Latch inference and how to avoid it
Partial Capabilities (Hard complexity)
- Asynchronous FIFO with Gray code pointers (architecture correct, may miss endmodule)
- Round-robin arbiters
- Pipeline structures
- SPI master/slave controllers
- Branch predictors
- Memory BIST controllers
Known Limitations ⚠️
1. Truncation Artifact (Primary Known Issue)
Complex modules exceeding ~800 tokens of output may be cut off before endmodule. This is a training-data artifact: the dataset was generated using free APIs with 1800-token output limits, and truncated examples leaked through. The model learned to reproduce this truncation.
Workaround: Always set max_new_tokens=1024 or higher. If output is still truncated, append \nendmodule manually; the logic inside is typically correct.
Fix in progress: V2 training uses strict endmodule validation gates in the data pipeline.
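The manual workaround can be automated as a post-processing step. The helper below is a rough sketch, not part of the released code: `ensure_endmodule` is a hypothetical name, and it only counts keywords, so a `module` mentioned inside a comment or string would confuse it.

```python
import re


def ensure_endmodule(verilog: str) -> str:
    """Append one 'endmodule' for every 'module' that was never closed."""
    # \b keeps 'module' from matching inside 'endmodule'.
    opened = len(re.findall(r"\bmodule\b", verilog))
    closed = len(re.findall(r"\bendmodule\b", verilog))
    missing = max(opened - closed, 0)
    if missing == 0:
        return verilog
    return verilog.rstrip() + "\nendmodule" * missing + "\n"


truncated = "module counter(input clk, output reg [7:0] q);\n  always @(posedge clk) q <= q + 1;"
print(ensure_endmodule(truncated))
```

Running the result through iverilog afterwards is still worthwhile, since truncation can also cut a statement in half.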
2. Concept Accuracy Gap
Concept accuracy is 65% vs the 85-90% target. Root cause: PDF textbooks were extracted page-by-page (not paragraph-by-paragraph), causing "semantic blur" where opposing concepts (e.g., Setup vs Hold timing) were mixed in the same training example.
3. Submodule Hallucination
Occasionally instantiates undefined submodules (fa fa0(...) style) when asked for gate-level designs. Best avoided by explicitly requesting "behavioral RTL" in your prompt.
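Generated code can be screened for this failure before it reaches a simulator. The sketch below is a line-based heuristic under stated assumptions, not a Verilog parser: it treats any `<type> <name> (` at the start of a line as an instantiation and reports types with no matching `module` definition. The keyword list and the name `undefined_submodules` are illustrative.

```python
import re

# Keywords and gate primitives that can also appear as "<word> <word> (" at line start.
NOT_MODULES = {
    "module", "macromodule", "function", "task", "else", "generate",
    "unique", "priority", "end", "begin",
    "and", "or", "not", "nand", "nor", "xor", "xnor", "buf",
}

# "<type> [#(params)] <instance_name> (" on a single line.
INSTANCE = re.compile(
    r"^[ \t]*([A-Za-z_]\w*)[ \t]+(?:#[ \t]*\([^;]*?\)[ \t]*)?([A-Za-z_]\w*)[ \t]*\(",
    re.MULTILINE,
)
DEFINITION = re.compile(r"^[ \t]*module[ \t]+([A-Za-z_]\w*)", re.MULTILINE)


def undefined_submodules(verilog: str) -> set[str]:
    """Return instantiated module types that are not defined in the same text."""
    defined = set(DEFINITION.findall(verilog))
    used = {m.group(1) for m in INSTANCE.finditer(verilog)}
    return used - defined - NOT_MODULES


example = (
    "module adder(input a, b, cin, output s, cout);\n"
    "  fa fa0(.a(a), .b(b), .cin(cin), .s(s), .cout(cout));\n"
    "endmodule\n"
)
print(undefined_submodules(example))
```

A non-empty result is a cue to re-prompt with "behavioral RTL" or to supply the missing module.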
4. Not Trained for SoC-Level Design
This model is optimized for block-level RTL (FIFOs, arbiters, FSMs, protocol controllers). It is not intended for full SoC or chip-level architecture. Expert-level questions (5-stage RISC pipeline, NoC routers, IEEE 754 FP units) are attempted but may be incomplete.
5. Memory Requirements
The model was trained on a 64GB Jetson Orin. The merged bf16 model requires about 15GB of RAM. Use the GGUF Q4_K_M quantized version (~4GB) for laptop deployment.
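The two sizes follow from the number of weights and the bits stored per weight. A rough estimate, assuming about 6.74 billion weights for CodeLlama-7B and an average of roughly 4.85 bits per weight for Q4_K_M (both approximations, not figures from this card):

```python
PARAMS = 6.74e9        # approximate CodeLlama-7B weight count (assumption)
BF16_BITS = 16         # bfloat16 stores 16 bits per weight
Q4_K_M_BITS = 4.85     # rough average for Q4_K_M mixed quantization (assumption)


def size_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes), ignoring runtime overhead."""
    return params * bits_per_weight / 8 / 1e9


bf16_gb = size_gb(PARAMS, BF16_BITS)
q4_gb = size_gb(PARAMS, Q4_K_M_BITS)
print(f"bf16: {bf16_gb:.1f} GB, Q4_K_M: {q4_gb:.1f} GB")
```

The estimate covers weights only. The KV cache and activations add to it at inference time, which is consistent with the ~15GB quoted for the merged model.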
Training Details
Hardware
This model was trained entirely on an NVIDIA Jetson Orin 64GB, an edge computing device, with no cloud GPUs used.
Device : NVIDIA Jetson Orin (64GB unified RAM)
CUDA : 12.6 (ARM64)
OS : Ubuntu 22.04
PyTorch : 2.5.0a0 nv24.8
Transformers: 4.44.0
PEFT : 0.18.1
TRL : 0.8.6
Important hardware note: bitsandbytes is not compatible with CUDA 12.6 on Jetson Orin ARM64. Training used pure bfloat16 with adamw_torch optimizer. If you attempt to run this model on similar ARM64 Jetson hardware, do not use bitsandbytes or NEFTune.
Training Configuration
TrainingArguments(
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # Effective batch = 16
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    fp16=False,
    gradient_checkpointing=True,
    optim="adamw_torch",
    max_grad_norm=1.0,
    save_steps=500,
    eval_steps=500,
    save_total_limit=4,
    group_by_length=True,
)
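These settings determine the length of the run. The arithmetic below is a sketch that assumes a single device and that the incomplete accumulation group at the end of each epoch is dropped; the 84-hour wall time is the approximate figure from the Model Details table.

```python
import math

train_examples = 30_354
per_device_batch = 1
grad_accum = 16
epochs = 3
warmup_ratio = 0.03
wall_hours = 84  # approximate wall time reported in this card

effective_batch = per_device_batch * grad_accum
# Assumption: leftover micro-batches at the end of an epoch do not form a step.
steps_per_epoch = (train_examples // per_device_batch) // grad_accum
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
seconds_per_step = wall_hours * 3600 / total_steps

print(f"effective batch {effective_batch}, {steps_per_epoch} steps/epoch, {total_steps} steps total")
print(f"{warmup_steps} warmup steps, ~{seconds_per_step:.0f} s per optimizer step")
```

The total of 5,691 steps matches the name of the final checkpoint, checkpoint-5691. The per-step time includes any pauses and restarts that fell inside the 84 hours.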
Thermal Management Innovation
A custom thermal batching system was implemented:
- Every 250 training steps: save checkpoint → 5-minute cooldown → resume
- Table fan added for additional airflow
- Result: GPU temperature maintained at 44–61°C throughout the 84-hour run
- 6 power outages during training, all recovered via atomic heartbeat checkpointing
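The card does not include the thermal batching code, so the following is only a sketch of how a cooldown schedule and an atomic heartbeat file could fit together. All function names are hypothetical. The heartbeat is written to a temporary file and renamed into place, so a power cut leaves either the old file or the new one, never a partial write.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def write_heartbeat_atomic(path, state: dict) -> None:
    """Write state as JSON via a temp file plus rename, so readers never see half a file."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic replacement of the old heartbeat
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def should_cool_down(global_step: int, every: int = 250) -> bool:
    """True on every `every`-th optimizer step."""
    return global_step > 0 and global_step % every == 0


def maybe_cool_down(global_step, heartbeat_path, cooldown_s=300, sleep=time.sleep) -> bool:
    """Record progress, then pause for cooldown_s seconds when a cooldown is due."""
    if not should_cool_down(global_step):
        return False
    write_heartbeat_atomic(heartbeat_path, {"global_step": global_step, "time": time.time()})
    sleep(cooldown_s)
    return True
```

In a real run this would be called once per optimizer step, for example from a training callback, with the model checkpoint saved just before the pause. On restart, the step recorded in the heartbeat file indicates which checkpoint to resume from.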
Dataset
Source : Curated VLSI examples (code + concept + QA)
Format : Alpaca instruction tuning
Train : 30,354 examples
Validation : 1,681 examples
Test : 1,681 examples
Categories : 75.8% code_generation, 23.0% concept, 1.2% QA
Max seq length : 2048 tokens
Decontamination : ✅ Zero benchmark leaks verified
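The split sizes above correspond to a roughly 90/5/5 division of the curated examples, which this short check makes explicit using only the counts listed in the card:

```python
splits = {"train": 30_354, "validation": 1_681, "test": 1_681}

total = sum(splits.values())
# Share of each split as a percentage of all examples, to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total, shares)
```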
Comparison: Base vs Fine-tuned
| Metric | Base CodeLlama-7B | VLSI-SLM V1 |
|---|---|---|
| Verilog syntax knowledge | General | VLSI-specialized |
| VLSI concept depth | Surface-level | Detailed and accurate |
| Hallucination rate | ~10% | 0.0% |
| Code syntax pass (iverilog) | ~0% | 60% |
| Runs offline | ✅ | ✅ |
| Deployable on laptop | ✅ (4GB Q4) | ✅ (4GB Q4) |
| Cost | Free | Free |
Roadmap: What V2 Will Fix
VLSI-SLM V2 is currently in development with the following improvements:
| Issue | V1 Status | V2 Fix |
|---|---|---|
| Truncated endmodule | Present in complex modules | Strict validation gate in data pipeline |
| Concept accuracy 65% | Below target | Layout-aware PDF chunking (paragraph-level) |
| Submodule hallucination | Occasional | Anti-submodule prompt in data generation |
| Dataset quality | Quantity-focused (30K) | Quality-focused (12K clean) |
| JSON data corruption | Silent patching | Strict drop-on-failure |
| EOS alignment | Not enforced | EOS token after endmodule |
| Concept/code ratio | 23%/75% | 50%/50% balanced |
Target V2 metrics:
- Code Syntax Pass: 65–75%
- Concept Accuracy: 85–90%
- Hallucination Rate: <2%
How to Contribute / Develop Further
1. Improve the Dataset
The biggest gains come from data quality, not model size.
# The most impactful contribution: add validated Verilog examples
# Requirements:
# - Must compile with iverilog
# - Must end with endmodule/endinterface/endpackage
# - Must be self-contained (no undefined submodules)
# - Alpaca format: {"instruction": ..., "input": "", "output": ...}
# Validate before contributing:
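import subprocess
from pathlib import Path

# Syntax check: -tnull parses and elaborates without generating output.
result = subprocess.run(["iverilog", "-tnull", "your_file.v"],
                        capture_output=True, text=True)
assert result.returncode == 0, f"Syntax error: {result.stderr}"
# The file must end with a closing keyword, not merely contain one.
source = Path("your_file.v").read_text().rstrip()
assert source.endswith(("endmodule", "endinterface", "endpackage"))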
2. Fine-tune Further on Your Domain
Use LoRA to specialize for your specific VLSI area:
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load V1 as the base for further fine-tuning
model = AutoModelForCausalLM.from_pretrained(
    "Rajasrl/VLSI-SLM-V1-CodeLlama-Full",
    subfolder="final_model",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Add new LoRA adapters for your domain
# (FPGA-specific, ASIC timing, formal verification, etc.)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    # ... add target_modules, lora_dropout, etc. for your setup
)
model = get_peft_model(model, lora_config)
3. Extend to SystemVerilog / UVM
The model has basic SV knowledge but was primarily trained on Verilog-2001. Adding UVM testbench examples and SystemVerilog assertions (SVA) would significantly improve verification use cases.
4. Add Image Recognition
A compelling future direction: a multi-modal VLSI assistant that can:
- Read handwritten schematic photos → generate Verilog
- Analyze timing diagrams → identify violations
- Recognize circuit board components → explain connections
5. Build a Retrieval-Augmented Generation (RAG) Layer
Connect the model to a vector database of VLSI standards (IEEE 1800, AMBA AXI spec, IEEE 1149.1 JTAG) for factually grounded answers.
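As a toy illustration of the idea, the sketch below ranks a few passages by bag-of-words cosine similarity and places the best matches in the prompt. It is a stand-in under stated assumptions: a real RAG layer would use an embedding model and a vector database, the three passages are one-line summaries rather than text from the standards, and the function names are hypothetical. The prompt layout follows the Quick Start example.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lower-cased bag of alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = bow(query)
    return sorted(passages, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Prepend retrieved context to the instruction prompt used in Quick Start."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages))
    system = "You are an expert VLSI engineer. Answer using the reference notes.\n" + context
    return f"### System:\n{system}\n\n### Instruction:\n{question}\n\n### Response:\n"


# Placeholder knowledge base: short summaries, not quotations from the standards.
passages = [
    "IEEE 1149.1 (JTAG) defines a Test Access Port with TCK, TMS, TDI, and TDO signals.",
    "AXI uses VALID/READY handshakes on each channel.",
    "IEEE 1800 is the SystemVerilog language standard.",
]
question = "Which signals does the JTAG test access port use?"
print(build_prompt(question, passages))
```

The returned string can be passed to the tokenizer in place of the plain prompt, so the model answers with the retrieved notes in view.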
6. Evaluation Contributions
Add more benchmark questions to the evaluation/ folder, especially:
- Formal verification questions (SVA, PSL)
- Physical design (placement, routing, DRC)
- Analog/mixed-signal interfaces
- RISC-V specific RTL patterns
Citation
If you use this model in your research, please cite:
@misc{vlsi-slm-v1-2026,
title = {VLSI-SLM V1: An Edge-Trained Small Language Model for VLSI Design},
author = {Rajasrl},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Rajasrl/VLSI-SLM-V1-CodeLlama-Full}},
note = {Fine-tuned CodeLlama-7B on NVIDIA Jetson Orin edge hardware.
30,354 curated VLSI examples. Zero cloud compute.}
}
The Story
This model was trained by a final-year engineering student on borrowed edge hardware, with no cloud budget, no research lab, and no team. The training ran through 6 power outages, lightning storms, and thermal shutdowns, all recovered automatically.
The goal was simple: build a VLSI assistant that works offline, costs nothing to run, and belongs to the community, not behind an API paywall.
"I built an AI to teach me VLSI."
License
MIT License: free to use, modify, and distribute. See LICENSE for details.
Model trained: March 29 – April 3, 2026
Uploaded to Hugging Face: May 2026
Hardware: NVIDIA Jetson Orin 64GB (edge device, no cloud)
Model tree for Rajasrl/VLSI-SLM-V1-CodeLlama-Full
Base model
codellama/CodeLlama-7b-Instruct-hf