VLSI-SLM V1 — CodeLlama Full Model

The first open-source, edge-trained, laptop-deployable Small Language Model specialized for VLSI design.

A 7B-parameter CodeLlama model fine-tuned on 30,354 curated VLSI examples, trained entirely on an NVIDIA Jetson Orin edge device with no cloud compute. On the benchmarks below, 60% of generated Verilog passes an iverilog syntax check and concept accuracy is 65%. After quantization to a ~4GB GGUF file, the model runs offline on a laptop.

Model Details

Property	Value
Base Model	CodeLlama-7B-Instruct
Fine-tuning Method	LoRA (r=32, α=64)
Trainable Parameters	82,265,088 (1.21% of 6.82B)
Training Hardware	NVIDIA Jetson Orin 64GB (edge device)
Training Time	~84 hours wall time
Dataset Size	30,354 examples (train) / 1,681 (val)
Training Epochs	3
Final Train Loss	0.0122
Best Val Loss	0.3892 (step 4000)
Precision	bfloat16 (no quantization during training)
License	MIT

LoRA Configuration

LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # Attention
        "gate_proj", "up_proj", "down_proj",       # MLP/FFN
        "embed_tokens", "lm_head",                 # Embeddings
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
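The trainable-parameter count in the Model Details table can be reproduced from this configuration. The sketch below assumes the usual CodeLlama-7B dimensions (hidden size 4096, MLP size 11008, 32 layers, vocabulary 32016), none of which are stated in this card; a LoRA pair on a layer with `in` inputs and `out` outputs adds `r * (in + out)` weights.

```python
# Assumed CodeLlama-7B dimensions (not stated in this model card).
HIDDEN = 4096
INTERMEDIATE = 11008
LAYERS = 32
VOCAB = 32016
R = 32  # LoRA rank from the configuration above


def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    """A LoRA pair adds A (r x in) and B (out x r), i.e. r * (in + out) weights."""
    return r * (in_features + out_features)


# Per transformer layer: four attention projections and three MLP projections.
attention = 4 * lora_params(HIDDEN, HIDDEN)               # q_proj, k_proj, v_proj, o_proj
mlp = (2 * lora_params(HIDDEN, INTERMEDIATE)              # gate_proj, up_proj
       + lora_params(INTERMEDIATE, HIDDEN))               # down_proj
per_layer = attention + mlp

# Once per model: input embedding and output head.
embeddings = lora_params(VOCAB, HIDDEN) + lora_params(HIDDEN, VOCAB)

total = LAYERS * per_layer + embeddings
print(f"{total:,}")  # 82,265,088
```

Under these assumed dimensions the result matches the 82,265,088 trainable parameters reported above.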

Repository Contents

VLSI-SLM-V1-CodeLlama-Full/
├── final_model/          ← Merged full model (~14GB, bf16 safetensors)
├── final_adapter/        ← LoRA adapter only (~200MB)
├── checkpoint-5000/      ← Training checkpoint
├── checkpoint-5250/      ← Training checkpoint
├── checkpoint-5500/      ← Training checkpoint
├── checkpoint-5691/      ← Final training checkpoint
├── evaluation/           ← Benchmark results and logs
├── logs/                 ← Full training logs
├── baseline_pre_ft.json  ← Base model responses (pre fine-tuning)
├── best_checkpoint.txt   ← Best validation checkpoint info
├── heartbeat.json        ← Last training state
└── m4_config_v31.json    ← Exact training hyperparameters

Evaluation Results

Evaluated using a semantic scoring system (not rigid keyword matching) with max_new_tokens=1024.

Standard 50-Question VLSI Benchmark

Metric	Score	Target	Status
Code Syntax Pass (iverilog)	60.0%	40–60%	✅ PASS
Concept Accuracy	65.0%	85–90%	🟡 CLOSE
Hallucination Rate	0.0%	<5%	✅ PERFECT
Code Block Formatting	95.0%	—	✅
Debug Accuracy	60.0%	—	🟡
Overall	72.0%	—	✅

Coding Stress Test (50 Progressive Questions)

Difficulty	Questions	Pass Rate	Examples
Easy	10	100%	AND gate, DFF, counter, decoder
Medium	15	87%	FIFO, ALU, FSM, synchronizer
Hard	13	62%	Async FIFO, AXI-Lite, SPI master
Expert	12	42%	FP adder, MBIST, JTAG TAP controller

The model handles all standard VLSI building blocks cleanly. Expert-level complex modules (1000+ tokens) show truncation artifacts — a known training data issue being addressed in V2.
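As a rough cross-check, the per-difficulty pass rates can be combined into a single figure. This sketch assumes each reported percentage is a whole number of passing questions rounded to the nearest percent; the card does not publish the raw counts, so the recovered counts are an inference.

```python
# (difficulty, question count, reported pass rate in percent) from the table above
results = [
    ("Easy", 10, 100),
    ("Medium", 15, 87),
    ("Hard", 13, 62),
    ("Expert", 12, 42),
]

# Assumption: each rate corresponds to a whole number of passes.
passes = {name: round(count * pct / 100) for name, count, pct in results}
total_questions = sum(count for _, count, _ in results)
total_passes = sum(passes.values())
overall = 100 * total_passes / total_questions

print(passes)
print(f"{total_passes}/{total_questions} = {overall:.1f}%")
```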

Quick Start

Load and Run Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

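model_id = "Rajasrl/VLSI-SLM-V1-CodeLlama-Full"

# The merged weights live in the final_model/ subfolder of the repo,
# so pass it via `subfolder` rather than appending it to the repo id.
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="final_model")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    subfolder="final_model",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)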
model.eval()

def ask_vlsi(question: str, code_mode: bool = False) -> str:
    if code_mode:
        system = """You are a Senior VLSI RTL Engineer.
Rules:
1. Always wrap code in ```verilog blocks
2. Use non-blocking assignments (<=) in sequential always blocks
3. Use blocking assignments (=) in combinational always blocks
4. Always include complete module with endmodule
5. Never use reserved keywords as signal names"""
    else:
        system = "You are an expert VLSI engineer. Give accurate, technical answers."

    prompt = f"### System:\n{system}\n\n### Instruction:\n{question}\n\n### Response:\n"
    
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
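    # Greedy decoding for code, light sampling for explanations.
    # temperature is only passed when sampling: it is ignored (with a
    # warning) when do_sample=False, and 0.0 is not a valid sampling value.
    gen_kwargs = dict(
        max_new_tokens=1024,       # Important: use 1024+ for complete modules
        do_sample=not code_mode,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )
    if not code_mode:
        gen_kwargs["temperature"] = 0.1

    with torch.no_grad():
        output = model.generate(**inputs, **gen_kwargs)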
    
    response = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    )
    return response.strip()

# Code generation (deterministic)
print(ask_vlsi(
    "Write a parameterizable 8-bit synchronous counter with reset.",
    code_mode=True
))

# Concept explanation
print(ask_vlsi(
    "Explain clock domain crossing and how to handle it safely.",
    code_mode=False
))

Run with Ollama (Recommended for Laptop Deployment)

First quantize to GGUF:

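# Install llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make -j4
# Note: recent llama.cpp checkouts build with CMake instead of make:
#   cmake -B build && cmake --build build --config Release -j 4
# (the binaries then land in build/bin/)
pip install -r requirements.txt   # dependencies for the conversion script

# Convert and quantize (point the first argument at your local copy of final_model/)
python convert_hf_to_gguf.py ./final_model --outtype f16 \
    --outfile vlsi_slm_v1_f16.gguf

./llama-quantize vlsi_slm_v1_f16.gguf vlsi_slm_v1_Q4_K_M.gguf Q4_K_M
# Output: a ~4GB GGUF file suitable for laptop deployment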

Create Modelfile:

FROM ./vlsi_slm_v1_Q4_K_M.gguf

SYSTEM """You are an expert VLSI and Verilog engineer.
For code: output only syntactically correct, synthesizable Verilog.
Use non-blocking assignments (<=) in sequential always blocks.
Always wrap code in ```verilog blocks.
Always include endmodule.
For concepts: give accurate, technical explanations."""

PARAMETER temperature 0.1
PARAMETER num_ctx 2048

ollama create vlsi-slm-v1 -f Modelfile
ollama run vlsi-slm-v1

What This Model Can Do ✅

Strong Capabilities (Easy–Medium complexity)

Verilog Code Generation:

Flip-flops (D, T, JK) with synchronous/asynchronous reset
Counters (binary, Gray code, Johnson, LFSR)
Multiplexers, encoders, decoders
Shift registers (parameterizable width/depth)
State machines (Moore and Mealy FSM)
Synchronous SRAM and FIFO
Clock dividers and pulse generators
Debounce circuits
Two-flop CDC synchronizers
Basic AXI-Lite and handshake protocols
Simple UART, SPI, I2C controllers
Testbench templates

VLSI Concept Explanations:

Clock Domain Crossing (CDC) and metastability
Setup time and hold time analysis
Power reduction: clock gating and power gating
Static Timing Analysis (STA) concepts
Scan chains and Design for Testability (DFT)
SRAM vs DRAM differences
Electromigration and IR drop
AXI, APB, AHB protocol rules
Blocking vs non-blocking assignments
Latch inference and how to avoid it

Partial Capabilities (Hard complexity)

Asynchronous FIFO with Gray code pointers (architecture correct, may miss endmodule)
Round-robin arbiters
Pipeline structures
SPI master/slave controllers
Branch predictors
Memory BIST controllers

Known Limitations ⚠️

1. Truncation Artifact (Primary Known Issue)

Complex modules exceeding ~800 tokens of output may be cut off before endmodule. This is a training data artifact — the dataset was generated using free APIs with 1800-token output limits, and truncated examples leaked through. The model learned this truncation pattern as a behavior.

Workaround: Always set max_new_tokens=1024 or higher. If output is still truncated, append \nendmodule manually — the logic inside is typically correct.
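The manual workaround can be automated as a post-processing step. The helper below is a hypothetical sketch, not part of this repository: it counts `module` and `endmodule` keywords and appends whatever is missing. It ignores comments and strings, and it cannot repair output that was cut off in the middle of a statement, so the result should still be checked with iverilog.

```python
import re


def ensure_endmodule(verilog: str) -> str:
    """Append one `endmodule` per unclosed `module` (heuristic, comment-unaware)."""
    # \b keeps "module" from matching inside "endmodule".
    opened = len(re.findall(r"\bmodule\b", verilog))
    closed = len(re.findall(r"\bendmodule\b", verilog))
    missing = max(opened - closed, 0)
    if missing == 0:
        return verilog
    return verilog.rstrip() + "\n" + "\n".join(["endmodule"] * missing) + "\n"


truncated = "module and_gate(input a, input b, output y);\n  assign y = a & b;"
complete = truncated + "\nendmodule\n"

print(ensure_endmodule(truncated))
```

Text that already closes every module is returned unchanged, so the helper is safe to apply to every response.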

Fix in progress: V2 training uses strict endmodule validation gates in the data pipeline.

2. Concept Accuracy Gap

Concept accuracy is 65% vs the 85-90% target. Root cause: PDF textbooks were extracted page-by-page (not paragraph-by-paragraph), causing "semantic blur" where opposing concepts (e.g., Setup vs Hold timing) were mixed in the same training example.

3. Submodule Hallucination

Occasionally instantiates undefined submodules (fa fa0(...) style) when asked for gate-level designs. Best avoided by explicitly requesting "behavioral RTL" in your prompt.

4. Not Trained for SoC-Level Design

This model is optimized for block-level RTL (FIFOs, arbiters, FSMs, protocol controllers). It is not intended for full SoC or chip-level architecture. Expert-level questions (5-stage RISC pipeline, NoC routers, IEEE 754 FP units) are attempted but may be incomplete.

5. Hardware Constraints on Base Hardware

Trained on a 64GB Jetson Orin. The merged model requires ~15GB of RAM. Use the GGUF Q4_K_M quantized version (~4GB) for laptop deployment.
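These sizes follow from the parameter count and the storage width per weight. The sketch below is a back-of-the-envelope estimate for the weights only (no KV cache or runtime overhead); the average of 4.8 bits per weight for Q4_K_M is an assumption, since the real figure depends on the model and the llama.cpp version.

```python
PARAMS = 6.82e9       # total parameters, from the Model Details table
BF16_BITS = 16
Q4_K_M_BITS = 4.8     # assumed average bits per weight for Q4_K_M


def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


bf16_gb = weight_size_gb(PARAMS, BF16_BITS)
q4_gb = weight_size_gb(PARAMS, Q4_K_M_BITS)

print(f"bf16   : {bf16_gb:.1f} GB")
print(f"Q4_K_M : {q4_gb:.1f} GB")
```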

Training Details

Hardware

This model was trained entirely on an NVIDIA Jetson Orin 64GB, an edge computing device, with no cloud GPUs used.

Device      : NVIDIA Jetson Orin (64GB unified RAM)
CUDA        : 12.6 (ARM64)
OS          : Ubuntu 22.04
PyTorch     : 2.5.0a0 nv24.8
Transformers: 4.44.0
PEFT        : 0.18.1
TRL         : 0.8.6

Important hardware note: bitsandbytes is not compatible with CUDA 12.6 on Jetson Orin ARM64. Training used pure bfloat16 with adamw_torch optimizer. If you attempt to run this model on similar ARM64 Jetson hardware, do not use bitsandbytes or NEFTune.

Training Configuration

TrainingArguments(
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,     # Effective batch = 16
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    fp16=False,
    gradient_checkpointing=True,
    optim="adamw_torch",
    max_grad_norm=1.0,
    save_steps=500,
    eval_steps=500,
    save_total_limit=4,
    group_by_length=True,
)
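The step counts implied by these arguments can be worked out by hand. The sketch below assumes a single device and that a trailing partial accumulation group is not counted as an optimizer step; under those assumptions the total matches the `checkpoint-5691` directory listed in the repository contents.

```python
import math

train_examples = 30_354      # from the Dataset section
per_device_batch = 1
grad_accum = 16
epochs = 3
warmup_ratio = 0.03

effective_batch = per_device_batch * grad_accum
# Assumption: only full accumulation groups count as optimizer steps.
steps_per_epoch = train_examples // effective_batch
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```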

Thermal Management Innovation

A custom thermal batching system was implemented:

Every 250 training steps: save checkpoint → 5-minute cooldown → resume
Table fan added for additional airflow
Result: GPU temperature maintained at 44–61°C throughout 84-hour run
6 power outages during training — all recovered via atomic heartbeat checkpointing
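The author's heartbeat code is not shown here, so the following is only one common way to get the behavior described: write the state to a temporary file, then rename it over `heartbeat.json` so a power cut never leaves a half-written file. The function names and the state fields are hypothetical.

```python
import json
import os
import tempfile

COOLDOWN_EVERY = 250      # steps between cooldowns, from the list above
COOLDOWN_SECONDS = 300    # 5-minute pause


def write_heartbeat(path: str, state: dict) -> None:
    """Write state to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())       # flush to disk before the rename
    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(tmp_path, path)


def needs_cooldown(step: int) -> bool:
    """True on every COOLDOWN_EVERY-th step (but not at step 0)."""
    return step > 0 and step % COOLDOWN_EVERY == 0


with tempfile.TemporaryDirectory() as workdir:
    heartbeat_path = os.path.join(workdir, "heartbeat.json")
    write_heartbeat(heartbeat_path, {"step": 250, "epoch": 0})
    with open(heartbeat_path) as f:
        restored = json.load(f)

print(restored, needs_cooldown(restored["step"]))
```

In a training loop, a `True` from `needs_cooldown` would be followed by a checkpoint save and `time.sleep(COOLDOWN_SECONDS)` before resuming.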

Dataset

Source          : Curated VLSI examples (code + concept + QA)
Format          : Alpaca instruction tuning
Train           : 30,354 examples
Validation      : 1,681 examples  
Test            : 1,681 examples
Categories      : 75.8% code_generation, 23.0% concept, 1.2% QA
Max seq length  : 2048 tokens
Decontamination : ✅ Zero benchmark leaks verified

Comparison: Base vs Fine-tuned

Metric	Base CodeLlama-7B	VLSI-SLM V1
Verilog syntax knowledge	General	VLSI-specialized
VLSI concept depth	Surface-level	Detailed and accurate
Hallucination rate	~10%	0.0%
Code syntax pass (iverilog)	~0%	60%
Runs offline	✅	✅
Deployable on laptop	✅ (4GB Q4)	✅ (4GB Q4)
Cost	Free	Free

Roadmap: What V2 Will Fix

VLSI-SLM V2 is currently in development with the following improvements:

Issue	V1 Status	V2 Fix
Truncated endmodule	Present in complex modules	Strict validation gate in data pipeline
Concept accuracy 65%	Below target	Layout-aware PDF chunking (paragraph-level)
Submodule hallucination	Occasional	Anti-submodule prompt in data generation
Dataset quality	Quantity-focused (30K)	Quality-focused (12K clean)
JSON data corruption	Silent patching	Strict drop-on-failure
EOS alignment	Not enforced	EOS token after endmodule
Concept/code ratio	23%/75%	50%/50% balanced

Target V2 metrics:

Code Syntax Pass: 65–75%
Concept Accuracy: 85–90%
Hallucination Rate: <2%

How to Contribute / Develop Further

1. Improve the Dataset

The biggest gains come from data quality, not model size.

# The most impactful contribution: add validated Verilog examples
# Requirements:
# - Must compile with iverilog
# - Must end with endmodule/endinterface/endpackage
# - Must be self-contained (no undefined submodules)
# - Alpaca format: {"instruction": ..., "input": "", "output": ...}

# Validate before contributing:
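import subprocess
from pathlib import Path

path = "your_file.v"
# -tnull parses and elaborates the design without generating output
result = subprocess.run(["iverilog", "-tnull", path],
                        capture_output=True, text=True)
assert result.returncode == 0, f"Syntax error: {result.stderr}"
# The file must end with a closing keyword, not merely contain one
source = Path(path).read_text().rstrip()
assert source.endswith(("endmodule", "endinterface", "endpackage"))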
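The record-level requirements can be checked before the compile step. This is a hypothetical helper, not part of the repository: it only verifies the Alpaca keys and the closing keyword of a code example, and it complements the iverilog check rather than replacing it.

```python
REQUIRED_KEYS = {"instruction", "input", "output"}
CLOSERS = ("endmodule", "endinterface", "endpackage")


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    output = str(record.get("output", "")).strip()
    # Tolerate a trailing Markdown fence wrapped around the code.
    if output.endswith("```"):
        output = output[:-3].rstrip()
    if not output.endswith(CLOSERS):
        problems.append("output does not end with endmodule/endinterface/endpackage")
    return problems


good = {
    "instruction": "Write a 2-input AND gate.",
    "input": "",
    "output": "```verilog\nmodule and2(input a, input b, output y);\n"
              "  assign y = a & b;\nendmodule\n```",
}
bad = {"instruction": "Write a counter.", "output": "module counter(input clk);"}

print(validate_record(good))
print(validate_record(bad))
```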

2. Fine-tune Further on Your Domain

Use LoRA to specialize for your specific VLSI area:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load V1 as base for V2 fine-tuning
model = AutoModelForCausalLM.from_pretrained(
    "Rajasrl/VLSI-SLM-V1-CodeLlama-Full",
    subfolder="final_model",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Add new LoRA adapters for your domain
# (FPGA-specific, ASIC timing, formal verification, etc.)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    # ... remaining options (target_modules, lora_dropout, etc.) for your domain
)
model = get_peft_model(model, lora_config)

3. Extend to SystemVerilog / UVM

The model has basic SV knowledge but was primarily trained on Verilog-2001. Adding UVM testbench examples and SystemVerilog assertions (SVA) would significantly improve verification use cases.

4. Add Image Recognition

A compelling future direction is a multi-modal VLSI assistant that can:

Read handwritten schematic photos → generate Verilog
Analyze timing diagrams → identify violations
Recognize circuit board components → explain connections

5. Build a Retrieval-Augmented Generation (RAG) Layer

Connect the model to a vector database of VLSI standards (IEEE 1800, AMBA AXI spec, IEEE 1149.1 JTAG) for factually grounded answers.
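As a toy illustration of the retrieval step, the sketch below ranks a few passages by bag-of-words cosine similarity and places the best match in the system section of the prompt format used in Quick Start. The passages are hypothetical placeholders rather than quotations from any standard, and a real deployment would replace the word-count vectors with an embedding model and a vector database.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages standing in for indexed spec text.
DOCS = {
    "axi": "AXI uses valid and ready handshake signals on every channel.",
    "jtag": "The JTAG TAP controller is a state machine driven by TMS on TCK.",
    "sva": "SystemVerilog assertions check temporal properties of signals.",
}


def tokenize(text: str) -> Counter:
    """Lowercased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 1) -> list:
    """Return the names of the k passages most similar to the query."""
    q = tokenize(query)
    ranked = sorted(DOCS, key=lambda name: cosine(q, tokenize(DOCS[name])),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    context = "\n".join(DOCS[name] for name in retrieve(question))
    system = f"You are an expert VLSI engineer. Use this context:\n{context}"
    return (f"### System:\n{system}\n\n### Instruction:\n{question}"
            f"\n\n### Response:\n")


question = "How does the valid ready handshake work in AXI?"
print(retrieve(question))
print(build_prompt(question))
```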

6. Evaluation Contributions

Add more benchmark questions to the evaluation/ folder, especially:

Formal verification questions (SVA, PSL)
Physical design (placement, routing, DRC)
Analog/mixed-signal interfaces
RISC-V specific RTL patterns

Citation

If you use this model in your research, please cite:

@misc{vlsi-slm-v1-2026,
  title        = {VLSI-SLM V1: An Edge-Trained Small Language Model for VLSI Design},
  author       = {Rajasrl},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Rajasrl/VLSI-SLM-V1-CodeLlama-Full}},
  note         = {Fine-tuned CodeLlama-7B on NVIDIA Jetson Orin edge hardware.
                  30,354 curated VLSI examples. Zero cloud compute.}
}

The Story

This model was trained by a final-year engineering student on borrowed edge hardware, with no cloud budget, no research lab, and no team. The training ran through 6 power outages, lightning storms, and thermal shutdowns — all recovered automatically.

The goal was simple: build a VLSI assistant that works offline, costs nothing to run, and belongs to the community — not behind an API paywall.

"I built an AI to teach me VLSI."

License

MIT License — free to use, modify, and distribute. See LICENSE for details.

Model trained: March 29 – April 3, 2026
Uploaded to Hugging Face: May 2026
Hardware: NVIDIA Jetson Orin 64GB (edge device, no cloud)
