Instructions to use psp-dada/Gemma2-9B-IT-Uni-DPO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use psp-dada/Gemma2-9B-IT-Uni-DPO with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="psp-dada/Gemma2-9B-IT-Uni-DPO")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("psp-dada/Gemma2-9B-IT-Uni-DPO") model = AutoModelForCausalLM.from_pretrained("psp-dada/Gemma2-9B-IT-Uni-DPO") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use psp-dada/Gemma2-9B-IT-Uni-DPO with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "psp-dada/Gemma2-9B-IT-Uni-DPO" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "psp-dada/Gemma2-9B-IT-Uni-DPO", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/psp-dada/Gemma2-9B-IT-Uni-DPO
- SGLang
How to use psp-dada/Gemma2-9B-IT-Uni-DPO with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "psp-dada/Gemma2-9B-IT-Uni-DPO" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "psp-dada/Gemma2-9B-IT-Uni-DPO", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "psp-dada/Gemma2-9B-IT-Uni-DPO" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "psp-dada/Gemma2-9B-IT-Uni-DPO", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use psp-dada/Gemma2-9B-IT-Uni-DPO with Docker Model Runner:
docker model run hf.co/psp-dada/Gemma2-9B-IT-Uni-DPO
Configuration Parsing Warning:Config file tokenizer_config.json cannot be fetched (too big)
Model Card for psp-dada/Gemma2-9B-IT-Uni-DPO | ICLR 2026 | Uni-DPO:
A Unified Paradigm for Dynamic Preference Optimization of LLMs
🎊 News
- [2026.02.16] 📖 Code, data, and models are released!
- [2026.01.26] 🎉 Our Uni-DPO is accepted by ICLR 2026!
🚀 Overview
Uni-DPO introduces a unified dynamic preference optimization paradigm for training large language models (LLMs) from preference data. Unlike prior DPO-based methods that treat all preference pairs equally, Uni-DPO jointly considers intrinsic data quality and model learning dynamics, enabling more effective and robust preference learning.
Key advantages:
- Quality-aware: Adaptively prioritizes high-quality preference pairs while down-weighting ambiguous ones.
- Dynamics-aware: Shifts training focus toward under-fitted samples to mitigate overfitting.
- Unified & lightweight: Seamlessly integrates dual-perspective weighting and calibrated NLL into standard DPO with minimal overhead.
🔑 Key Features
- Dual-perspective dynamic weighting for preference optimization. Uni-DPO jointly models what data is worth learning (intrinsic quality) and what the model still struggles with (learning dynamics). By combining a quality-aware weight and a performance-aware weight, Uni-DPO dynamically reallocates training focus throughout optimization.
- Quality-aware weighting filters ambiguous preference pairs. Preference data varies widely in reliability. Uni-DPO leverages score margins between preferred and rejected responses to assign higher weights to clear, high-quality pairs while suppressing noisy or ambiguous ones.
- Performance-aware weighting mitigates overfitting during training. High-quality samples are not always the most informative once the model has already mastered them. Uni-DPO introduces a stabilized focal-style performance weight that down-weights well-fitted pairs and emphasizes hard-but-informative examples, effectively reducing overfitting.
- Decoupling data quality from learning difficulty. Empirical analysis reveals that data quality (score margin) and learning difficulty (reward margin) are weakly correlated. Uni-DPO explicitly models this mismatch, ensuring that optimization is guided by both dimensions rather than relying on either alone.
- State-of-the-art performance across text, math, and multimodal benchmarks. Uni-DPO consistently outperforms DPO and SimPO across diverse settings.
How to use
For the details of this model, please refer to the documentation of the GitHub repo.
📝 Citation
If you find our model/code/data/paper helpful, please consider citing our papers 📝 and starring us ⭐️!
@inproceedings{peng2026unidpo,
title = {Uni-{DPO}: A Unified Paradigm for Dynamic Preference Optimization of {LLM}s},
author = {Shangpin Peng and Weinong Wang and Zhuotao Tian and Senqiao Yang and Xing W and Haotian Xu and Chengquan Zhang and Takashi Isobe and Baotian Hu and Min Zhang},
booktitle = {The Fourteenth International Conference on Learning Representations},
year = {2026},
url = {https://openreview.net/forum?id=G7DBGlgjjp}
}
📧 Contact us
If you have any questions, comments, or suggestions, please do not hesitate to submit an issue or PR to help advance research in this area.
- Downloads last month
- 8