Tulu‑2‑DPO‑13B
Tulu‑2‑DPO‑13BWhat is Tulu‑2‑DPO‑13B?
Tulu‑2‑DPO‑13B is a 13‑billion‑parameter LLaMA‑2 model, developed by the Allen Institute, fine‑tuned through Direct Preference Optimization (DPO) for robust, preference-aligned instruction-following. It builds upon a supervised fine-tuned (SFT) model trained on a wide mix of public and synthetic instruction datasets including Alpaca, Baize, FLAN, GPTeacher, and Code‑Alpaca, then enhanced via DPO using human preference data to create a model with improved reasoning, multi-turn dialogue, and instruction coherence (Hugging Face).
This model is part of the Tulu‑2 family and is released under the AI2 ImpACT Low-Risk license, making it one of the most openly accessible yet high-performance chat models in its class.
Key Features of Tulu‑2‑DPO‑13B
Use Cases of Tulu‑2‑DPO‑13B
Tulu‑2‑DPO‑13Bv/sComparable 13B Chat Models
| Feature | Tulu-2-SFT-13B | Tulu-2-DPO-13B | LLaMA-2 Chat-13B |
|---|---|---|---|
| Base Model | LLaMA-2-13B | LLaMA-2-13B | LLaMA-2-13B |
| Tuning Type | Supervised FT | DPO + SFT | RLHF (Meta) |
| MT-Bench Score | 6.70 | 7.00 | ~6.5 |
| AlpacaEval Win Rate | 78.9% | 89.5% | ~70-75% |
| Quantization Support | GGUF, GPTQ | GGUF, GPTQ | GGUF, GPTQ |
| License | AI2 ImpACT | AI2 ImpACT | Non-commercial |
Future of the Tulu‑2‑DPO‑13B
Tulu‑2‑DPO‑13B delivers one of the most capable instruction-tuned experiences among 13B models. It’s ideal for users seeking full transparency in datasets and training methodology, reliable offline usage through GGUF or GPTQ formats, and preference-aligned behavior, all without the complexity of RLHF. With a clear license for research and internal deployment, Tulu‑2‑DPO‑13B is a trustworthy choice for aligned, open AI development.