Starling‑LM‑7B‑Alpha
Starling‑LM‑7B‑Alpha: RLAIF-Tuned Chat Excellence at 7B
What is Starling‑LM‑7B‑Alpha?
Starling‑LM‑7B‑Alpha, also called Starling‑7B, is a 7‑billion‑parameter openly available chat model developed by researchers at UC Berkeley. It is fine‑tuned from OpenChat-3.5 using Reinforcement Learning from AI Feedback (RLAIF), with preference labels drawn from Nectar, a ranking dataset labeled by GPT‑4. The tuning targets dialogue alignment and helpfulness: the model scores 8.09 on MT‑Bench, which at release placed it behind only GPT‑4 and GPT‑4 Turbo among the models its authors compared (starling.cs.berkeley.edu).
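To make the idea of a ranking dataset concrete, the sketch below shows one common way ranked responses can feed a preference-based reward signal: a best-first ranking is expanded into (preferred, rejected) pairs, and each pair is scored with a Bradley-Terry style loss. This is an illustrative assumption, not the Starling authors' training code; the function names `ranking_to_pairs` and `pairwise_loss` and the example responses are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_best_first):
    """Expand a best-first ranking into (preferred, rejected) pairs.

    combinations() preserves input order, so the first element of each
    pair always ranks higher than the second.
    """
    return list(combinations(ranked_best_first, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Bradley-Terry style negative log-likelihood for a single pair.

    The loss is small when the reward model scores the preferred
    response well above the rejected one.
    """
    margin = reward_preferred - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


if __name__ == "__main__":
    # Hypothetical ranking of three responses to one prompt, best first.
    ranking = ["response_a", "response_b", "response_c"]
    for preferred, rejected in ranking_to_pairs(ranking):
        print(f"{preferred} > {rejected}")

    # A reward model that agrees with the ranking incurs a lower loss
    # than one that disagrees with it.
    print(pairwise_loss(2.0, 0.0) < pairwise_loss(0.0, 2.0))
```

A ranking of k responses yields k(k-1)/2 pairs, so each ranked prompt supplies many comparisons for training a reward model.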
Key Features of Starling‑LM‑7B‑Alpha
Use Cases of Starling‑LM‑7B‑Alpha
Starling‑LM‑7B‑Alpha vs. Other Open Chat Models
| Feature | Starling-LM-7B-Alpha | OpenChat-3.5-0106 | Tulu-2-DPO-13B |
|---|---|---|---|
| Base Model | OpenChat-3.5 (Mistral-7B) | Mistral-7B | LLaMA-2 13B |
| Tuning Method | RLAIF (GPT-4 Ranked Nectar) | SFT + C-RLHF | SFT + DPO Preference |
| MT-Bench Score | 8.09 | ~7.81 | ~7.00 |
| AlpacaEval Score | ~91.99% | ~88.5% | ~89.5% |
| License | Open Non-Commercial | Apache 2.0 | AI2 ImpACT Low-Risk |
| Quantization Options | GGUF, GPTQ | Quantized builds available | GGUF, GPTQ |
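The quantization row matters mainly for memory. As a rough back-of-the-envelope sketch, weight storage scales with bits per weight. The figures below assume a nominal 7 billion parameters and count weights only; real GGUF or GPTQ files differ somewhat because they store scaling metadata and may keep some layers at higher precision, and inference also needs memory for activations and the KV cache. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: a nominal 7e9 parameters, not an exact count for any model.
NOMINAL_PARAMS = 7e9

if __name__ == "__main__":
    for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_memory_gb(NOMINAL_PARAMS, bits):.1f} GB")
```

Under these assumptions, 4-bit weights need roughly a quarter of the FP16 footprint, which is why quantized builds are the usual choice for consumer GPUs and laptops.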
Future of Starling‑LM‑7B‑Alpha
Starling‑LM‑7B‑Alpha shows that models tuned with reinforcement learning on preference data can perform at near‑state-of‑the‑art levels, even at just 7B parameters, while remaining accessible and open to developers, researchers, and AI creators.