What Offline CoT Distillation Taught Us About Small Vision-Language Models

Reasoning Is Expensive, But It Doesn’t Have to Be

Reasoning is one of the most expensive capabilities to train in Vision-Language Models. Most recent approaches rely on very large models, long context windows, online teacher–student setups, or reinforcement learning. All of these assume abundant compute. In the SiQ-VL project, we had none of that. What we did have was a question: Can a small VLM learn to reason if we only change how we train it? ...

December 11, 2025 · 4 min · Duo An