2026  3

March  1

Why Variable Sequence Length Breaks DDP Throughput

March 12, 2026 · 8 min · Duo An

February  1

Learning PyTorch DDP Performance Tuning on a One-GPU Machine

February 18, 2026 · 15 min · Duo An

January  1

Profiling a PyTorch Training Job End to End

January 16, 2026 · 12 min · Duo An

2025  5

December  4

Building a Truly Scalable Multimodal Data Pipeline: A Streaming-First View

December 22, 2025 · 4 min · Duo An

A Minimal Recipe for Training VLMs Under Compute Constraints

December 15, 2025 · 3 min · Duo An

What Offline CoT Distillation Taught Us About Small Vision-Language Models

December 11, 2025 · 4 min · Duo An

SiQ-VL: Training a Reasoning-Capable VLM When You’re GPU-Poor

December 5, 2025 · 3 min · Duo An

February  1

From Scaling Laws to Cluster Size: A Practical Guide to Planning Large-Scale Model Training

February 2, 2025 · 9 min · Duo An