Can We Improve Llama 3's Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains
Researchers at Meta AI and the University of Washington have developed ASTRO (Autoregressive Search-Taught Reasoner), a post-training framework that enhances the reasoning capabilities of Llama-3.1-70B-Instruct without altering its architecture. ASTRO uses Monte Carlo Tree Search to generate search-guided chain-of-thought trajectories that include both successful and failed reasoning paths. These trajectories are linearized and used for supervised fine-tuning, resulting in substantial benchmark improvements, such as boosting Llama 3's math accuracy from 65.8% to 81.8% on M