Control-R: Towards controllable test-time scaling
The paper introduces Reasoning Control Fields (RCF), a test-time method that injects structured control signals to guide and adjust the reasoning effort of Large Reasoning Models, addressing underthinking and overthinking in long chain-of-thought reasoning. It also presents Control-R-4K, a dataset with annotated reasoning processes, and proposes a Conditional Distillation Finetuning (CDF) approach. The authors report state-of-the-art results on benchmarks such as AIME2024 and MATH500, with controllable reasoning capabilities.
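To make the idea of injecting structured control signals at test time more concrete, here is a minimal Python sketch. It is not the paper's implementation: the `ControlField` class, the `reasoning_effort` field name, the 0-9 scale, and the `<control>` wrapper are all hypothetical, chosen only to show how a structured block might be placed ahead of a question before it reaches a model.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ControlField:
    """One control signal. The name and the 0-9 scale are assumptions for illustration."""
    name: str
    level: int

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0-9 range.
        if not 0 <= self.level <= 9:
            raise ValueError(f"level must be in 0..9, got {self.level}")


def render_control_block(fields: Sequence[ControlField]) -> str:
    """Serialize the fields into a structured text block (hypothetical format)."""
    lines = [f"{f.name}: {f.level}" for f in fields]
    return "<control>\n" + "\n".join(lines) + "\n</control>"


def build_prompt(question: str, fields: Sequence[ControlField]) -> str:
    """Place the control block ahead of the question; no model is called here."""
    return render_control_block(fields) + "\n\n" + question


if __name__ == "__main__":
    low_effort = [ControlField("reasoning_effort", 2)]
    print(build_prompt("What is 2 + 2?", low_effort))
```

Under this sketch, raising or lowering `level` is the only knob; a model would have to be trained to respect such fields, which is the role the summary attributes to CDF.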