Microsoft Releases Phi-4-mini-Flash-Reasoning: Efficient Long-Context Reasoning with Compact Architecture
Microsoft's Phi-4-mini-Flash-Reasoning is a lightweight, open-source language model optimized for long-context reasoning tasks such as multi-hop question answering and math problem solving. With 3.8 billion parameters, it is a distilled version of Phi-4-mini built on the SambaY decoder-hybrid-decoder architecture, which combines State Space Models (SSMs) with attention layers and is reported to deliver up to ten times faster inference on long-generation tasks than its predecessor. The architecture uses the Gated Memory Unit (GMU) to share memory across layers, which reduces latency and computational overhead.
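To make the idea of cross-layer memory sharing concrete, here is a minimal sketch of a gated memory unit in plain Python. It assumes an element-wise gating form, in which a later layer modulates a memory vector cached by an earlier layer instead of recomputing attention. The function names, the SiLU gate, and the weight shapes are illustrative assumptions, not Microsoft's implementation.

```python
import math
import random


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def gated_memory_unit(x, memory, w_gate, w_out):
    """Hypothetical GMU sketch (assumed form, not the official code).

    x       -- hidden vector of the current layer, length d
    memory  -- vector cached from an earlier layer, length d_m
    w_gate  -- d_m x d matrix projecting x into gate space
    w_out   -- d x d_m matrix projecting the gated memory back
    """
    # The current layer's input decides how much of each memory channel to keep.
    gate = [silu(g) for g in matvec(w_gate, x)]
    # Element-wise gating reuses the cached memory; no attention is recomputed.
    gated = [m * g for m, g in zip(memory, gate)]
    return matvec(w_out, gated)


if __name__ == "__main__":
    random.seed(0)
    d, d_m = 4, 6  # toy sizes chosen for illustration only
    x = [random.uniform(-1, 1) for _ in range(d)]
    memory = [random.uniform(-1, 1) for _ in range(d_m)]
    w_gate = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d_m)]
    w_out = [[random.uniform(-1, 1) for _ in range(d_m)] for _ in range(d)]
    print(gated_memory_unit(x, memory, w_gate, w_out))
```

In this sketch the cost per token is two small matrix-vector products and one element-wise multiply, independent of sequence length, which is the kind of saving the paragraph above attributes to sharing memory across layers.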