Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an ample token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1)