When an LLM is trained to reason in 10 steps, the further the test-time reasoning sequence length is from 10, the larger the output error.