Length Generalisation Error in Chain of Thought

When an LLM is trained to reason in 10 steps, its output error grows the further the test-time reasoning length is from 10.
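
One way to check this claim on evaluation results is to group per-example errors by how far the test reasoning length is from the training length. The sketch below is only an illustration: the function name `error_by_length_gap`, the `(test_steps, error)` record format, and the choice of error metric are all assumptions, not something specified in the text.

```python
from collections import defaultdict
from statistics import mean

TRAIN_STEPS = 10  # training reasoning length stated in the text


def error_by_length_gap(records, train_steps=TRAIN_STEPS):
    """Return the mean error for each gap |test_steps - train_steps|.

    `records` is an iterable of (test_steps, error) pairs. Both the record
    format and the error metric are hypothetical, chosen for illustration.
    """
    buckets = defaultdict(list)
    for test_steps, error in records:
        buckets[abs(test_steps - train_steps)].append(error)
    # Sort by gap so any trend is easy to read off.
    return {gap: mean(errors) for gap, errors in sorted(buckets.items())}
```

If the statement holds, the mean error in the returned mapping should tend to increase with the gap. Using the absolute gap pools shorter and longer test sequences together; keeping the sign would show whether the two directions behave differently.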