In training data, document size (roughly, the number of characters in a given work) is a key indicator of how a model will behave. For example, models trained on Books3 take seven times as many instructions from Shakespeare as they do from Nobel Prize winner Alice Munro.