Text Entropy vs. Big Model's Loss

x-axis: the big model's cross-entropy (CE) loss on texts generated by the Base and Stateful models.
y-axis: ratio of uncompressed to compressed text size (LZMA compression).
This graph was obtained by varying the softmax temperature of the generative models.
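The quantities behind the two axes and the temperature sweep can be sketched as follows. This is a minimal illustration under stated assumptions, not the setup used for the graph. The names `lzma_compression_ratio`, `softmax_with_temperature` and `mean_cross_entropy` are hypothetical. The sketch assumes that the big model's per-token probabilities for each generated text are already available, that CE is averaged per token in nats, and that Python's standard `lzma` module with its default preset stands in for the unspecified LZMA settings. A higher compression ratio means the text is more redundant, i.e. it has lower entropy.

```python
import lzma
import math


def lzma_compression_ratio(text: str) -> float:
    """y-axis (assumed definition): UTF-8 byte size divided by LZMA-compressed size.

    Uses lzma.compress with its default preset; the original settings are not stated.
    Note: the .xz container adds fixed overhead, so very short texts can give a ratio < 1.
    """
    raw = text.encode("utf-8")
    compressed = lzma.compress(raw)
    return len(raw) / len(compressed)


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Temperature-scaled softmax: T < 1 sharpens the distribution, T > 1 flattens it."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_cross_entropy(token_probs: list[float]) -> float:
    """x-axis (assumed definition): mean negative log-likelihood, in nats per token,
    that the scoring (big) model assigns to the tokens of one generated text."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)
```

For example, `lzma_compression_ratio("abc" * 1000)` is far larger than the ratio for 3000 random letters. Likewise, lowering the temperature raises the probability of the most likely token, which tends to produce more repetitive, more compressible text.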