Here is how the same 20k input + 5k output example compares across several popular current models:
(Interactive visualization: the 20k input + 5k output example compared across models)