Human Evaluation: GPT Win Rates (%) Based on Item Scores Per Language Pair

Figure 6 in the paper shows the human-evaluation win rates (%) of text-davinci-003 for each language pair, based on item scores.
