Roughly half of the models’ answers were inaccurate

Team evaluations of AI responses

Ratings were determined by majority vote.