Accuracy across all models on all grammatical features
This chart compares the accuracy of seven large language models on Irish grammar tests, with scores ranging from 51.7% to 73.1%. Anthropic's Claude 3.5 achieved the highest accuracy at 73.1%, followed by OpenAI's GPT-4.1 at 71.8% and GPT-4o at 70.4%. Google's models were more spread out: Gemini 2.5 Pro scored 67.0%, Gemini 2.0 Flash 64.3%, and Gemini 2.5 Flash 51.7%, the lowest of the seven. Newer model versions did not always outperform their predecessors: Claude 3.5 scored higher than Claude 3.7 (66.2%), and Gemini 2.0 Flash outperformed Gemini 2.5 Flash. These results suggest that each of the three providers tested has at least one model with meaningful competency in Irish grammar, even though Irish is a low-resource language.
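
The comparisons in this summary can be checked with a few lines of arithmetic. The following is a minimal Python sketch that uses only the seven accuracy figures quoted above. The `accuracy` dictionary and the `gap` helper are illustrative names, not part of the original evaluation.

```python
# Accuracy figures (percent) exactly as quoted in the summary above.
accuracy = {
    "Claude 3.5": 73.1,
    "GPT-4.1": 71.8,
    "GPT-4o": 70.4,
    "Gemini 2.5 Pro": 67.0,
    "Claude 3.7": 66.2,
    "Gemini 2.0 Flash": 64.3,
    "Gemini 2.5 Flash": 51.7,
}

# Rank models from highest to lowest accuracy.
ranked = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)


def gap(older: str, newer: str) -> float:
    """Percentage-point lead of the older model over the newer one."""
    return round(accuracy[older] - accuracy[newer], 1)


# Spread between the best and worst model, in percentage points.
spread = round(ranked[0][1] - ranked[-1][1], 1)
claude_gap = gap("Claude 3.5", "Claude 3.7")
gemini_flash_gap = gap("Gemini 2.0 Flash", "Gemini 2.5 Flash")

for name, score in ranked:
    print(f"{name:<18}{score:5.1f}%")
print(f"Best-to-worst spread: {spread} points")
print(f"Claude 3.5 over Claude 3.7: {claude_gap} points")
print(f"Gemini 2.0 Flash over Gemini 2.5 Flash: {gemini_flash_gap} points")
```

On these figures, the older Claude model leads its successor by 6.9 percentage points, the older Gemini Flash model leads by 12.6 points, and the best and worst models are 21.4 points apart.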