AI Model Accuracy by Provider

Accuracy across all models on all grammatical features

This chart compares the performance of seven large language models on Irish grammar tests, with overall accuracy ranging from 51.7% to 73.1%. Anthropic's Claude 3.5 achieved the highest accuracy at 73.1%, followed by OpenAI's GPT-4.1 at 71.8% and GPT-4o at 70.4%. Google's models varied more widely: Gemini 2.5 Pro scored 67.0%, Gemini 2.0 Flash 64.3%, and Gemini 2.5 Flash trailed at 51.7%. Notably, newer model versions did not always outperform their predecessors. Claude 3.5 scored 6.9 percentage points higher than Claude 3.7 (66.2%), and Gemini 2.0 Flash outperformed Gemini 2.5 Flash by 12.6 percentage points. These results suggest that each of the three providers tested (Anthropic, OpenAI, and Google) offers at least one model with meaningful competency in Irish grammar, even though Irish is a low-resource language.