Choose Metrics That Matter for AI Eval