AITemplate's speedup is largest with smaller batches on NVIDIA GPUs and with larger batches on AMD GPUs
Relative speedup of Meta's AITemplate on NVIDIA's A100 vs. AMD's MI250 GPUs (speedup factor, ×)