AITemplate delivers its largest speedups at smaller batch sizes on NVIDIA GPUs and at larger batch sizes on AMD GPUs

Relative speedup from Meta's AITemplate on NVIDIA's A100 vs. AMD's MI250 GPUs (times faster)

Source: Meta