Bfloat16 conversion results in significantly slower computation for various transformer models

Hi,

I have recently started looking into mixed precision for transformer models. I noticed that various transformer models (gpt2, bert-base, …) run significantly slower with bfloat16 than with float32 on CPU. Is this an optimization issue with PyTorch's autocast?

Sample Code

import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('gpt2')
tokenizer = AutoTokenizer.from_pretrained('gpt2')
inputs = tokenizer("some example text", return_tensors="pt")  # placeholder for "some data"

# forward pass under bfloat16 autocast on CPU
with torch.autocast(dtype=torch.bfloat16, device_type="cpu"):
    outputs = model(**inputs)

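For reference, here is a minimal timing sketch of the kind of comparison I mean (the timed_forward helper is just illustrative, not an exact benchmark):

import time
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('gpt2').eval()
inputs = AutoTokenizer.from_pretrained('gpt2')("some example text", return_tensors="pt")

def timed_forward(use_bf16, iters=20):
    # average wall-clock seconds per forward pass, optionally under bfloat16 autocast
    with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16, enabled=use_bf16):
        start = time.perf_counter()
        for _ in range(iters):
            model(**inputs)
        return (time.perf_counter() - start) / iters

print(f"float32:  {timed_forward(False):.4f} s/iter")
print(f"bfloat16: {timed_forward(True):.4f} s/iter")
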
Thanks,
Eugene
