Bfloat16 conversion results in significantly slower computation for various transformer models

Hi,

I have recently started looking into mixed precision for transformer models. I noticed that various transformer models (gpt2, bert-base, …) run significantly slower with bfloat16 than with float32 on CPU. Is this an optimization issue with PyTorch's autocast?

Sample Code

import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('gpt2')
tokenizer = AutoTokenizer.from_pretrained('gpt2')
inputs = tokenizer("some example text", return_tensors="pt")  # placeholder for "some data"

# forward pass under bfloat16 autocast on CPU
with torch.autocast(dtype=torch.bfloat16, device_type="cpu"):
    outputs = model(**inputs)

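For reference, here is a minimal timing sketch of the kind of comparison I mean (the timed_forward helper is just illustrative, not an exact benchmark):

import time
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('gpt2').eval()
inputs = AutoTokenizer.from_pretrained('gpt2')("some example text", return_tensors="pt")

def timed_forward(use_bf16, iters=20):
    # average wall-clock seconds per forward pass, optionally under bfloat16 autocast
    with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16, enabled=use_bf16):
        start = time.perf_counter()
        for _ in range(iters):
            model(**inputs)
        return (time.perf_counter() - start) / iters

print(f"float32:  {timed_forward(False):.4f} s/iter")
print(f"bfloat16: {timed_forward(True):.4f} s/iter")
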
Thanks,
Eugene
