What's a good value for pad_to_multiple_of?

ttj · October 12, 2020, 11:04am

Has anyone tried this? I can’t find a suggested value in the docs.
I am running on a GPU with tensor cores.

sgugger · October 12, 2020, 12:19pm

If you use mixed precision, all your tensors need dimensions that are multiples of 8 to get the full benefit of your tensor cores.
So pad_to_multiple_of=8 is a good value, unless your model has some pooling (like Funnel Transformer), in which case those 8 might be divided by 2 at each pooling step (you’d need pad_to_multiple_of=32 for this model, for instance, since there are two pooling operations).
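
For anyone wondering what the parameter does to a batch, here is a minimal sketch in plain Python. It is not the library's implementation: `pad_batch` and `pad_id` are hypothetical names used only for illustration, and the token ids are made up. It assumes the longest sequence length is rounded up to the next multiple and every sequence is padded to that length.

```python
def pad_batch(batch, pad_id=0, multiple=8):
    """Pad every sequence to the longest length, rounded up to `multiple`."""
    longest = max(len(seq) for seq in batch)
    # Ceiling division, then scale back up to get the next multiple.
    target = -(-longest // multiple) * multiple
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]


# Two made-up sequences of length 3 and 10.
batch = [[5, 6, 7], [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]]
padded = pad_batch(batch, multiple=8)
print([len(seq) for seq in padded])  # the longest length, 10, rounds up to 16
```

In Transformers itself you would not write this by hand; you pass the value to the tokenizer call, e.g. `tokenizer(texts, padding=True, pad_to_multiple_of=8)`.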

yaxirhuxxain · May 21, 2021, 10:08am

Hi, I am new to this and do not clearly understand how this works or what it is used for. Can anyone please explain the purpose of this parameter?

lkurlandski · August 29, 2023, 5:43pm

The NVIDIA documentation has excellent explanations of these things. In summary, it depends on the numeric type you are using: to make full use of tensor cores, the dimensions must be a multiple of a certain value. For fp16, a multiple of 8 is required. For int8, a multiple of 16.
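
Putting the numbers from this thread together, a rough way to pick a value could look like the sketch below. The helper `suggested_multiple` and the `BASE_MULTIPLE` table are hypothetical, not part of any library. The table only contains the fp16 and int8 values quoted here, and the doubling per pooling step is an assumption based on the Funnel Transformer remark above (each pooling halves the sequence length).

```python
# Base multiples quoted in this thread; other dtypes are deliberately left out.
BASE_MULTIPLE = {"fp16": 8, "int8": 16}


def suggested_multiple(dtype, num_poolings=0):
    """Return a candidate pad_to_multiple_of value for a dtype.

    Each pooling step is assumed to halve the sequence length, so the
    multiple is doubled once per pooling step to stay aligned afterwards.
    """
    return BASE_MULTIPLE[dtype] * 2 ** num_poolings


print(suggested_multiple("fp16"))                  # plain fp16 model
print(suggested_multiple("fp16", num_poolings=2))  # two pooling steps, as for Funnel
```

For a dtype not mentioned in this thread, check the NVIDIA documentation rather than guessing.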
