What's a good value for pad_to_multiple_of?

Has anyone tried this? I can’t find a suggested value in the docs.
I am running on a GPU (with tensor cores).

If you use mixed precision, all your tensors need dimensions that are multiples of 8 to get the full benefit of your tensor cores.
So pad_to_multiple_of=8 is a good value, unless your model has some pooling (like Funnel Transformer), in which case those 8 might be divided by 2 at each pooling step (you’d need pad_to_multiple_of=32 for this model, for instance, since there are two pooling operations).
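
To make the arithmetic concrete, here is a minimal sketch in plain Python. It does not call any library: `padded_length`, `required_multiple` and `pad_batch` are hypothetical helper names made up for illustration, and the assumption is that each pooling step halves the sequence length, as described above.

```python
def padded_length(seq_len: int, multiple: int) -> int:
    """Round seq_len up to the nearest multiple of `multiple`."""
    return -(-seq_len // multiple) * multiple  # ceiling division, then scale back up


def required_multiple(base: int = 8, num_poolings: int = 0, pool_stride: int = 2) -> int:
    """Multiple needed so the length is still a multiple of `base` after pooling.

    Assumes every pooling step divides the sequence length by `pool_stride`.
    """
    return base * pool_stride ** num_poolings


def pad_batch(batch, multiple, pad_id=0):
    """Pad every sequence to the longest length in the batch, rounded up to `multiple`."""
    target = padded_length(max(len(seq) for seq in batch), multiple)
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]


# No pooling: 8 is enough. Two pooling steps (the Funnel Transformer case): 8 * 2 * 2 = 32.
print(required_multiple(8, num_poolings=0))  # 8
print(required_multiple(8, num_poolings=2))  # 32

# A batch whose longest sequence has 13 tokens is padded to 16 with multiple=8.
batch = [[101, 7, 8, 102], list(range(13))]
print([len(seq) for seq in pad_batch(batch, multiple=8)])  # [16, 16]
```

With `multiple=32` the same batch would be padded to 32 tokens, so the length stays a multiple of 8 after being halved twice (32, 16, 8).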


Hi, I am new to this and do not clearly understand how this works and what it is used for. Can anyone please explain the purpose of this parameter?

The NVIDIA documentation has excellent explanations of these things. In summary, it depends on the numeric type you are using, but to optimize tensor cores, the dimensions must be a multiple of certain values. For fp16, a multiple of 8 is required. For int8, a multiple of 16.
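
As a rough sketch of that rule, the two values above can be kept in a small lookup table. The names `TENSOR_CORE_MULTIPLE`, `BYTES_PER_ELEMENT` and `multiple_for` are made up for this example and are not part of any library. Only the fp16 and int8 entries come from this thread; other types would need to be checked against the NVIDIA documentation.

```python
# Multiples quoted in this thread, keyed by numeric type.
TENSOR_CORE_MULTIPLE = {"fp16": 8, "int8": 16}

# Element sizes in bytes, used only to show the pattern behind the two numbers.
BYTES_PER_ELEMENT = {"fp16": 2, "int8": 1}


def multiple_for(dtype: str) -> int:
    """Return the padding multiple for a numeric type listed in the table."""
    if dtype not in TENSOR_CORE_MULTIPLE:
        raise KeyError(f"no multiple recorded for {dtype!r}; check the NVIDIA docs")
    return TENSOR_CORE_MULTIPLE[dtype]


for dtype, multiple in TENSOR_CORE_MULTIPLE.items():
    n_bytes = multiple * BYTES_PER_ELEMENT[dtype]
    print(f"{dtype}: multiple of {multiple} ({n_bytes} bytes per aligned chunk)")
```

Both entries work out to 16 bytes, which is an easy way to remember why the multiple doubles when the element size halves.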
