What's a good value for pad_to_multiple_of?

Has anyone tried this? I can’t find a suggested value in the docs.
I am running on a GPU with tensor cores.

If you use mixed precision, you need all your tensors to have dimensions that are multiples of 8 to maximize the benefits of your tensor cores.
So pad_to_multiple_of=8 is a good value, unless your model has some pooling (like Funnel Transformer), in which case the sequence length gets divided by 2 at each pooling operation and may stop being a multiple of 8 (you’d need pad_to_multiple_of=32 for this model, for instance, since there are two pooling operations).
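
To make the arithmetic concrete, here is a minimal sketch in plain Python of what padding to a multiple does to a batch. It is not the library's implementation: the helper names (padded_length, pad_batch, required_multiple), the pad id of 0, and the token ids are all made up for illustration, and the pooling helper assumes each pooling step halves the sequence length.

```python
def padded_length(length: int, multiple: int) -> int:
    """Round `length` up to the nearest multiple of `multiple`."""
    return ((length + multiple - 1) // multiple) * multiple


def pad_batch(batch, multiple, pad_id=0):
    """Pad every sequence to the longest length, rounded up to `multiple`.

    `pad_id=0` is an assumption; real tokenizers define their own pad token id.
    """
    longest = max(len(seq) for seq in batch)
    target = padded_length(longest, multiple)
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]


def required_multiple(base: int, pooling_steps: int, stride: int = 2) -> int:
    """Multiple needed so the length is still a multiple of `base`
    after `pooling_steps` poolings that each divide it by `stride`."""
    return base * stride ** pooling_steps


# Two made-up sequences of token ids, lengths 3 and 10.
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 2023, 2003, 1037, 3231, 5, 102]]
padded = pad_batch(batch, multiple=8)
print([len(seq) for seq in padded])   # [16, 16]: 10 rounded up to the next multiple of 8

print(required_multiple(8, pooling_steps=2))  # 32: 32 -> 16 -> 8 after two poolings
```

In Transformers itself you would not write this by hand; as far as I know you pass pad_to_multiple_of=8 together with padding when calling the tokenizer, or to DataCollatorWithPadding.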


Hi, I am new to this and do not clearly understand how this works and what it is used for. Can anyone please explain the purpose of this parameter?