Probably an easy one, but not having any luck in finding the solution, so thought I’d make a post.
To use tensor cores effectively with mixed precision training, an NVIDIA guide recommends that you “pad vocabulary to be a multiple of 8”.
I’ve searched the tokenizers documentation for answers but haven’t had much luck. The closest I could find is the pp_tokenizer.vocab_size attribute, which returns the current vocab size, but I can’t assign it a new value.
You can provide the argument pad_to_multiple_of to a tokenizer in Transformers (supported for both fast and slow tokenizers):
pad_to_multiple_of: (optional) Integer if set will pad the sequence to a multiple of the provided value. This is especially useful to enable the use of Tensor Core on NVIDIA hardware with compute capability >= 7.5 (Volta).
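To make the effect of that argument concrete, here is a minimal pure-Python sketch of what padding a sequence to a multiple of 8 looks like; `pad_ids` and the `pad_id=0` default are hypothetical names for illustration, not part of the Transformers API:

```python
def pad_ids(ids, multiple=8, pad_id=0):
    # Extend the sequence with pad tokens until its length is a multiple of `multiple`.
    remainder = len(ids) % multiple
    if remainder:
        ids = ids + [pad_id] * (multiple - remainder)
    return ids

print(pad_ids([101, 7592, 2088, 102]))  # 4 ids padded up to length 8
```

In Transformers itself you would pass the argument directly to the tokenizer call, e.g. something like `tokenizer(texts, padding=True, pad_to_multiple_of=8)`.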
Thanks for the reply. That argument works well for padding a sequence input, but I was looking for something that could resize the matrices that depend on the transformer’s vocab_size.
e.g. the embedding matrix of a transformer usually has dimensions something like (vocab_size, 1024), where the vocab_size might be something like 52153. A matrix of this size isn’t efficient to pass to the tensor cores, so I was looking for a way to pad it so that the vocab dimension is a multiple of 8 (e.g. 52160).
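A rough sketch of that resize, assuming the embedding weights are a plain NumPy matrix (the variable names are made up for illustration); the idea is to round the vocab dimension up to the next multiple of 8 and append zero rows:

```python
import numpy as np

vocab_size, hidden = 52153, 1024
emb = np.random.randn(vocab_size, hidden).astype(np.float32)

# Round vocab_size up to the next multiple of 8: 52153 -> 52160.
padded_vocab = (vocab_size + 7) // 8 * 8
pad_rows = padded_vocab - vocab_size

# Append zero rows; the original trained rows are kept unchanged.
emb_padded = np.concatenate(
    [emb, np.zeros((pad_rows, hidden), dtype=emb.dtype)], axis=0
)
print(emb_padded.shape)  # (52160, 1024)
```

In Transformers, `model.resize_token_embeddings(new_size)` resizes both the input embedding matrix and the tied output layer for you, so passing it the rounded-up size should achieve the same thing without manual surgery on the weights.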