When I load the BioGPT tokenizer, why does padding_side default to "right", even though the BioGPT architecture is decoder-only? When I pass batched inputs to generate(), transformers logs a warning saying the generations may be incorrect if right-padding is used.
This is what the HuggingFace docs say: "BioGPT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than the left."
This is the code I ran, followed by the warning message and the output it printed:
example_inputs = ["the worst heart problems are", "epiglottis cancer with a "]
example_inputs_tokenized = tokenizer(example_inputs, padding=True, return_tensors="pt").to(device)
example_outputs = base_model.generate(**example_inputs_tokenized, max_length=100)
tokenizer.batch_decode(example_outputs, skip_special_tokens=True)
Output:
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left'
when initializing the tokenizer.
['the worst heart problems are.',
'epiglottis cancer with a review of the literature.']
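
To make the warning concrete, here is a minimal sketch in plain Python of what the two padding sides do to a batch. It does not use transformers at all: the `pad_batch` helper, the pad id `0`, and the token ids are hypothetical and only stand in for what the tokenizer produces. As far as I understand it, the point is that generation continues from the last column of the batch, so with right padding the shorter prompt ends in a pad token instead of its real last token.

```python
PAD_ID = 0  # hypothetical pad token id, for illustration only


def pad_batch(sequences, padding_side="right", pad_id=PAD_ID):
    """Pad lists of token ids to equal length and build an attention mask."""
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        pads = [pad_id] * n_pad
        if padding_side == "left":
            # Pads go first, so the real tokens end at the last column.
            input_ids.append(pads + list(seq))
            attention_mask.append([0] * n_pad + [1] * len(seq))
        else:
            # Pads go last, so a short prompt ends in pad tokens.
            input_ids.append(list(seq) + pads)
            attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Two made-up prompts of different lengths (token ids are arbitrary).
batch = [[11, 12, 13, 14, 15], [21, 22, 23]]

right_ids, right_mask = pad_batch(batch, padding_side="right")
left_ids, left_mask = pad_batch(batch, padding_side="left")

# The last column is where a decoder-only model continues generating from.
print([row[-1] for row in right_ids])  # [15, 0]  -> second prompt ends in a pad
print([row[-1] for row in left_ids])   # [15, 23] -> both prompts end in a real token
```

With the real tokenizer, the warning itself suggests the corresponding change: set padding_side='left' when initializing the tokenizer. Whether that interacts badly with BioGPT's absolute position embeddings, as the docs quote above seems to imply, is exactly what I am unsure about.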