BioGPT error with right padding

When I load the BioGPT tokenizer, why is padding_side set to right by default, even though the BioGPT architecture is decoder-only? When I pass batched inputs, generate() prints a warning saying generations may be faulty if right padding is used.
This is what the HuggingFace docs say: BioGPT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than the left.
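To show what I mean, this is roughly how I load the tokenizer and what the default looks like (a minimal sketch, assuming the microsoft/biogpt checkpoint):

from transformers import BioGptTokenizer

# Load the BioGPT tokenizer from the hub (assuming the microsoft/biogpt checkpoint)
tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")

# Out of the box the padding side is 'right', even though BioGPT is decoder-only
print(tokenizer.padding_side)  # -> 'right'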

This is my code and the warning message that gets printed:
example_inputs = ['the worst heart problems are', 'epiglottis cancer with a ']
example_inputs_tokenized = tokenizer(example_inputs, padding=True, return_tensors="pt").to(device)
example_outputs = base_model.generate(**example_inputs_tokenized, max_length=100)
tokenizer.batch_decode(example_outputs, skip_special_tokens=True)

output →
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
['the worst heart problems are.',
 'epiglottis cancer with a review of the literature.']
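If I understand the warning correctly, the change it suggests would look something like this (a sketch, assuming the microsoft/biogpt checkpoint; I haven't confirmed this is the right setup for BioGPT given what the docs say about absolute position embeddings):

from transformers import BioGptTokenizer

# Re-create the tokenizer with left padding, as the warning suggests
tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt", padding_side="left")

# Batched prompts are now padded on the left, so generate() continues from the real prompt tokens
example_inputs_tokenized = tokenizer(example_inputs, padding=True, return_tensors="pt").to(device)
example_outputs = base_model.generate(**example_inputs_tokenized, max_length=100)
tokenizer.batch_decode(example_outputs, skip_special_tokens=True)

Is this what the warning means, and does it contradict the docs' advice about right padding for absolute position embeddings?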
