BioGPT error with right padding

sidpillai11 · August 7, 2023, 11:14pm

When I load the BioGPT tokenizer, why is padding_side set to right by default, even though BioGPT is a decoder-only architecture? When I pass batched inputs to generate(), Transformers logs a warning saying the generations may be incorrect if right padding is used.
This is what the Hugging Face docs say: "BioGPT is a model with absolute position embeddings so it’s usually advised to pad the inputs on the right rather than the left."
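The docs' advice comes from how absolute positions are assigned to padded rows. The sketch below assumes a model that derives position ids from the attention mask (cumulative sum of the mask, minus one, as OPT-style learned position embeddings in Transformers do). Under that assumption, left padding still gives the real tokens positions 0, 1, 2, ..., whereas numbering every slot shifts them. Both helper names are hypothetical, the masks are toy values, and whether a given BioGPT version computes positions this way should be checked against its source.

```python
from itertools import accumulate

def naive_position_ids(attention_mask):
    # Number every slot 0..n-1, ignoring padding entirely.
    return [list(range(len(row))) for row in attention_mask]

def position_ids_from_mask(attention_mask):
    # Hypothetical helper: positions = cumsum(mask) * mask - 1, row by row.
    # Real tokens get 0, 1, 2, ...; pad slots get -1.
    return [[c * m - 1 for c, m in zip(accumulate(row), row)]
            for row in attention_mask]

left_mask = [[1, 1, 1, 1, 1],   # longer prompt (toy: 5 tokens)
             [0, 1, 1, 1, 1]]   # shorter prompt, left-padded by one slot
print(naive_position_ids(left_mask)[1])      # [0, 1, 2, 3, 4]: first real token sits at 1
print(position_ids_from_mask(left_mask)[1])  # [-1, 0, 1, 2, 3]: first real token sits at 0
```

With naive numbering, left padding changes which absolute position embedding each real token receives, which is the concern behind "pad on the right"; deriving positions from the mask removes that shift.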

This is the code I ran, followed by the output with the warning:
example_inputs = ['the worst heart problems are', 'epiglottis cancer with a ']
example_inputs_tokenized = tokenizer(example_inputs, padding=True, return_tensors="pt").to(device)
example_outputs = base_model.generate(**example_inputs_tokenized, max_length=100)
tokenizer.batch_decode(example_outputs, skip_special_tokens=True)

Output:
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
['the worst heart problems are.',
'epiglottis cancer with a review of the literature.']
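The warning is about where generate() writes new tokens: it appends them at the end of every row, so with right padding the shorter prompt's continuation lands after its pad tokens instead of directly after its last real token. The toy simulation below uses word-level "tokens", a made-up "<pad>" string, and hypothetical next tokens "X" and "Y"; it is only an illustration, not BioGPT's tokenizer or decoding loop.

```python
PAD = "<pad>"  # toy pad token, not BioGPT's real pad symbol

def pad_batch(seqs, side):
    """Pad token lists to equal length on the given side ('left' or 'right')."""
    width = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        fill = [PAD] * (width - len(s))
        padded.append(fill + s if side == "left" else s + fill)
    return padded

def append_step(batch, new_tokens):
    """Mimic one decoding step: each new token goes at the end of its row."""
    return [row + [tok] for row, tok in zip(batch, new_tokens)]

prompts = [["the", "worst", "heart", "problems", "are"],
           ["epiglottis", "cancer", "with", "a"]]
for side in ("right", "left"):
    step = append_step(pad_batch(prompts, side), ["X", "Y"])
    print(side, step[1])
# right ['epiglottis', 'cancer', 'with', 'a', '<pad>', 'Y']
# left ['<pad>', 'epiglottis', 'cancer', 'with', 'a', 'Y']
```

The warning's own suggestion is to create the tokenizer with padding_side='left' (or set tokenizer.padding_side = "left" before tokenizing the batch). Whether that is safe alongside BioGPT's absolute position embeddings depends on the model computing positions from the attention mask rather than from the raw slot index.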
