Decoder generate() with variable-length prompts?

Hi,

Beginner here. How can I batch-generate with a decoder when the input_ids are initialized from prompts of variable lengths?

I tokenized the prompts with padding=True, but generate() does not seem to take the provided attention_mask into account. I have tried both beam search and sampling, and neither worked. For example:

from transformers import BertLMHeadModel, BertTokenizer

# my_config: a BertConfig with is_decoder=True and add_cross_attention=True,
# so the decoder can attend to the image features passed below
text_decoder = BertLMHeadModel(config=my_config)  # decoder
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

prompts = custom_prompts  # list of strings of variable lengths
tokenized = tokenizer(prompts, padding=True, return_tensors="pt")
input_ids = tokenized.input_ids
att_msk = tokenized.attention_mask

outputs = text_decoder.generate(
            input_ids=input_ids,
            attention_mask=att_msk,
            max_length=30,
            min_length=5,
            do_sample=True,
            encoder_hidden_states=image_embeds,  # size(0) == len(prompts)
            encoder_attention_mask=image_atts,   # size(0) == len(prompts)
            top_p=0.9,
            num_return_sequences=1,
            eos_token_id=tokenizer.sep_token_id,
            pad_token_id=tokenizer.pad_token_id,
            repetition_penalty=1.1,
            **model_kwargs)
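
For reference, with the default tokenizer settings the padding lands on the right of each prompt, which is exactly where generate() starts appending new tokens. A quick check using the tokenizer above:

# Quick check: BertTokenizer pads on the right by default,
# so shorter prompts end in [PAD] tokens (id 0 for bert-base-uncased).
batch = tokenizer(["a short prompt", "a somewhat longer prompt here"],
                  padding=True, return_tensors="pt")
print(tokenizer.padding_side)  # "right"
print(batch.input_ids)         # shorter rows are padded at the end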

In the outputs I get, the generated tokens are appended after the padding tokens, whereas I need the decoder to look only at the tokens before the padding when predicting the next token.
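
One workaround I have come across is to left-pad the prompts, so that generation continues directly after the last real prompt token. Here is a minimal sketch of what I mean, reusing the model and tokenizer from above (whether this plays well with the encoder_hidden_states / encoder_attention_mask kwargs is exactly what I am unsure about):

# Minimal sketch: left padding puts the pad tokens before the prompt,
# so generate() appends new tokens right after the real prompt tokens.
tokenizer.padding_side = "left"
tokenized = tokenizer(prompts, padding=True, return_tensors="pt")

outputs = text_decoder.generate(
            input_ids=tokenized.input_ids,
            attention_mask=tokenized.attention_mask,
            max_length=30,
            do_sample=True,
            top_p=0.9,
            eos_token_id=tokenizer.sep_token_id,
            pad_token_id=tokenizer.pad_token_id)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))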


Can someone give me advice on how to achieve this, or confirm whether there is already a built-in way to do it? Thank you very much!