Beginner here. How can I batch-generate with a decoder when the input_ids are initialized from prompts of variable lengths?
I tokenized the prompts with padding=True, but generate() does not seem to take the provided attention_mask into account. I have tried both beam search and sampling, and neither works.
```python
from transformers import BertLMHeadModel, BertTokenizer

text_decoder = BertLMHeadModel(config=my_config)  # decoder
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

prompts = custom_prompts  # list of strings with variable lengths
tokenized = tokenizer(prompts, padding=True, return_tensors="pt")
input_ids = tokenized.input_ids
att_msk = tokenized.attention_mask

outputs = text_decoder.generate(
    input_ids=input_ids,
    attention_mask=att_msk,
    max_length=30,
    min_length=5,
    do_sample=True,
    encoder_hidden_states=image_embeds,  # size(0) == len(prompts)
    encoder_attention_mask=image_atts,   # size(0) == len(prompts)
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.sep_token_id,
    pad_token_id=tokenizer.pad_token_id,
    repetition_penalty=1.1,
    **model_kwargs,
)
```
As shown in the outputs below, the generated tokens are appended after the padding tokens, whereas I need the decoder to attend only to the prompt tokens before the padding when predicting the continuation.
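To illustrate what I mean, here is a toy sketch (made-up token ids, no transformers needed) of why right-padding separates the prompt from the generated continuation, while left-padding keeps them adjacent:

```python
# Toy illustration: generation always appends at the END of each row.
# With right padding, [PAD] ids sit between the prompt and the new tokens;
# with left padding, the prompt stays adjacent to the continuation.
PAD = 0

def pad_batch(prompts, side="right"):
    """Pad variable-length token-id lists to a common length."""
    width = max(len(p) for p in prompts)
    if side == "right":
        return [p + [PAD] * (width - len(p)) for p in prompts]
    return [[PAD] * (width - len(p)) + p for p in prompts]

prompts = [[101, 7592], [101, 7592, 2088, 999]]  # two prompts, lengths 2 and 4

right = pad_batch(prompts, "right")
left = pad_batch(prompts, "left")

new_token = 42  # stand-in for a generated token
right_gen = [row + [new_token] for row in right]
left_gen = [row + [new_token] for row in left]

print(right_gen[0])  # [101, 7592, 0, 0, 42] -- pads between prompt and output
print(left_gen[0])   # [0, 0, 101, 7592, 42] -- prompt stays next to output
```

This is what I observe with my real batches: the shorter prompts get their generated tokens placed after the pad ids instead of directly after the prompt.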
Can someone please give me some advice on how to achieve this, or is there already a function that does it? Thank you very much!