Hi, I'm using a decoder-only model (e.g. Llama) for auto-regressive generation. The `inputs_embeds` of different instances have different lengths, so I have to pad them to a common length within each batch. How can I use something like `attention_mask` to tell the model the real (non-padded) length of each input, the way it works for `input_ids`? I can't pass `input_ids` directly because my inputs are soft prompts projected from other modalities.
A straightforward workaround is to loop over the mini-batch and feed each instance's non-padded embeddings one by one, but that seems very inefficient. Is there a better way to do it? Many thanks!
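To be concrete, here is a minimal sketch of the batched setup I have in mind (the hidden size, sequence lengths, and random embeddings are placeholders; I'm assuming the mask is interpreted the same way for `inputs_embeds` as for `input_ids`). I left-pad, since that's the usual convention for decoder-only generation:

```python
import torch

# Hypothetical variable-length soft prompts (e.g. projected from another
# modality): each tensor is (seq_len_i, hidden_size).
hidden_size = 8
embeds = [torch.randn(n, hidden_size) for n in (3, 5, 2)]

max_len = max(e.shape[0] for e in embeds)

# Left-pad each sequence with zero vectors so the real inputs sit at the
# end, and build a matching mask: 1 = real position, 0 = padding.
padded, mask = [], []
for e in embeds:
    pad = max_len - e.shape[0]
    padded.append(torch.cat([e.new_zeros(pad, hidden_size), e], dim=0))
    mask.append(torch.cat([torch.zeros(pad, dtype=torch.long),
                           torch.ones(e.shape[0], dtype=torch.long)]))

inputs_embeds = torch.stack(padded)   # (batch, max_len, hidden_size)
attention_mask = torch.stack(mask)    # (batch, max_len)

# The idea would then be to pass both tensors in one call, e.g.:
# out = model.generate(inputs_embeds=inputs_embeds,
#                      attention_mask=attention_mask,
#                      max_new_tokens=20)
```

Is this the right approach, or does padding the embeddings with zeros cause problems even when the mask marks those positions as padding?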