How to use `inputs_embeds` and `attention_mask` together?

Hi, I’m using a decoder-only model (e.g. Llama) for auto-regressive generation. The `inputs_embeds` of different instances have different lengths, so I have to pad them to the same length within a batch before feeding them in. How can I use something like the `attention_mask` to tell the model the length of the real input, the way it works for `input_ids`? I can’t pass `input_ids` directly, because my inputs are soft prompts coming from other modalities.
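
To make it concrete, here is a minimal sketch of what I’m trying to do. The model name, the random `soft_prompts` tensors, and the zero-padding / left-padding choice are just placeholders for my real setup, where the embeddings come from another modality and are already projected to the model’s hidden size. I’m not sure the mask is actually being honored when I pass embeddings like this.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder model; in my real code the embeddings come from another modality.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
hidden_size = model.config.hidden_size

# Hypothetical per-instance soft prompts of different lengths: (seq_len_i, hidden_size).
soft_prompts = [torch.randn(5, hidden_size), torch.randn(8, hidden_size)]

# Left-pad with zero vectors and build a 0/1 mask so the real inputs sit at the
# end of each sequence (the usual convention for decoder-only generation).
batch_size = len(soft_prompts)
max_len = max(p.size(0) for p in soft_prompts)
inputs_embeds = torch.zeros(batch_size, max_len, hidden_size)
attention_mask = torch.zeros(batch_size, max_len, dtype=torch.long)
for i, p in enumerate(soft_prompts):
    inputs_embeds[i, max_len - p.size(0):] = p
    attention_mask[i, max_len - p.size(0):] = 1

# What I'd like: the mask tells the model which positions are padding,
# the same way it does when passing input_ids.
generated = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=attention_mask,
    max_new_tokens=20,
)
```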

A straightforward workaround is to loop over the mini-batch and feed each non-padded input embedding to the model one at a time (a rough sketch is below), but that seems very inefficient. Is there a better way to do this? Many thanks!
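
For reference, the loop-based workaround I have in mind (reusing the placeholder names from the sketch above) would look roughly like this:

```python
# Per-example workaround: generate for one instance at a time with no padding,
# so no attention_mask is needed, at the cost of losing batching.
all_generated = []
for p in soft_prompts:
    out = model.generate(
        inputs_embeds=p.unsqueeze(0),  # shape (1, seq_len_i, hidden_size)
        max_new_tokens=20,
    )
    all_generated.append(out)
```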