I get a "You have to specify either input_ids or inputs_embeds" error, but I do specify the input ids

I trained a BERT-based encoder-decoder model: ed_model

I tokenized the input with:

txt = "I love huggingface"
inputs = input_tokenizer(txt, return_tensors="pt").to(device)

The output clearly shows that input_ids is in the returned dict:

{'input_ids': tensor([[ 101, 5660, 7975, 2127, 2053, 2936, 5061,  102]], device='cuda:0'), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

But when I try to predict, I get this error:

ValueError: You have to specify either input_ids or inputs_embeds

Any ideas?


Does this help? "ValueError: You have to specify either input_ids or inputs_embeds!" (Issue #3626, huggingface/transformers, GitHub)

Yes, thank you! That solved the issue.

Do you happen to have any thoughts on this as well?

Hi @ugoren, how did you solve this issue? I ran into the same error while trying to train an EncoderDecoderModel with the Seq2SeqTrainer.

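Add a "decoder_" prefix: the decoder needs its own inputs, so pass the target ids as decoder_input_ids in addition to input_ids.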
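To make the prefix idea concrete, here is a minimal sketch. The helper name build_model_inputs and the toy token ids are made up for illustration; they are not from the thread or from the transformers library. It relies on my understanding that EncoderDecoderModel.forward sends plain keys such as input_ids to the encoder and "decoder_"-prefixed keys such as decoder_input_ids to the decoder, so the error appears when the decoder side receives nothing.

```python
def build_model_inputs(source, target):
    """Merge encoder and decoder tokenizer outputs into one kwargs dict.

    Keys from `source` are kept as-is (encoder side); keys from `target`
    get a "decoder_" prefix (decoder side). Hypothetical helper.
    """
    model_inputs = dict(source)  # shallow copy, `source` is not modified
    for name, value in target.items():
        model_inputs[f"decoder_{name}"] = value
    return model_inputs


# Toy values standing in for real tokenizer output (assumed, not real data).
source = {"input_ids": [[101, 5660, 7975, 102]], "attention_mask": [[1, 1, 1, 1]]}
target = {"input_ids": [[101, 2293, 102]], "attention_mask": [[1, 1, 1]]}

model_inputs = build_model_inputs(source, target)
print(sorted(model_inputs))

# With real tensors, the forward call would then be: ed_model(**model_inputs)
```

If the goal is prediction rather than training, calling ed_model.generate(inputs.input_ids) may be the better fit, since generate builds the decoder inputs itself one step at a time. Treat that as a suggestion to try, not a confirmed fix for this thread.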

Yeah, I did just that, but I still got the error (transformers==4.9.2):

batch['attention_mask'] = inputs.attention_mask
batch['input_ids'] = inputs.input_ids
batch['token_type_ids'] = inputs.token_type_ids
batch["decoder_input_ids"] = outputs.input_ids.copy()
batch["labels"] = outputs.input_ids.copy()

Here, outputs is the tokenizer output for the translations. I guess the error I got was caused by something else.

I am facing the same issue.
@ugoren, can you please elaborate on your solution?