Can I run inference with an Encoder-Decoder model without specifying "decoder_input_ids"?

I'm using an Encoder-Decoder model to train on a translation task, where part of the data is unlabeled.

For labeled data, I can use the following code to run the forward pass and compute the loss:

# model is composed of EncoderDecoder architecture
# source_data and target_data are processed by tokenizer beforehand
batch = {
    "inputs_idx": source_data["inputs_idx"],
    "attention_mask": source_data["attention_mask"],
    "decoder_input_ids": target_data["inputs_idx"]
    "decoder_attention_mask": target_data["attention_mask"],
    "labels": target_data["inputs_idx"].clone()
}

output = model(**batch)
supervised_loss = output["loss"]

Besides the supervised loss, I also want to compute an unsupervised loss over the predicted logits of the unlabeled source data, for example:

batch = {
    "inputs_idx": source_data["inputs_idx"],
    "attention_mask": source_data["attention_mask"],
}

output = model(**batch)

unsupervised_loss = some_loss_func(output["logits"])
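
(For concreteness, some_loss_func above is just a placeholder; one hypothetical choice, not from this thread, is an entropy-minimization penalty over the predicted distribution:)

import torch.nn.functional as F

def some_loss_func(logits):
    # Hypothetical unsupervised objective: entropy minimization,
    # which pushes the model toward confident predictions on unlabeled data.
    # logits: (batch, seq_len, vocab_size)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (batch, seq_len)
    return entropy.mean()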

However, I cannot run the forward pass without specifying "decoder_input_ids"; the decoder raises the error:

You have to specify either input_ids or inputs_embeds

So far, I have been assigning source_data["inputs_idx"] to decoder_input_ids to avoid the issue, but I feel this is incorrect because it makes the forward pass inconsistent between labeled and unlabeled data. So I am wondering how I should run the forward pass for unlabeled data correctly.

Hi,
during inference, use output = model.generate(**batch) instead of output = model(**batch).
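
(A minimal sketch of what that looks like; generate returns token IDs rather than logits, and the max_length value here is an illustrative assumption:)

batch = {
    "input_ids": source_data["input_ids"],
    "attention_mask": source_data["attention_mask"],
}
# generate runs autoregressive decoding internally,
# so no decoder inputs need to be supplied
generated_ids = model.generate(**batch, max_length=64)
texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)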

Also, during training:
decoder_input_ids != target_data["inputs_idx"]
labels = target_data["inputs_idx"]
and decoder_input_ids = shift_to_right(target_data["inputs_idx"]); this shift is performed automatically in the library code, so you can simply omit the decoder_input_ids argument.
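
(The helper in the transformers library is called shift_tokens_right; here is a sketch of what it does, modeled on the Bart implementation:)

def shift_tokens_right(input_ids, pad_token_id, decoder_start_token_id):
    # Prepend the decoder start token and drop the last target token,
    # so the decoder learns to predict token t from tokens < t.
    shifted = input_ids.new_zeros(input_ids.shape)
    shifted[:, 1:] = input_ids[:, :-1].clone()
    shifted[:, 0] = decoder_start_token_id
    # Replace any -100 label-masking values with the pad token.
    shifted.masked_fill_(shifted == -100, pad_token_id)
    return shifted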

@yurii, thanks for the reply.

I think I confused people by using the term "inference." What I am doing here is running a "forward" pass without decoder_input_ids and labels, because I'd like to compute an unsupervised loss on unlabeled data. Also, I don't want to break the autograd graph, so I think model.generate() is not a good choice for my case?

Could you point me to the code snippet or documentation about the automatic shift_to_right behavior? I couldn't find it myself. Thanks a lot.

In the case of training a conditional model (e.g. BartForConditionalGeneration), when decoder_input_ids is absent, it is created automatically by right-shifting labels.

In the case of training a bare model (e.g. BartModel), when decoder_input_ids is absent, it is created automatically by right-shifting input_ids.
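
(So with a Bart model you can forward unlabeled batches directly and keep gradients; a minimal sketch, where the facebook/bart-base checkpoint is just an example:)

from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

source = tokenizer(["an unlabeled source sentence"], return_tensors="pt")

# No decoder_input_ids and no labels: decoder inputs are built internally
# by right-shifting input_ids, and the autograd graph stays intact.
output = model(input_ids=source["input_ids"],
               attention_mask=source["attention_mask"])
print(output.logits.shape)  # (batch, seq_len, vocab_size)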

By the way, try
"input_ids": source_data["input_ids"]
instead of
"inputs_idx": source_data["inputs_idx"]

I see. I was using EncoderDecoderModel before, not the Bart model, so that feature is not available there. Besides, their behaviors seem different. I'll try Bart now to see how it goes. I personally think EncoderDecoderModel should also be able to run a forward pass without "decoder_input_ids" and "labels". Anyway, thank you for sharing that.