Can I run inference with an Encoder-Decoder model without specifying "decoder_input_ids"?

I'm using an Encoder-Decoder model to train on a translation task, where part of the data is unlabeled.

For labeled data, I can use the following code to run the forward pass and compute the loss:

# model is composed of EncoderDecoder architecture
# source_data and target_data are processed by tokenizer beforehand
batch = {
    "inputs_idx": source_data["inputs_idx"],
    "attention_mask": source_data["attention_mask"],
    "decoder_input_ids": target_data["inputs_idx"]
    "decoder_attention_mask": target_data["attention_mask"],
    "labels": target_data["inputs_idx"].clone()
}

output = model(**batch)
supervised_loss = output["loss"]

Besides the supervised loss, I also want to compute an unsupervised loss over the predicted logits of the unlabeled source data, for example:

batch = {
    "inputs_idx": source_data["inputs_idx"],
    "attention_mask": source_data["attention_mask"],
}

output = model(**batch)

unsupervised_loss = some_loss_func(output["logits"])
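
(For concreteness, some_loss_func above is just a placeholder; one hypothetical choice, not from this thread, is an entropy-minimization penalty over the predicted distribution:)

import torch.nn.functional as F

def some_loss_func(logits):
    # Hypothetical unsupervised objective: entropy minimization,
    # which pushes the model toward confident predictions on unlabeled data.
    # logits: (batch, seq_len, vocab_size)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (batch, seq_len)
    return entropy.mean()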

However, I cannot run the forward pass without specifying "decoder_input_ids"; the decoder raises the error:

You have to specify either input_ids or inputs_embeds

So far, I have been assigning source_data["inputs_idx"] to decoder_input_ids to avoid the issue, but I feel this is incorrect because it makes the forward pass inconsistent between labeled and unlabeled data. So I am wondering how I should run the forward pass for unlabeled data correctly.

Hi,
during inference, use output = model.generate(**batch) instead of output = model(**batch).
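
(A minimal sketch of what that looks like; generate returns token IDs rather than logits, and the max_length value here is an illustrative assumption:)

batch = {
    "input_ids": source_data["input_ids"],
    "attention_mask": source_data["attention_mask"],
}
# generate runs autoregressive decoding internally,
# so no decoder inputs need to be supplied
generated_ids = model.generate(**batch, max_length=64)
texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)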

Also, during training:
decoder_input_ids != target_data["inputs_idx"]
labels = target_data["inputs_idx"]
and decoder_input_ids = shift_to_right(target_data["inputs_idx"]); this shift is performed automatically in the library code, so you can simply omit the decoder_input_ids argument.
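
(The helper in the transformers library is called shift_tokens_right; here is a sketch of what it does, modeled on the Bart implementation:)

def shift_tokens_right(input_ids, pad_token_id, decoder_start_token_id):
    # Prepend the decoder start token and drop the last target token,
    # so the decoder learns to predict token t from tokens < t.
    shifted = input_ids.new_zeros(input_ids.shape)
    shifted[:, 1:] = input_ids[:, :-1].clone()
    shifted[:, 0] = decoder_start_token_id
    # Replace any -100 label-masking values with the pad token.
    shifted.masked_fill_(shifted == -100, pad_token_id)
    return shifted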

@yurii, thanks for the reply.

I think I confused people by using the term "inference." What I am doing here is running a "forward" pass without decoder_input_ids and labels, because I'd like to compute an unsupervised loss on unlabeled data. Also, I don't want to break the autograd graph, so I think model.generate() is not a good choice for my case?

Could you point me to the code snippet or documentation about the automatic shift_to_right behavior? I couldn't find it myself. Thanks a lot.

In the case of training a conditional model (e.g. BartForConditionalGeneration), when decoder_input_ids is absent, it is created automatically by right-shifting labels.

In the case of training a bare model (e.g. BartModel), when decoder_input_ids is absent, it is created automatically by right-shifting input_ids.
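
(So with a Bart model you can forward unlabeled batches directly and keep gradients; a minimal sketch, where the facebook/bart-base checkpoint is just an example:)

from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

source = tokenizer(["an unlabeled source sentence"], return_tensors="pt")

# No decoder_input_ids and no labels: decoder inputs are built internally
# by right-shifting input_ids, and the autograd graph stays intact.
output = model(input_ids=source["input_ids"],
               attention_mask=source["attention_mask"])
print(output.logits.shape)  # (batch, seq_len, vocab_size)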

By the way, try
"input_ids": source_data["input_ids"]
instead of
"inputs_idx": source_data["inputs_idx"]

I see. I was using EncoderDecoderModel before, not the Bart model, so that feature is not available there. Besides, their behaviors seem different. I'll try Bart now to see how it goes. I personally think EncoderDecoderModel should also be able to run a forward pass without "decoder_input_ids" and "labels". Anyway, thank you for sharing that.