How to use Seq2SeqTrainer (Seq2SeqDataCollator) in v4.2.1


I’d like to update my training script using Seq2SeqTrainer to match the newest version, v4.2.1.

My code worked with v3.5.1.
However, after updating, it no longer works with v4.2.1.

It raises a ValueError:

  File "/****/", line 193, in compute_loss
    loss, _ = self._compute_loss(model, inputs, labels)
  File "/****/", line 180, in _compute_loss
    loss = self.loss_fn(logits.view(-1, logits.shape[-1]), labels.view(-1))
ValueError: Expected input batch_size (464) to match target batch_size (480).

I tried print debugging in this method:

    def _compute_loss(self, model, inputs, labels):
        if self.args.label_smoothing == 0:
            if self.data_args is not None and self.data_args.ignore_pad_token_for_loss:
                # force training to ignore pad token
                logits = model(**inputs, use_cache=False)[0]


                loss = self.loss_fn(logits.view(-1, logits.shape[-1]), labels.view(-1))

and got:

torch.Size([8, 58])
torch.Size([8, 58, 50266])
torch.Size([8, 60])

(I added my own special token, so the embedding size becomes 50266)
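
The numbers in the error message follow from these shapes. This is a small check, assuming the second printed shape is the logits and the third is the labels (the post does not say which tensor is which, so that mapping is an assumption):

```python
# Assumed mapping of the printed shapes (not stated in the post).
logits_shape = (8, 58, 50266)  # (batch, decoder length, vocab size)
labels_shape = (8, 60)         # (batch, target length)

# logits.view(-1, vocab_size) merges batch and length into one dimension.
flat_logits_rows = logits_shape[0] * logits_shape[1]
# labels.view(-1) does the same for the targets.
flat_label_rows = labels_shape[0] * labels_shape[1]

print(flat_logits_rows, flat_label_rows)  # 464 480
```

Under that assumption, the sequence the decoder ran over is two positions shorter than the labels, which is exactly the 464 vs. 480 mismatch the loss function reports.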

Am I forgetting to do the necessary processing when updating the file to fit the new version?

In the new Seq2SeqDataCollator, it seems that shift_tokens_right, which was imported from transformers.models.bart.modeling_bart, is no longer needed.
I updated my own DataCollator based on this new Seq2SeqDataCollator, and I suspect my misunderstanding is related to this change.
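
For context, a right shift turns the target sequence into decoder inputs of the same length, which is what keeps logits and labels aligned. The following is a pure-Python illustration of that idea, not the library's implementation; `shift_right` and the token ids are made up for the example.

```python
def shift_right(labels, pad_token_id, decoder_start_token_id):
    """Build decoder inputs by prepending a start token and dropping the last label."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # -100 marks positions the loss ignores; the decoder needs a real token id there.
    return [pad_token_id if tok == -100 else tok for tok in shifted]


# Hypothetical ids: 2 = start/end token, 1 = pad token.
labels = [10, 11, 12, 2, -100, -100]
decoder_input_ids = shift_right(labels, pad_token_id=1, decoder_start_token_id=2)

print(decoder_input_ids)  # [2, 10, 11, 12, 2, 1]
print(len(decoder_input_ids) == len(labels))  # True
```

If a custom collator builds decoder inputs and labels from sequences padded to different lengths, this equality breaks and the loss sees mismatched sizes.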

Thank you in advance.

I downgraded from 4.2.1 to 4.1.1 and went back to the version of Seq2SeqDataCollator that uses shift_tokens_right.
I also reverted my own DataCollator to the old version, and the above problem apparently no longer occurs.

Are there any tips to make my own DataCollator for Seq2SeqTrainer in v4.2.1?

Thank you.

Hi @yusukemori

We are in the process of rewriting the seq2seq fine-tuning scripts, and Seq2SeqDataCollator will probably be deprecated, so I would wait before adapting your code to the latest version. If you want to try the new script, you could check out this PR: New run_seq2seq script by sgugger (Pull Request #9605, huggingface/transformers).

Hi @valhalla

Thank you for the detailed answer.
I now understand that you are in the process of rewriting the seq2seq fine-tuning scripts, and that I should wait before adapting my code around Seq2SeqDataCollator to the latest version.

I think I’ll use the conventional script for now, but I’d love to check out the PR.

The PR has been merged, so you should be able to use a similar workflow. Note that the processing that used to be done in Seq2SeqDataCollator is now done on the dataset directly.
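
As a rough sketch of what "processing on the dataset directly" can look like: tokenize and pad both sides once, up front, and mask label padding with -100 so the loss skips it. Everything named here (`toy_tokenize`, `preprocess`, `PAD_ID`, the field names) is hypothetical and stands in for a real tokenizer and dataset; it is not the code from the PR.

```python
PAD_ID = 0  # hypothetical pad token id


def toy_tokenize(text, max_length):
    """Hypothetical stand-in for a real tokenizer: one id per word, padded/truncated."""
    ids = [len(word) for word in text.split()][:max_length]
    return ids + [PAD_ID] * (max_length - len(ids))


def preprocess(example, max_source_length=6, max_target_length=4):
    """Tokenize source and target up front so the collator only has to batch."""
    input_ids = toy_tokenize(example["source"], max_source_length)
    labels = toy_tokenize(example["target"], max_target_length)
    # Replace padding in the labels with -100 so the loss ignores those positions.
    labels = [-100 if tok == PAD_ID else tok for tok in labels]
    return {"input_ids": input_ids, "labels": labels}


dataset = [{"source": "a short input sentence", "target": "short output"}]
processed = [preprocess(ex) for ex in dataset]

print(processed[0]["input_ids"])  # [1, 5, 5, 8, 0, 0]
print(processed[0]["labels"])     # [5, 6, -100, -100]
```

With the labels prepared this way, a custom collator mainly needs to stack already-consistent features into a batch rather than shift or re-pad them.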

Hi @sgugger

Thank you for giving me the information!
I’ll check the PR and look closely at the changes to where the processing is done.