How to use Seq2SeqTrainer (Seq2SeqDataCollator) in v4.2.1

yusukemori · January 17, 2021, 9:27am

Hello,

I’d like to update my training script using Seq2SeqTrainer to match the newest version, v4.2.1.

My code worked with v3.5.1.
However, after updating, it no longer works with v4.2.1.

It raises a ValueError:

  File "/****/seq2seq_trainer.py", line 193, in compute_loss
    loss, _ = self._compute_loss(model, inputs, labels)
  File "/****/seq2seq_trainer.py", line 180, in _compute_loss
    loss = self.loss_fn(logits.view(-1, logits.shape[-1]), labels.view(-1))
ValueError: Expected input batch_size (464) to match target batch_size (480).

I tried print debugging by inserting print statements into _compute_loss:

    def _compute_loss(self, model, inputs, labels):
        if self.args.label_smoothing == 0:
            if self.data_args is not None and self.data_args.ignore_pad_token_for_loss:
                # force training to ignore pad token
                logits = model(**inputs, use_cache=False)[0]

                print(inputs["input_ids"].shape)
                print(logits.shape)
                print(labels.shape)

                loss = self.loss_fn(logits.view(-1, logits.shape[-1]), labels.view(-1))

and got:

torch.Size([8, 58])
torch.Size([8, 58, 50266])
torch.Size([8, 60])

(I added my own special token, so the embedding size becomes 50266)
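For reference, the two numbers in the error message follow directly from these shapes. This is only the arithmetic, written out as a small sketch; the variable names are mine. Note that the logits have the sequence length of input_ids (58) rather than of labels (60), which suggests the decoder inputs were not built from the labels.

```python
# Shapes copied from the debug output above.
batch_size, input_len, vocab_size = 8, 58, 50266
label_len = 60

# logits.view(-1, vocab_size) collapses (batch, seq) into a single axis.
flat_logit_rows = batch_size * input_len   # "input batch_size" in the error

# labels.view(-1) flattens (batch, label_len) the same way.
flat_label_rows = batch_size * label_len   # "target batch_size" in the error

print(flat_logit_rows, flat_label_rows)    # 464 480
```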

Am I forgetting to do the necessary processing when updating the file to fit the new version?

In the new Seq2SeqDataCollator, it seems that shift_tokens_right, which used to be imported from transformers.models.bart.modeling_bart, is no longer needed.
I updated my own DataCollator based on this new Seq2SeqDataCollator, so I suspect my misunderstanding is related to this change.
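To make the role of the shift concrete, here is a minimal list-based sketch of what a shift-right step does in general. It is not the library's shift_tokens_right; the helper name, the token ids, and the handling of -100 are assumptions for illustration. The point is that decoder inputs derived from the labels always have the same length as the labels, so the logits and labels line up.

```python
from typing import List


def shift_right(labels: List[int], pad_id: int, start_id: int) -> List[int]:
    """Hypothetical helper: build decoder inputs by shifting labels one step right."""
    # Prepend the start token and drop the last position to keep the length.
    shifted = [start_id] + labels[:-1]
    # Labels may use -100 for ignored positions; decoder inputs need a real token id.
    return [pad_id if tok == -100 else tok for tok in shifted]


labels = [11, 12, 13, 2, -100, -100]  # made-up ids; 2 stands for end-of-sequence
decoder_input_ids = shift_right(labels, pad_id=1, start_id=0)

print(decoder_input_ids)                    # [0, 11, 12, 13, 2, 1]
print(len(decoder_input_ids) == len(labels))  # True
```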

Thank you in advance.

yusukemori · January 17, 2021, 10:57am

I downgraded from 4.2.1 to 4.1.1, which still has shift_tokens_right in Seq2SeqDataCollator.
I also reverted my own DataCollator to the old version, and the problem above no longer occurs.

Are there any tips to make my own DataCollator for Seq2SeqTrainer in v4.2.1?

Thank you.

valhalla · January 19, 2021, 6:06am

Hi @yusukemori

We are in the process of rewriting the seq2seq fine-tuning scripts, and Seq2SeqDataCollator will probably be deprecated, so I would wait before adapting your code to the latest version. If you want to try the new script, you could check out this PR: New run_seq2seq script by sgugger · Pull Request #9605 · huggingface/transformers · GitHub

yusukemori · January 19, 2021, 7:12am

Hi @valhalla

Thank you for the detailed answer.
I now understand that you are in the process of rewriting the seq2seq fine-tuning scripts, and that I should wait before adapting my Seq2SeqDataCollator to the latest version.

I think I’ll keep using the old script for now, but I’d love to check out the PR.

sgugger · January 19, 2021, 9:15pm

The PR has been merged, so you should be able to use a similar workflow. Note that the processing that used to be done in Seq2SeqDataCollator is now done on the dataset directly.
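As a rough illustration of what preparing labels "on the dataset directly" can look like, the sketch below pads each example's labels to a fixed length and masks the padding with -100, the default ignore_index of PyTorch's CrossEntropyLoss. It is not the code from the PR: the function name, the pad id, and the toy token ids are all assumptions, and a real script would use a tokenizer rather than hand-written ids.

```python
from typing import Dict, List

PAD_ID = 1            # assumed pad token id for this toy example
IGNORE_INDEX = -100   # default ignore_index of PyTorch's CrossEntropyLoss


def preprocess(example: Dict[str, List[int]], max_target_len: int) -> Dict[str, List[int]]:
    """Hypothetical per-example preprocessing: pad labels, then mask the padding."""
    labels = example["labels"][:max_target_len]
    labels = labels + [PAD_ID] * (max_target_len - len(labels))
    out = dict(example)  # copy so the input example is left untouched
    out["labels"] = [IGNORE_INDEX if tok == PAD_ID else tok for tok in labels]
    return out


row = preprocess({"input_ids": [0, 21, 22, 2], "labels": [0, 31, 2]}, max_target_len=5)
print(row["labels"])  # [0, 31, 2, -100, -100]
```

With the labels already padded and masked per example, the collator only has to batch them, and the loss skips the masked positions.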

yusukemori · January 20, 2021, 3:47am

Hi @sgugger

Thank you for giving me the information!
I’ll check the PR and look closely at the changes to where the processing is done.
