Help adapting pytorch/text-classification example to T5

mirix · May 25, 2023, 7:17am

Hello,

I would like to fine-tune models from the flan-t5 family for text classification on my own data.

Being a beginner, I decided to start by running the examples provided. The example runs fine with the default pretrained model. We are still labelling our data, so right now I am focusing on switching to another model.

I have tried to adapt run_glue.py to T5 by changing the imports to T5Config, T5ForConditionalGeneration and T5TokenizerFast.

However, I receive the following error:

│ /home/emoman/.local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py:990 in    │
│ forward                                                                                          │
│                                                                                                  │
│    987 │   │   │   │   raise ValueError("You have to initialize the model with valid token embe  │
│    988 │   │   │   inputs_embeds = self.embed_tokens(input_ids)                                  │
│    989 │   │                                                                                     │
│ ❱  990 │   │   batch_size, seq_length = input_shape                                              │
│    991 │   │                                                                                     │
│    992 │   │   # required mask seq length can be calculated via length of past                   │
│    993 │   │   mask_seq_length = past_key_values[0][0].shape[2] + seq_length if past_key_values  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: not enough values to unpack (expected 2, got 1)

Any ideas?

This is my command:

export TASK_NAME=sst2

python run_glue_t5.py \
--model_name_or_path 'google/flan-t5-base' \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 2 \
--output_dir /tmp/$TASK_NAME/ \
--evaluation_strategy steps \
--save_total_limit 1 \
--load_best_model_at_end True \
--overwrite_output_dir \
--optim adamw_torch \
--use_ipex \
--jit_mode_eval \
--no_cuda

Best regards,

Ed

mirix · May 25, 2023, 7:40am

The full trace:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/emoman/Work/exploration/ipex/transformers/examples/pytorch/text-classification/run_glue_t5 │
│ .py:623 in <module>                                                                              │
│                                                                                                  │
│   620                                                                                            │
│   621                                                                                            │
│   622 if __name__ == "__main__":                                                                 │
│ ❱ 623 │   main()                                                                                 │
│   624                                                                                            │
│                                                                                                  │
│ /home/emoman/Work/exploration/ipex/transformers/examples/pytorch/text-classification/run_glue_t5 │
│ .py:531 in main                                                                                  │
│                                                                                                  │
│   528 │   │   │   checkpoint = training_args.resume_from_checkpoint                              │
│   529 │   │   elif last_checkpoint is not None:                                                  │
│   530 │   │   │   checkpoint = last_checkpoint                                                   │
│ ❱ 531 │   │   train_result = trainer.train(resume_from_checkpoint=checkpoint)                    │
│   532 │   │   metrics = train_result.metrics                                                     │
│   533 │   │   max_train_samples = (                                                              │
│   534 │   │   │   data_args.max_train_samples if data_args.max_train_samples is not None else    │
│                                                                                                  │
│ /home/emoman/.local/lib/python3.10/site-packages/transformers/trainer.py:1664 in train           │
│                                                                                                  │
│   1661 │   │   inner_training_loop = find_executable_batch_size(                                 │
│   1662 │   │   │   self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size  │
│   1663 │   │   )                                                                                 │
│ ❱ 1664 │   │   return inner_training_loop(                                                       │
│   1665 │   │   │   args=args,                                                                    │
│   1666 │   │   │   resume_from_checkpoint=resume_from_checkpoint,                                │
│   1667 │   │   │   trial=trial,                                                                  │
│                                                                                                  │
│ /home/emoman/.local/lib/python3.10/site-packages/transformers/trainer.py:1940 in                 │
│ _inner_training_loop                                                                             │
│                                                                                                  │
│   1937 │   │   │   │   │   with model.no_sync():                                                 │
│   1938 │   │   │   │   │   │   tr_loss_step = self.training_step(model, inputs)                  │
│   1939 │   │   │   │   else:                                                                     │
│ ❱ 1940 │   │   │   │   │   tr_loss_step = self.training_step(model, inputs)                      │
│   1941 │   │   │   │                                                                             │
│   1942 │   │   │   │   if (                                                                      │
│   1943 │   │   │   │   │   args.logging_nan_inf_filter                                           │
│                                                                                                  │
│ /home/emoman/.local/lib/python3.10/site-packages/transformers/trainer.py:2735 in training_step   │
│                                                                                                  │
│   2732 │   │   │   return loss_mb.reduce_mean().detach().to(self.args.device)                    │
│   2733 │   │                                                                                     │
│   2734 │   │   with self.compute_loss_context_manager():                                         │
│ ❱ 2735 │   │   │   loss = self.compute_loss(model, inputs)                                       │
│   2736 │   │                                                                                     │
│   2737 │   │   if self.args.n_gpu > 1:                                                           │
│   2738 │   │   │   loss = loss.mean()  # mean() to average on multi-gpu parallel training        │
│                                                                                                  │
│ /home/emoman/.local/lib/python3.10/site-packages/transformers/trainer.py:2767 in compute_loss    │
│                                                                                                  │
│   2764 │   │   │   labels = inputs.pop("labels")                                                 │
│   2765 │   │   else:                                                                             │
│   2766 │   │   │   labels = None                                                                 │
│ ❱ 2767 │   │   outputs = model(**inputs)                                                         │
│   2768 │   │   # Save past state if it exists                                                    │
│   2769 │   │   # TODO: this needs to be fixed and made cleaner later.                            │
│   2770 │   │   if self.args.past_index >= 0:                                                     │
│                                                                                                  │
│ /home/emoman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl   │
│                                                                                                  │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
│   1502 │   │   # Do not call functions when jit is used                                          │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1504 │   │   backward_pre_hooks = []                                                           │
│                                                                                                  │
│ /home/emoman/.local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py:1720 in   │
│ forward                                                                                          │
│                                                                                                  │
│   1717 │   │   │   │   decoder_attention_mask = decoder_attention_mask.to(self.decoder.first_de  │
│   1718 │   │                                                                                     │
│   1719 │   │   # Decode                                                                          │
│ ❱ 1720 │   │   decoder_outputs = self.decoder(                                                   │
│   1721 │   │   │   input_ids=decoder_input_ids,                                                  │
│   1722 │   │   │   attention_mask=decoder_attention_mask,                                        │
│   1723 │   │   │   inputs_embeds=decoder_inputs_embeds,                                          │
│                                                                                                  │
│ /home/emoman/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl   │
│                                                                                                  │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
│   1502 │   │   # Do not call functions when jit is used                                          │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1504 │   │   backward_pre_hooks = []                                                           │
│                                                                                                  │
│ /home/emoman/.local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py:990 in    │
│ forward                                                                                          │
│                                                                                                  │
│    987 │   │   │   │   raise ValueError("You have to initialize the model with valid token embe  │
│    988 │   │   │   inputs_embeds = self.embed_tokens(input_ids)                                  │
│    989 │   │                                                                                     │
│ ❱  990 │   │   batch_size, seq_length = input_shape                                              │
│    991 │   │                                                                                     │
│    992 │   │   # required mask seq length can be calculated via length of past                   │
│    993 │   │   mask_seq_length = past_key_values[0][0].shape[2] + seq_length if past_key_values  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: not enough values to unpack (expected 2, got 1)

Could the issue be that the decoder expects a different format?
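
A possible reading of the trace, offered as a sketch rather than a confirmed diagnosis: when no decoder inputs are passed, T5ForConditionalGeneration appears to derive them from the labels, so the decoder ends up with whatever shape the labels have. run_glue.py supplies one class index per example, which is 1-D, while the line `batch_size, seq_length = input_shape` needs two dimensions. The snippet below reproduces only that unpacking step with plain Python lists. The helper names (`shape_of`, `unpack_like_decoder`) and the token ids are made up for illustration and are not part of transformers.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def unpack_like_decoder(decoder_input_ids):
    """Mimic the `batch_size, seq_length = input_shape` line from the traceback."""
    input_shape = shape_of(decoder_input_ids)
    batch_size, seq_length = input_shape
    return batch_size, seq_length


# Class indices in the style of a classification dataset: one integer per example.
class_labels = [1, 0, 1]

# Tokenized target text: one sequence of token ids per example (ids are invented).
target_token_ids = [[101, 1], [102, 1], [101, 1]]

# Two dimensions unpack cleanly.
print(unpack_like_decoder(target_token_ids))

# One dimension raises the same kind of ValueError as in the trace.
try:
    unpack_like_decoder(class_labels)
except ValueError as err:
    print(f"ValueError: {err}")
```

If this reading is right, the decoder is not so much expecting a different format as receiving labels that are not token sequences at all.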

mirix · May 25, 2023, 12:07pm

I switched to DataCollatorForSeq2Seq and now I get the following error:

TypeError: T5ForConditionalGeneration.forward() got an unexpected keyword argument 'label'

I believe this to be a show stopper. I guess I will have to build the whole thing from scratch rather than try and adapt this script.
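
For anyone hitting the same TypeError, one hedged explanation: the collators that run_glue.py normally uses appear to rename a `label` key to `labels` when building the batch, whereas DataCollatorForSeq2Seq seems to look only for `labels`, so the `label` key reaches `forward()` unchanged. Below is a minimal sketch of the renaming step on plain feature dicts. The helper `rename_label_key` is hypothetical and the token ids are invented.

```python
def rename_label_key(features, old="label", new="labels"):
    """Return copies of the feature dicts with the key `old` renamed to `new`."""
    renamed = []
    for example in features:
        example = dict(example)  # shallow copy, so the caller's dicts stay untouched
        if old in example:
            example[new] = example.pop(old)
        renamed.append(example)
    return renamed


# Two tokenized examples in the style of a classification dataset (ids are invented).
batch = [
    {"input_ids": [37, 1974, 47, 248, 1], "label": 1},
    {"input_ids": [27, 737, 17, 114, 34, 1], "label": 0},
]

fixed = rename_label_key(batch)
print(sorted(fixed[0].keys()))
```

With a datasets.Dataset, `rename_column("label", "labels")` should achieve the same thing. Note that this would only address the keyword name: T5ForConditionalGeneration would still expect the labels to be sequences of token ids rather than class indices.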

mirix · May 25, 2023, 1:14pm

It seems to be working! It will take several hours on my hardware but at least there are no errors so far.

The key was to use the class T5ForSequenceClassification from here:

It also needed a bit of hacking, such as replacing “labels” with “label”.

mirix · May 25, 2023, 1:16pm

Also:

from transformers import (
    T5Config,
    T5TokenizerFast,
    DataCollatorForSeq2Seq,
    EvalPrediction,
    HfArgumentParser,
    PretrainedConfig,
    Trainer,
    TrainingArguments,
    set_seed,
)

from t5_extra_models import T5ForSequenceClassification

And the corresponding replacements in the code.
