Fine-tuning: not enough values to unpack (expected 2, got 1)

Hi,

I’m trying to fine-tune the erfan226/persian-t5-paraphraser paraphrase-generation model on Persian sentences. I took the Persian subset of the TaPaCo dataset and reformatted it to match the GLUE MRPC dataset used in the fine-tuning documentation. I have uploaded my dataset to alighasemi/farsi_paraphrase_detection.
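Roughly, the reformatting looked like this (a simplified sketch; the TaPaCo field names are from memory, and I’m omitting the negative pairs I sampled from different paraphrase sets):

from datasets import load_dataset

# Persian subset of TaPaCo: sentences grouped by paraphrase_set_id
tapaco = load_dataset("tapaco", "fa")["train"]

# Build MRPC-style rows: sentence1, sentence2, label (1 = paraphrase)
pairs = {"sentence1": [], "sentence2": [], "label": []}
prev = None
for row in tapaco:
    if prev is not None and row["paraphrase_set_id"] == prev["paraphrase_set_id"]:
        pairs["sentence1"].append(prev["paraphrase"])
        pairs["sentence2"].append(row["paraphrase"])
        pairs["label"].append(1)
    prev = row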

I followed every step of the Trainer video (tokenization, compute_metrics, TrainingArguments, …). However, when I run trainer.train() I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-3435b262f1ae> in <module>
----> 1 trainer.train()

7 frames
/usr/local/lib/python3.8/dist-packages/transformers/models/t5/modeling_t5.py in forward(self, input_ids, attention_mask, encoder_hidden_states, encoder_attention_mask, inputs_embeds, head_mask, cross_attn_head_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    941             inputs_embeds = self.embed_tokens(input_ids)
    942 
--> 943         batch_size, seq_length = input_shape
    944 
    945         # required mask seq length can be calculated via length of past

ValueError: not enough values to unpack (expected 2, got 1)

Here’s my code:

from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorWithPadding

dataset = load_dataset("alighasemi/farsi_paraphrase_detection")
tokenizer = AutoTokenizer.from_pretrained("erfan226/persian-t5-paraphraser")

def tokenize_function(examples):
    # Tokenize the sentence pair together, as in the MRPC example
    return tokenizer(examples["sentence1"], examples["sentence2"], truncation=True)

tokenized_dataset = dataset.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer)
model = AutoModelForSeq2SeqLM.from_pretrained("erfan226/persian-t5-paraphraser", num_labels=2)

I could not use AutoModelForSequenceClassification to load the model, because it raises this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-29-5650b38a14f3> in <module>
----> 1 model = AutoModelForSequenceClassification.from_pretrained("erfan226/persian-t5-paraphraser", num_labels=2)

/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    464                 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    465             )
--> 466         raise ValueError(
    467             f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    468             f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."

ValueError: Unrecognized configuration class <class 'transformers.models.t5.configuration_t5.T5Config'> for this kind of AutoModel: AutoModelForSequenceClassification.
Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, CamembertConfig, CanineConfig, ConvBertConfig, CTRLConfig, Data2VecTextConfig, DebertaConfig, DebertaV2Config, DistilBertConfig, ElectraConfig, ErnieConfig, EsmConfig, FlaubertConfig, FNetConfig, FunnelConfig, GPT2Config, GPTNeoConfig, GPTJConfig, IBertConfig, LayoutLMConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LongformerConfig, LukeConfig, MarkupLMConfig, MBartConfig, MegatronBertConfig, MobileBertConfig, MPNetConfig, MvpConfig, NezhaConfig, NystromformerConfig, OpenAIGPTConfig, OPTConfig, PerceiverConfig, PLBartConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RoCBertConfig, RoFormerConfig, SqueezeBertConfig, TapasConfig, TransfoXLConfig, XLMConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, YosoConfig.
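(For what it’s worth, an encoder-only checkpoint does load with that head; here I’m using ParsBERT, HooshvareLab/bert-fa-base-uncased, purely as an example, since it’s not the paraphraser I actually want to fine-tune:)

from transformers import AutoModelForSequenceClassification

# Example encoder-only Persian model; loads with a freshly initialized classification head
bert_clf = AutoModelForSequenceClassification.from_pretrained(
    "HooshvareLab/bert-fa-base-uncased", num_labels=2
)

The rest of my code: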
training_args = TrainingArguments(
    output_dir="farsi_paraphraser",
    num_train_epochs=5,
    evaluation_strategy="epoch",
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()
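If I’m reading the traceback right, I suspect my integer "label" column (0/1) is being passed to the model as labels, which T5 shifts right into decoder_input_ids; the decoder then expects a 2-D tensor of token ids and fails to unpack a 1-D one. This tiny snippet reproduces the same error for me (a sketch of my suspicion, not a confirmed diagnosis):

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("erfan226/persian-t5-paraphraser")
model = AutoModelForSeq2SeqLM.from_pretrained("erfan226/persian-t5-paraphraser")

batch = tokenizer(["sentence one"], ["sentence two"], return_tensors="pt")
batch["labels"] = torch.tensor([1])  # 1-D class label, like my dataset's "label" column
model(**batch)  # ValueError: not enough values to unpack (expected 2, got 1)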

@sgugger I would be thrilled if you could help me, since you were the one teaching in the video.
