When I use transformers.TrainingArguments with evaluation_strategy="steps", save_strategy="steps", and eval_steps=200, I get an error in the loss computation during evaluation.
my code:
trainer = ModifiedTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=1,
        # warmup_steps=100,
        num_train_epochs=4,
        learning_rate=2e-6,
        fp16=True,
        logging_steps=10,
        optim="adamw_torch",
        # evaluation_strategy="steps",
        save_strategy="steps",
        # eval_steps=200,
        save_steps=200,
        # load_best_model_at_end=True,
        output_dir='./fenlei',
        save_total_limit=2,
        # remove_unused_columns=True,
    ),
    callbacks=[TensorBoardCallback(writer)],
    data_collator=data_collator,
)
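For clarity, the failing run has the commented-out evaluation lines above enabled. A minimal sketch of just those arguments (everything else as above):

import transformers

# The error only appears once step-based evaluation is switched on:
args = transformers.TrainingArguments(
    output_dir='./fenlei',
    evaluation_strategy="steps",  # run evaluate() every eval_steps
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
)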
The traceback is:
❱  230     trainer.train()
   231
   232     writer.close()
   233     # save model

/home/hexinyu/miniconda3/envs/nlp/lib/python3.10/site-packages/transformers/trainer.py:1662 in train

  1659         inner_training_loop = find_executable_batch_size(
  1660             self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
  1661         )
❱ 1662         return inner_training_loop(
  1663             args=args,
  1664             resume_from_checkpoint=resume_from_checkpoint,
  1665             trial=trial,

/home/hexinyu/miniconda3/envs/nlp/lib/python3.10/site-packages/transformers/trainer.py:2006 in _inner_training_loop

  2003                     self.state.epoch = epoch + (step + 1 + steps_skipped) / steps_in_epo
  2004                     self.control = self.callback_handler.on_step_end(args, self.state, s
  2005
❱ 2006                     self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_k
  2007                 else:
  2008                     self.control = self.callback_handler.on_substep_end(args, self.state

/home/hexinyu/miniconda3/envs/nlp/lib/python3.10/site-packages/transformers/trainer.py:2287 in _maybe_log_save_evaluate

  2284                     )
  2285                     metrics.update(dataset_metrics)
  2286             else:
❱ 2287                 metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  2288             self._report_to_hp_search(trial, self.state.global_step, metrics)
  2289
  2290         if self.control.should_save:

/home/hexinyu/miniconda3/envs/nlp/lib/python3.10/site-packages/transformers/trainer.py:2995 in evaluate

  2992         start_time = time.time()
  2993
  2994         eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else se
❱ 2995         output = eval_loop(
  2996             eval_dataloader,
  2997             description="Evaluation",
  2998             # No point gathering the predictions if there are no metrics, otherwise we d

/home/hexinyu/miniconda3/envs/nlp/lib/python3.10/site-packages/transformers/trainer.py:3176 in evaluation_loop

  3173                     batch_size = observed_batch_size
  3174
  3175             # Prediction step
❱ 3176             loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_o
  3177             inputs_decode = self._prepare_input(inputs["input_ids"]) if args.include_inp
  3178
  3179             if is_torch_tpu_available():

/home/hexinyu/miniconda3/envs/nlp/lib/python3.10/site-packages/transformers/trainer.py:3431 in prediction_step

  3428             else:
  3429                 if has_labels or loss_without_labels:
  3430                     with self.compute_loss_context_manager():
❱ 3431                         loss, outputs = self.compute_loss(model, inputs, return_outputs=
  3432                         # loss= self.compute_loss(model, inputs, return_outputs=True)
  3433
  3434                     loss = loss.mean().detach()

/home/hexinyu/miniconda3/envs/nlp/lib/python3.10/site-packages/torch/_tensor.py:930 in __iter__

   927         # NB: We have intentionally skipped __torch_function__ dispatch here.
   928         # See gh-54457
   929         if self.dim() == 0:
❱  930             raise TypeError("iteration over a 0-d tensor")
   931         if torch._C._get_tracing_state():
   932             warnings.warn(
   933                 "Iterating over a tensor might cause the trace to be incorrect. "
TypeError: iteration over a 0-d tensor
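The last two frames point at the likely cause: prediction_step calls self.compute_loss(model, inputs, return_outputs=True) and unpacks the result as loss, outputs. If ModifiedTrainer.compute_loss (not shown above) ignores return_outputs and always returns a bare scalar loss tensor, that unpacking tries to iterate a 0-d tensor, which is exactly this TypeError. A minimal sketch of an override that stays compatible with evaluation, assuming ModifiedTrainer subclasses Trainer and the batch provides input_ids and labels (both assumptions, since the class definition isn't shown):

from transformers import Trainer

class ModifiedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        # Hypothetical forward pass; adjust to the real model's inputs.
        outputs = model(
            input_ids=inputs["input_ids"],
            labels=inputs["labels"],
        )
        loss = outputs.loss
        # prediction_step unpacks (loss, outputs) when return_outputs=True,
        # so a bare scalar return value breaks evaluation with
        # "iteration over a 0-d tensor".
        return (loss, outputs) if return_outputs else loss

With return_outputs honored, the commented-out evaluation_strategy="steps" and eval_steps=200 lines should be safe to re-enable.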