Evaluation stuck at 0% when trying to finetune OD model

jb-bc · September 3, 2024, 12:09am

I’m trying to base a training loop on this script, but I get the following issue:

 50%|█████████████████████████▌                         | 77/154 [00:52<00:43,  1.77it/s]
[INFO|trainer.py:3829] 2024-09-03 02:07:25,225 >>
***** Running Evaluation *****
[INFO|trainer.py:3831] 2024-09-03 02:07:25,225 >>   Num examples = 305
[INFO|trainer.py:3834] 2024-09-03 02:07:25,225 >>   Batch size = 8

  0%|                                                   | 0/20 [00:00<?, ?it/s]

After the first epoch, the evaluation loop starts but stays at 0%, with no progress and not even an error. Meanwhile, GPU usage stays at 100% and memory is being consumed as well.
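One detail in the log above may help narrow this down. At a batch size of 8, 305 examples would take 39 evaluation steps on a single process, yet the bar shows 0/20. Below is a minimal sketch of that arithmetic. It assumes the evaluation set is sharded evenly across processes (as `DistributedSampler` or Accelerate's batch sharding would do) and that the last partial batch is kept. `eval_steps_per_process` is a hypothetical helper, not a Trainer API.

```python
import math


def eval_steps_per_process(num_examples: int, batch_size: int, world_size: int) -> int:
    """Hypothetical helper: evaluation steps each process runs, under the assumptions above."""
    # Each process receives ceil(num_examples / world_size) samples
    # (padded so every process gets the same number).
    per_process = math.ceil(num_examples / world_size)
    # Those samples are iterated in batches, keeping the final partial batch.
    return math.ceil(per_process / batch_size)


# Values from the log: Num examples = 305, Batch size = 8
for world_size in (1, 2, 4):
    print(world_size, eval_steps_per_process(305, 8, world_size))
# prints:
# 1 39
# 2 20
# 4 10
```

Under these assumptions, only a world size of 2 gives 20 steps. That suggests the run is split across two processes. If so, one process waiting on the other inside a collective operation is one plausible cause of a silent hang like this.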

Is there a way to get more detailed debug logs, or has anyone come across this issue before? Thanks.
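For extra debug output, here is a minimal sketch that assumes a `torchrun`/DDP launch using NCCL. It does three things:

- Raises the Transformers log level with `transformers.utils.logging.set_verbosity_debug()`.
- Requests verbose NCCL and `torch.distributed` output through the `NCCL_DEBUG` and `TORCH_DISTRIBUTED_DEBUG` environment variables.
- Uses the standard-library `faulthandler.dump_traceback_later` to print every thread's Python stack at a fixed interval, so a silent hang at least shows where each process is stuck.

`enable_hang_diagnostics` is a hypothetical helper name, and the 300-second interval is an arbitrary choice.

```python
import faulthandler
import os
import sys


def enable_hang_diagnostics(interval_s: float = 300.0) -> None:
    """Hypothetical helper: turn on extra logging and periodic stack dumps."""
    # Verbose NCCL / torch.distributed output. These are read when the process
    # group is initialised, so set them before torch.distributed starts.
    os.environ.setdefault("NCCL_DEBUG", "INFO")
    os.environ.setdefault("TORCH_DISTRIBUTED_DEBUG", "DETAIL")

    # Raise the Transformers logger (used by the Trainer) to DEBUG, if installed.
    try:
        from transformers.utils import logging as hf_logging
        hf_logging.set_verbosity_debug()
    except ImportError:
        pass

    # Every interval_s seconds, write the traceback of every thread in this
    # process to the original stderr, without stopping the process.
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=sys.__stderr__)


enable_hang_diagnostics(interval_s=300.0)
```

Call it at the very top of the training script, before the `Trainer` or any process group is created. Exporting the two environment variables on the launch command instead is the more reliable option. Running `py-spy dump --pid <PID>` against a stuck process gives a similar stack snapshot without any code changes.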
