Hi,
I am trying to fine-tune DeiT for image classification on images of different sizes.
Everything works fine, but as soon as training begins, grad_norm is nan and the loss is 0.0.
Any suggestions as to why this happens?
I'm using Trainer with the following TrainingArguments. I tried gradient_accumulation_steps=4 as well.
I tried setting bf16=True, but my GPU doesn't support it. I also tried fp16=False, and grad_norm is still nan with the loss at 0.0.
args = TrainingArguments(
    f"{model_name}-finetuned",
    remove_unused_columns=False,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    fp16=True,
    learning_rate=1e-6,
    gradient_accumulation_steps=1,
    per_device_train_batch_size=bs,
    per_device_eval_batch_size=bs,
    num_train_epochs=3,
    logging_steps=1,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=False,
)
trainer = Trainer(
    model=model,
    args=args,
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=image_processor,
)