I am running k-fold cross-validation to fine-tune a pre-trained model. I have set up my `Trainer` with `load_best_model_at_end=True`, and each fold runs for 30 epochs.
- Does training, from the second fold onward, start from the best model obtained in the previous fold (i.e. the best checkpoint after that fold's 30 epochs), since `load_best_model_at_end` is set to `True`? (There is a sketch of what I want to guarantee after the code snippet below.)
- How can I disable all Hugging Face caches? I don't want it to reuse any configuration or data, such as checkpoints, from a previous fold. (A second sketch after the code snippet shows the kind of isolation I mean.)

My sample code snippet is as follows:
for fold in range(100):
    # Build a fresh config for this fold
    config = AutoConfig.from_pretrained(
        pretrained_model,
        num_labels=no_of_labels,
        label2id={label: i for i, label in enumerate(labels)},
        id2label={i: label for i, label in enumerate(labels)},
    )
    # Re-instantiate the model from the original pre-trained weights
    model = SomeModelClass.from_pretrained(
        pretrained_model,
        config=config,
        ignore_mismatched_sizes=True,
    )
    model.to(device)
    training_args = TrainingArguments(
        output_dir="./checkpoints",
        per_device_train_batch_size=64,
        per_device_eval_batch_size=128,
        gradient_accumulation_steps=2,
        num_train_epochs=30,
        learning_rate=5e-5,
        weight_decay=0.0001,
        warmup_ratio=0.1,
        gradient_checkpointing=True,
        fp16=True,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        logging_steps=500,
        report_to=["tensorboard"],
        logging_dir="./logs",
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",
        greater_is_better=True,  # accuracy: higher is better
        push_to_hub=False,
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
        tokenizer=feature_extractor,
    )
    train_result = trainer.train()
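
To make the first question concrete, here is a minimal sketch of what I am trying to guarantee: each fold gets its own output directory, the model is re-loaded from the original pre-trained weights, and the fold's checkpoints are deleted before the next fold starts. Names such as `num_folds` and `fold_output_dir` are placeholders rather than part of my actual code, and I am not sure this is the right way to enforce the isolation, which is exactly what I am asking.

```python
# Rough sketch (not my actual code): isolating each fold so nothing carries over.
# pretrained_model, SomeModelClass, config, train_dataset, eval_dataset and
# compute_metrics are the same names as in the snippet above; num_folds and
# fold_output_dir are placeholders.
import shutil
from pathlib import Path

for fold in range(num_folds):
    fold_output_dir = Path(f"./checkpoints/fold_{fold}")

    # Fresh model from the *original* pre-trained weights, not from the
    # previous fold's best checkpoint.
    model = SomeModelClass.from_pretrained(pretrained_model, config=config)

    training_args = TrainingArguments(
        output_dir=str(fold_output_dir),   # fold-specific checkpoint dir
        num_train_epochs=30,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",
        greater_is_better=True,
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
    )
    trainer.train()

    # After training, trainer.state.best_model_checkpoint points at the best
    # checkpoint of *this* fold only.
    print(fold, trainer.state.best_model_checkpoint)

    # Remove this fold's checkpoints so the next fold cannot pick them up.
    shutil.rmtree(fold_output_dir, ignore_errors=True)
```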
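
For the second question, this is roughly the kind of cache isolation I have in mind. Using `HF_HOME`, `datasets.disable_caching()` and `force_download=True` is my own guess at what "disabling all caches" would look like, not something I have verified solves the problem; `pretrained_model` is the same variable as in my snippet above.

```python
# Rough sketch of the cache isolation I am after; the temp-directory and
# environment-variable approach is an assumption on my part.
import os
import tempfile

# Point the Hugging Face cache at a throwaway directory
# (set before transformers/datasets are imported so it takes effect).
os.environ["HF_HOME"] = tempfile.mkdtemp(prefix="hf_cache_")

import datasets
from transformers import AutoConfig

# Stop the datasets library from reusing cached Arrow files between folds.
datasets.disable_caching()

# Alternatively, bypass the local cache per call:
config = AutoConfig.from_pretrained(
    pretrained_model,           # same identifier as in my snippet above
    force_download=True,        # ignore anything already cached
    cache_dir=os.environ["HF_HOME"],
)
```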