I fine-tuned the google/flan-t5-base model using PEFT (LoRA), and training fails with a shared-tensors error. Here is my setup.
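lora_config is created earlier in the notebook. The sketch below shows a typical LoRA configuration for T5; the exact hyperparameter values are illustrative and may differ slightly from what I actually used:

from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=32,                       # LoRA rank
    lora_alpha=32,              # LoRA scaling factor
    target_modules=["q", "v"],  # apply LoRA to T5's query/value projections
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,  # flan-t5 is an encoder-decoder model
)

The PEFT model and training code: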
import time
from transformers import TrainingArguments, Trainer
from peft import get_peft_model

peft_model = get_peft_model(original_model, lora_config).to('cuda')

output_dir = f'/kaggle/working/peft-dialogue-summary-lora-training-{str(int(time.time()))}'

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=1e-3,  # higher learning rate than full fine-tuning
    num_train_epochs=1,
    save_strategy="epoch",
    logging_steps=15,
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)
Here, original_model is the google/flan-t5-base checkpoint and tokenized_datasets is my tokenized training data.
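For completeness, I load the base model roughly like this (the from_pretrained call below is the standard way to load flan-t5 as a seq2seq LM; it is a sketch, not my exact loading code):

from transformers import AutoModelForSeq2SeqLM

# Assumption: the base model is loaded as an ordinary seq2seq LM with the default dtype.
original_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")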
When I call peft_trainer.train(), the following error is raised:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again.
What is causing this, and how can I fix it? Please help!