Hey everyone,
I am a bit unsure how to proceed with the topic mentioned in the title.
The baseline is a model loaded via Hugging Face's transformers library as an AutoModelForCausalLM, fine-tuned with PEFT using a LoRA approach, and with the adapter weights subsequently merged into the base model.
I now want to fine-tune the model further without losing its original capabilities, in this case via instruction fine-tuning or prefix tuning.
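If I went the prefix-tuning route, I assume the PEFT config would look something like this (num_virtual_tokens is just an illustrative value):

from peft import PrefixTuningConfig, TaskType, get_peft_model

# sketch of a prefix-tuning setup; num_virtual_tokens is illustrative
prefix_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,
)
model = get_peft_model(model, prefix_config)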
For the LoRA route, my approach would be the following:
import os

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    default_data_collator,
)
from peft import PeftConfig, PeftModel

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    use_cache=False if gradient_checkpointing else True,  # caching is incompatible with gradient checkpointing
    device_map="auto",
    load_in_8bit=True,
)
model = create_peft_config(model)
output_dir = "/tmp"
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_train_batch_size,
    bf16=bf16,
    learning_rate=lr,
    num_train_epochs=epochs,
    gradient_checkpointing=gradient_checkpointing,
    gradient_accumulation_steps=2,
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    optim="adafactor",
    save_strategy="epoch",
    save_total_limit=3,
    evaluation_strategy="epoch",
    load_best_model_at_end=False,
    no_cuda=False,
    auto_find_batch_size=True,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset_train,
    eval_dataset=dataset_eval,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    data_collator=default_data_collator,
)
trainer.train()
trainer.model.save_pretrained(output_dir)  # saves only the adapter weights, not the full model

# free GPU memory before reloading the base model in fp16 for merging
del model
del trainer
peft_config = PeftConfig.from_pretrained(output_dir)
model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    load_in_8bit=False,
    return_dict=True,
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(
    model,
    output_dir,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()
os.makedirs("lora", exist_ok=True)
# merge the new adapter into the base weights and save a standalone model
merged_model = model.merge_and_unload()
merged_model.save_pretrained('lora')
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.save_pretrained('lora')
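(create_peft_config above stands for a small helper along these lines; the LoRA hyperparameters and target modules are illustrative and depend on the architecture:)

from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

def create_peft_config(model):
    # make the 8-bit base model trainable, then attach a fresh LoRA adapter
    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # illustrative; varies per model
    )
    return get_peft_model(model, lora_config)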
In principle, I load the original model with the already-merged weights, fine-tune it on new data, again with PEFT and LoRA, and afterwards merge the new adapter weights into the base model.
Is this a sensible approach, or is there reason to think that repeated fine-tuning and merging could significantly degrade the original capabilities (e.g., through catastrophic forgetting)? If something speaks against it, what would be a better approach?
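One alternative I have been considering, to leave the merged weights completely untouched: keep the new adapter separate and attach it only at load time, roughly like this:

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
# attach the new adapter without merging; the original weights stay intact
model = PeftModel.from_pretrained(base, output_dir)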
Kind regards and thanks in advance