Hi folks. I am attempting to use a large language model (specifically Phi-3-mini) as a token classifier. This was recently made easy to do in the transformers library thanks to the Phi3ForTokenClassification implementation. However, I am having difficulty training this model via Parameter-Efficient Fine-Tuning (PEFT, i.e. LoRA).
I am creating an instance of Phi3ForTokenClassification
from the pre-trained Phi-3-mini model as follows:
model = Phi3ForTokenClassification.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    attn_implementation="flash_attention_2",
    num_labels=len(labels_vocab),
    id2label=id2label,
    label2id=label2id,
    use_cache=False,
    torch_dtype=torch.bfloat16,
)
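(For context, labels_vocab, id2label, and label2id are just my dataset's label mappings, built along these lines; the labels shown here are placeholders, my real vocabulary has 57 of them:)

# Placeholder labels for illustration only.
labels_vocab = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]
id2label = {i: label for i, label in enumerate(labels_vocab)}
label2id = {label: i for i, label in enumerate(labels_vocab)}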
As expected, since the head of this model is replaced with a linear layer for predicting the one-hot-encoded token labels, I get a warning that this specific layer has not been trained yet:
Some weights of Phi3ForTokenClassification were not initialized from the model checkpoint at microsoft/Phi-3-mini-4k-instruct and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
At this point, I am assuming that I need to fine-tune the core model layers (i.e. attention heads, MLPs, etc.) and fully train that last classifier layer.
I am training on an RTX 4090 (24 GB of VRAM). As such, I need to leverage PEFT with LoRA, which I configure as follows:
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="TOKEN_CLS",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    modules_to_save=["classifier"],
    inference_mode=False,
)
peft_model = get_peft_model(model, peft_config)
When checking the number of trainable parameters, I get trainable params: 18,000,953 || all params: 3,740,755,058 || trainable%: 0.4812, which seems right to me.
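(For reference, that figure comes straight from PEFT's helper:)

# Reports trainable vs. total parameter counts for the PEFT-wrapped model.
peft_model.print_trainable_parameters()
# trainable params: 18,000,953 || all params: 3,740,755,058 || trainable%: 0.4812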
Based on what I've researched about modules_to_save, this seems like the right configuration and should result in a full training of the classifier module of the model. When I print the model details, this is the classifier layer: (classifier): Linear(in_features=3072, out_features=57, bias=True).
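To double-check that the classifier head itself is trainable (and not just the LoRA adapters), I can also list the trainable parameters that mention it, something like:

# Sanity check: the modules_to_save copy of the classifier should show up here.
trainable_classifier = [
    name for name, param in peft_model.named_parameters()
    if param.requires_grad and "classifier" in name
]
print(trainable_classifier)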
Since this is LoRA and we're training new weights, I drafted my training configuration with a fairly aggressive learning rate (effectively 8e-4), as follows:
training_args = TrainingArguments(
    bf16=True,
    output_dir="outputs",
    learning_rate=(2e-4 * 4),
    gradient_accumulation_steps=4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    logging_steps=10,
    logging_strategy="steps",
    eval_strategy="epoch",
    save_strategy="epoch",
    report_to="wandb",
)
And I am training with:
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['validation'],
    tokenizer=tokenizer,
    data_collator=data_collator,
    # calculates precision, recall, accuracy, and f1
    compute_metrics=compute_metrics,
)
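In case it's relevant, data_collator and compute_metrics are the usual token-classification boilerplate, roughly along these lines (simplified sketch, not verbatim from my notebook):

import numpy as np
import evaluate
from transformers import DataCollatorForTokenClassification

# Pads inputs and labels to the same length within each batch (labels padded with -100).
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

seqeval = evaluate.load("seqeval")

def compute_metrics(eval_pred):
    predictions, label_ids = eval_pred
    predictions = np.argmax(predictions, axis=2)

    # Strip the -100 positions (padding / special tokens) before scoring.
    true_predictions = [
        [id2label[p] for p, l in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, label_ids)
    ]
    true_labels = [
        [id2label[l] for p, l in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, label_ids)
    ]

    results = seqeval.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }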
The training seems to run correctly. Every epoch, the precision, recall, accuracy, and f1 scores all seem reasonable and are improving. After the 1st epoch, my f1 score is ~0.66, improving to ~0.72 in the 2nd epoch.
Once my short training run is complete, I save the model as follows:
merged_model = trainer.model.merge_and_unload()
merged_model.save_pretrained("model-name")
To test my model, I load it for inference as follows:
model = AutoModelForTokenClassification.from_pretrained("model-name")
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
token_classifier = pipeline("ner", model=model, tokenizer=tokenizer)
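And I test it on text roughly like this (the sentence is just an illustration; aggregation_strategy groups sub-word pieces into word-level entities):

results = token_classifier(
    "An example sentence containing the kinds of entities my dataset labels.",
    aggregation_strategy="simple",
)
print(results)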
I get very poor results during inference. I am not completely convinced I am doing this correctly; I made a few assumptions above regarding how the training would work that I am not sure are correct.
For comparison, I rented an A6000 Ada to do a full (non-PEFT) fine-tune on the same dataset. After 2 epochs, that run had lower accuracy, precision, recall, and f1 scores, but it performed significantly better in test inference.
Does anyone have any suggestions on how I can make this better? I am not afraid of deep-dive material; I have a ton to learn and I'm here for it. Thanks in advance!