Hello everybody,
I want to implement LoRA to fine-tune a BERT model.
My code retrieves all weights, and I freeze them with .requires_grad = False.
Now I want to add the low-rank BAx term of LoRA to the weights of BERT as trainable parameters, as shown in the LoRA paper:
h = Wx + BAx
How can I implement LoRA in PyTorch?
Currently my code is:

from transformers import BertForMaskedLM, BertTokenizer

# Load pre-trained model and tokenizer
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Print all weights
for param, vals in model.named_parameters():
    if "weight" in param:
        print(param, "\t\t", vals.shape)

# Freeze all weights
for param in model.parameters():
    param.requires_grad = False

# How many parameters are trainable?
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
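To make the BAx term trainable, my first idea is to wrap each frozen nn.Linear in a small module that adds the low-rank update on top of the frozen weight. This is only a sketch I put together from the equation h = Wx + BAx; the class name LoRALinear and the arguments r and alpha are my own choices, not taken from the Microsoft repo:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update.

    Computes h = Wx + (alpha / r) * B(Ax) as in the LoRA paper.
    Sketch only: the class name and arguments are my own choices.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pre-trained weight (and bias, if present)
        self.base.weight.requires_grad = False
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        self.scaling = alpha / r
        # A gets small random values, B starts at zero,
        # so BAx = 0 at the beginning of training (as in the paper)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        # Frozen path Wx plus scaled low-rank path B(Ax)
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

I would then replace e.g. the query and value projections of each attention block, something like model.bert.encoder.layer[i].attention.self.query = LoRALinear(old_query). Is this the right direction?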
In the Microsoft implementation on GitHub they create their own LoRALayer():
class LoRALayer():
    def __init__(
        self,
        r: int,
        lora_alpha: int,
        lora_dropout: float,
        merge_weights: bool,
    ):
        self.r = r
        self.lora_alpha = lora_alpha
        # Optional dropout
        if lora_dropout > 0.:
            self.lora_dropout = nn.Dropout(p=lora_dropout)
        else:
            self.lora_dropout = lambda x: x
        # Mark the weight as unmerged
        self.merged = False
        self.merge_weights = merge_weights
They also have implementations for Embedding, Linear, MergedLinear, ConvLoRA, Conv2d(ConvLoRA), Conv1d(ConvLoRA), and Conv3d(ConvLoRA). Why do they implement these adapted layers, and what can I transfer from their implementation to my own LoRA implementation?
Best regards and thank you in advance
Christian