How to implement LoRA with PyTorch?

Hello everybody,

I want to implement LoRA to fine-tune a BERT model.
My code lists all the weights, and I freeze them by setting .requires_grad = False.
Now I want to add the LoRA update BAx to the weights of BERT as trainable parameters, as described in the LoRA paper: h = W0 x + BAx, where W0 is the frozen pre-trained weight and B and A are the trainable low-rank matrices.

How can I implement LoRA with PyTorch?

Currently my code is:

from transformers import BertForMaskedLM, BertTokenizer

# Load pre-trained model and tokenizer
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Print the name and shape of every weight tensor
for name, param in model.named_parameters():
    if "weight" in name:
        print(name, "\t\t", param.shape)

# Freeze all weights
for param in model.parameters():
    param.requires_grad = False

# How many parameters are trainable? (0 after freezing everything)
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
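
For reference, the update h = W0 x + BAx from the paper could be sketched as a wrapper around a frozen nn.Linear. This is only a minimal sketch under assumptions: the class name LoRALinear, the default values for r and lora_alpha, and the choice of initialisation (random A, zero B, so the update starts at zero) are illustrative and not taken from my code above.

```python
import math

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: a frozen nn.Linear plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, lora_alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pre-trained weight W0 (and bias, if present)
        for p in self.base.parameters():
            p.requires_grad = False
        self.scaling = lora_alpha / r
        # A has shape (r, in_features), B has shape (out_features, r)
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        # Random A, zero B: B @ A is zero before training starts
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + (alpha / r) * B A x
        update = x @ self.lora_A.T @ self.lora_B.T
        return self.base(x) + update * self.scaling


layer = LoRALinear(nn.Linear(768, 768), r=8, lora_alpha=16)
x = torch.randn(2, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 768 = 12288
```

Only lora_A and lora_B receive gradients here. Such a wrapper would presumably have to replace the relevant nn.Linear modules inside BERT (for example the attention projections), which is the part I am unsure about.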

In the Microsoft implementation on GitHub, they define their own LoRALayer class:

import torch.nn as nn

class LoRALayer():
    def __init__(
        self,
        r: int,
        lora_alpha: int,
        lora_dropout: float,
        merge_weights: bool,
    ):
        self.r = r
        self.lora_alpha = lora_alpha
        # Optional dropout
        if lora_dropout > 0.:
            self.lora_dropout = nn.Dropout(p=lora_dropout)
        else:
            # No dropout requested: use the identity function
            self.lora_dropout = lambda x: x
        # Mark the weight as unmerged
        self.merged = False
        self.merge_weights = merge_weights
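
As far as I understand, the merged and merge_weights flags relate to folding the low-rank update into the frozen weight, W = W0 + (alpha / r) BA, so that inference needs only a single matrix multiplication. The following is a small sketch of that equivalence with plain tensors; the shapes and values are made up for illustration and do not come from the Microsoft code.

```python
import torch

torch.manual_seed(0)
in_f, out_f, r, lora_alpha = 16, 32, 4, 8
scaling = lora_alpha / r

# float64 keeps the numerical comparison below tight
W0 = torch.randn(out_f, in_f, dtype=torch.float64)  # frozen pre-trained weight
A = torch.randn(r, in_f, dtype=torch.float64)       # low-rank factor A (as if trained)
B = torch.randn(out_f, r, dtype=torch.float64)      # low-rank factor B (as if trained)
x = torch.randn(5, in_f, dtype=torch.float64)

# Unmerged: frozen path plus separate low-rank path
h_unmerged = x @ W0.T + (x @ A.T @ B.T) * scaling

# Merged: fold the update into a single weight matrix
W_merged = W0 + (B @ A) * scaling
h_merged = x @ W_merged.T

same = torch.allclose(h_unmerged, h_merged)
print(same)
```

Both paths should produce the same output up to floating-point error, so merging would change only the cost of the forward pass, not its result.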

In addition, they provide implementations for Embedding, Linear, MergedLinear, ConvLoRA, Conv2d(ConvLoRA), Conv1d(ConvLoRA) and Conv3d(ConvLoRA). Why do they implement these separate layers, and what can I carry over from this implementation to my own LoRA implementation?

Best regards and thank you in advance