Direct Load vs. Base Model + LoRA: How Should You Use It?

I have trained an embedding model using LoRA and have some questions regarding how to load the trained LoRA adapter.

There are two different ways to load the LoRA adapter:
(1) Direct loading method

direct_adapter = SentenceTransformer(lora_adapter_path)

(2) Adding to a base model

model = SentenceTransformer(base_model)  
model.load_adapter(lora_adapter_path)  

I would like to know whether there are any performance differences between these two methods, and when one should be preferred over the other.


I think the former may be slightly faster. If you apply the merge_and_unload() function mentioned in the following issue to the latter, the adapter weights are folded into the base model's weights, producing a merged state similar to the former.
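A minimal sketch of that merge step, assuming a sentence-transformers version with PEFT support (model.load_adapter) and that the wrapped Hugging Face model exposes PEFT's merge_and_unload() once an adapter is loaded. The model name, adapter path, and the load_merged helper are placeholders, not from the original post:

```python
# Hedged sketch: attach a LoRA adapter to a base model, then merge it
# so subsequent inference behaves like the "direct" loading variant.

def load_merged(base_model: str, lora_adapter_path: str):
    """Load the base model, attach the LoRA adapter, then merge its weights.

    Assumes model.load_adapter() (PEFT integration) is available and that
    the underlying transformer is PEFT-wrapped after loading the adapter.
    """
    from sentence_transformers import SentenceTransformer  # heavy import kept local

    model = SentenceTransformer(base_model)
    model.load_adapter(lora_adapter_path)

    # model[0] is the Transformer module; auto_model is the underlying
    # Hugging Face model, which PEFT wraps once an adapter is loaded.
    # merge_and_unload() folds the LoRA deltas into the base weights
    # and returns the plain (unwrapped) model.
    model[0].auto_model = model[0].auto_model.merge_and_unload()
    return model


if __name__ == "__main__":
    # Placeholder names for illustration only.
    merged = load_merged("base-model-name", "path/to/lora_adapter")
```

After merging there is no per-forward LoRA computation, which is where the small speed advantage of the direct-load state comes from.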

Once merged, the only real drawback is that the adapter can no longer be detached from the base model.