Using Text Generation Inference with a LoRA adapter

I just trained my first LoRA model, but I believe I might have missed something.
After training a Flan-T5-Large model, I tested it and it was working perfectly.
I decided that I wanted to test its deployment using TGI.
I managed to deploy the base Flan-T5-Large model from Google using TGI, which was pretty straightforward. But when I tested my LoRA model using pipeline, it underperformed heavily.
I noticed that when I trained my LoRA model, I did not get a “config.json” file; I got an “adapter_config.json” file instead. I understood that what I had was only the adapter.
I don’t know if that is one of the reasons. After training, I did more research on LoRA and noticed that the documentation mentions “loading” the LoRA onto the base model and “merging” the two, which I did not do at the start. I trained and got several checkpoints for each epoch, tested the checkpoint that had the best metrics, and pushed it to my private hub. These are the files that I pushed to my hub:

  • .gitattributes
  • README.md
  • adapter_config.json
  • adapter_model.safetensors
  • special_tokens_map.json
  • spiece.model
  • tokenizer.json
  • tokenizer_config.json

Without re-training, how can I load the LoRA model properly so that I can test it using pipeline and also deploy it on TGI?
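
For reference, this is roughly what I understand the “loading” and “merging” steps from the PEFT documentation to look like. It is only a sketch that I have not verified on my setup: the function name merge_adapter and the argument names are my own, and I am assuming the base model is google/flan-t5-large and that the tokenizer files in the adapter repo are the ones to keep.

```python
def merge_adapter(base_model_id, adapter_id, output_dir):
    """Load a LoRA adapter onto its base model, merge it, and save the result.

    base_model_id: hub id of the base model (assumed: "google/flan-t5-large").
    adapter_id:    hub id or local path of the repo holding adapter_config.json
                   and adapter_model.safetensors (placeholder, not a real repo).
    output_dir:    local folder where the merged model is written.
    """
    # Imported here so the heavy libraries are only needed when merging.
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # 1. Load the plain base model (Flan-T5 is a seq2seq model).
    base_model = AutoModelForSeq2SeqLM.from_pretrained(base_model_id)

    # 2. "Loading": attach the trained LoRA weights to the base model.
    peft_model = PeftModel.from_pretrained(base_model, adapter_id)

    # 3. "Merging": fold the LoRA weights into the base weights and
    #    get back a regular transformers model without PEFT wrappers.
    merged_model = peft_model.merge_and_unload()

    # 4. Save the merged model; this writes a config.json next to the weights.
    merged_model.save_pretrained(output_dir)

    # 5. Save the tokenizer alongside it so the folder is self-contained.
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    tokenizer.save_pretrained(output_dir)

    return output_dir
```

If this is right, calling merge_adapter("google/flan-t5-large", "<my-private-adapter-repo>", "flan-t5-large-merged") after logging in to the hub should produce a folder containing a “config.json”, which I could presumably pass to pipeline("text2text-generation", model="flan-t5-large-merged") and also hand to TGI through its --model-id option.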