I have fine-tuned an LLM on my custom data using PEFT and LoRA via the AutoTrain Advanced UI.
How do I create an inference endpoint for it after fine-tuning?
If anyone has done this before or has any idea how, please share.
+1 for that. text-generation-inference supports loading a PeftModel directly with the adapter, without needing to merge the weights first. I'd expect Inference Endpoints to support this option as well, but I couldn't find a way to do it. This is especially important for quantized models, since LoRA weights can't be merged into a quantized base model.
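For context on why merging fails for quantized models: "merging" a LoRA adapter means folding the low-rank update into the base weight matrix (for unquantized models, peft exposes this as `merge_and_unload()` on a `PeftModel`). A minimal numeric sketch of that operation, in plain Python with toy numbers (all values here are hypothetical, assuming the standard LoRA update rule W' = W + (alpha/r) * B @ A):

```python
# Sketch of what "merging LoRA weights into the base model" means:
# W_merged = W_base + (alpha / r) * B @ A.
# With a quantized model, W_base is stored in low precision, so this
# addition cannot be applied losslessly -- hence the need for serving
# that loads the adapter alongside the base instead of merging.

def matmul(a, b):
    """Naive matrix multiply for small toy matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def merge_lora(w_base, lora_a, lora_b, alpha, r):
    """Fold the rank-r LoRA update (B @ A) into the base weights."""
    delta = matmul(lora_b, lora_a)  # B @ A has the same shape as W_base
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w_base, delta)]

# Toy 2x2 base weight and a rank-1 adapter (hypothetical numbers)
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]           # shape r x d_in
B = [[0.5], [0.25]]        # shape d_out x r
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [0.5, 2.0]]
```

In full precision this addition is exact, which is why `merge_and_unload()` works for fp16/bf16 models; with a 4-bit or 8-bit base there is no exact representation of the merged weights, so the adapter has to stay separate at serving time.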