PEFT + Inference

Is there a way to get PEFT to work with inference endpoints?

Ideally, we should be able to support multiple PEFT models with a common inference endpoint for the base model.


Any updates here?

You could configure a custom handler that lets you specify code to load the base model and its adapters; see Create custom Inference Handler.
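As a rough sketch of what such a handler could look like: a `handler.py` that reads the adapter's `adapter_config.json` to find the base model, loads the base with transformers, and applies the PEFT adapter on top. The base-model resolution and generation plumbing here are my assumptions, not an official template.

```python
# handler.py -- sketch of a custom Inference Endpoints handler that applies
# a PEFT (LoRA) adapter on top of its base model.
from typing import Any, Dict, List


class EndpointHandler:
    def __init__(self, path: str = "") -> None:
        # Heavy imports are deferred so the module can be imported and
        # inspected without transformers/peft installed.
        from peft import PeftConfig, PeftModel
        from transformers import AutoModelForCausalLM, AutoTokenizer

        # `path` points at the adapter repo the endpoint was created from;
        # its adapter_config.json names the base model to load underneath.
        peft_config = PeftConfig.from_pretrained(path)
        base_id = peft_config.base_model_name_or_path
        self.tokenizer = AutoTokenizer.from_pretrained(base_id)
        base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
        self.model = PeftModel.from_pretrained(base, path)
        self.model.eval()

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        # Inference Endpoints passes the request body as a dict with "inputs".
        prompt = data["inputs"]
        params = data.get("parameters", {})
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, **params)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```

The endpoint would then instantiate `EndpointHandler(path)` once at startup and call it per request, e.g. `handler({"inputs": "Hello", "parameters": {"max_new_tokens": 32}})`. Note this still serves one adapter per endpoint; multiple adapters over one base would need extra routing logic in `__call__`.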

How can we use this with text-generation-inference? I assume that loading it through a custom handler will serve it via transformers, no?