Minimize number of transformers checkpoints for serving multiple clients

Hi all,

My objective is to build a platform where every customer can send their own text classification corpus and get back their own model, trained and served. Training a separate transformer for every customer is straightforward but intractable in terms of disk usage as the number of customers increases. As a first strategy to keep disk usage reasonable, I could use a single BERT backbone to compute embeddings for each corpus and train a custom two-layer neural net per customer.
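The shared-backbone strategy above could be sketched like this (layer widths, the hidden size, and the dummy embeddings are illustrative assumptions; in practice the embeddings would come from one frozen BERT backbone shared by all customers):

```python
import torch
import torch.nn as nn

class CustomerHead(nn.Module):
    """Small per-customer classifier trained on top of frozen backbone embeddings.

    Only this head is stored per customer; the BERT backbone is stored once.
    """
    def __init__(self, hidden_size: int = 768, num_labels: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.ReLU(),
            nn.Linear(256, num_labels),
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.net(embeddings)

# Stand-in for pooled [CLS] embeddings produced by a single frozen
# BERT backbone; here random tensors are used for illustration.
head = CustomerHead(hidden_size=768, num_labels=3)
dummy_embeddings = torch.randn(4, 768)  # batch of 4 texts
logits = head(dummy_embeddings)
print(logits.shape)  # torch.Size([4, 3])
```

Each `CustomerHead` is only a few hundred KB on disk, so storage grows slowly with the number of customers.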
My question is: is there a white paper, blog post, or similar resource that assesses this problem and proposes possible strategies while maintaining the highest performance?
I’m sure it is a common issue that every AI-based company could face.
Thanks for your help.


Hey @ykacer – have you looked at our newest library, peft? If your problem can be solved by fine-tuning a few base models, the total disk usage is very reasonable :slight_smile:

Hi @joaogante, thanks a lot for the suggestion, I’m going to have a look at it.

Dear @joaogante, thanks again for the information. I was able to successfully run a LoRA-based RoBERTa on my own data using one of your example notebooks. Just one question: I was wondering how PEFT differs from the Adapters framework?