Minimize the number of transformers checkpoints when serving multiple clients

Hi all,

My objective is to build a platform where each customer can send their own text classification corpus and get back their own model, trained and served. Training a separate transformer for every customer is straightforward but becomes intractable in terms of disk usage as the number of customers grows. Instead, I could use a single BERT backbone to extract embeddings from each corpus and train a custom two-layer neural net for each customer. This is a first strategy that makes disk usage more reasonable.
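To put rough numbers on the disk usage difference, here is a back-of-the-envelope sketch. It assumes a BERT-base-sized backbone (about 110M parameters), float32 weights, and a hypothetical head of shape 768 -> 256 -> 10; the head width and the number of classes are placeholders I picked for illustration, not measured values.

```python
# Rough storage estimate: one full checkpoint per customer vs. one shared
# frozen backbone plus a small head per customer. All sizes are assumptions.

BYTES_PER_PARAM = 4            # float32 weights
BACKBONE_PARAMS = 110_000_000  # roughly BERT-base; adjust for your model
HIDDEN = 768                   # BERT-base embedding size
HEAD_HIDDEN = 256              # hypothetical width of the head's hidden layer
NUM_CLASSES = 10               # hypothetical number of labels per customer


def head_params(hidden=HIDDEN, head_hidden=HEAD_HIDDEN, num_classes=NUM_CLASSES):
    """Parameter count of a two-layer head (weights plus biases)."""
    layer1 = hidden * head_hidden + head_hidden
    layer2 = head_hidden * num_classes + num_classes
    return layer1 + layer2


def full_checkpoints_bytes(num_customers):
    """Disk usage when every customer gets a fully fine-tuned model."""
    return num_customers * (BACKBONE_PARAMS + head_params()) * BYTES_PER_PARAM


def shared_backbone_bytes(num_customers):
    """Disk usage with one frozen backbone and one head per customer."""
    return (BACKBONE_PARAMS + num_customers * head_params()) * BYTES_PER_PARAM


if __name__ == "__main__":
    for n in (1, 100, 1000):
        full_gb = full_checkpoints_bytes(n) / 1e9
        shared_gb = shared_backbone_bytes(n) / 1e9
        print(f"{n:>5} customers: full={full_gb:8.1f} GB  shared={shared_gb:6.2f} GB")
```

Under these assumptions each extra customer costs well under 1 MB with the shared backbone, versus roughly 440 MB for a full checkpoint. This only counts weights on disk and says nothing about the accuracy gap between the two setups.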
My question is: is there a white paper, blog post, or any other resource that assesses this problem and proposes possible strategies while maintaining the highest possible performance?
I’m sure this is a common issue that every AI-based company could face.
Thanks for your help.