Confusion between diffusers and webui

Hi, I’m facing a confusing situation and would appreciate any advice. My colleague trained some LoRA models and deployed SD with webui. He then found the performance unsatisfying (due to old GPU machines), and I was assigned the task of improving it a bit. I’m new to the SD area, and after some digging I found there are roughly two mainstream code lineages, i.e., CompVis (or Stability AI, which A1111’s webui builds on) and diffusers. I compared the performance of three methods: CompVis’s scripts/ (assuming it’s close to webui), diffusers, and onediff. The results showed performance increasing in that order. But webui provides lots of additional functionality out of the box, like prompt weighting, dynamic LoRA loading/unloading, and textual inversion. I know these could be achieved with diffusers plus some third-party libs or custom code. So I’m planning to build on diffusers’ pipeline and integrate webui’s functionality on top of it, but I’m not sure whether it’s worth it.
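To be concrete about what I mean by prompt weighting: webui accepts A1111-style syntax like `(a cat:1.3)` to boost a phrase’s attention weight. A toy sketch of just the parsing step, to show the kind of thing I’d have to reimplement or pull in from a third-party lib (this is my own illustrative code handling only the explicit `(text:weight)` form, not webui’s or diffusers’ actual implementation; libraries like compel do this properly):

```python
import re

# Matches the explicit A1111-style form "(chunk:1.3)"; everything else
# is treated as plain text with the default weight 1.0.
WEIGHT_RE = re.compile(r"\(([^()]+):([\d.]+)\)")

def parse_weights(prompt):
    """Split a prompt into (text, weight) pieces."""
    pieces, pos = [], 0
    for m in WEIGHT_RE.finditer(prompt):
        if m.start() > pos:
            pieces.append((prompt[pos:m.start()], 1.0))  # unweighted run
        pieces.append((m.group(1), float(m.group(2))))   # weighted chunk
        pos = m.end()
    if pos < len(prompt):
        pieces.append((prompt[pos:], 1.0))               # trailing text
    return pieces

parse_weights("a photo of (a cat:1.3) on a mat")
# → [("a photo of ", 1.0), ("a cat", 1.3), (" on a mat", 1.0)]
```

The weights would then scale the corresponding token embeddings before they reach the text encoder’s output, which is roughly what compel handles for diffusers pipelines.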
The SD world is evolving quickly, led by Stability AI, CompVis, and RunwayML, and followed by a vast community, including Hugging Face. If I went this way, then whenever a new feature emerges I would have to wait for diffusers to catch up or implement it myself, which costs time and energy. So I’m not sure whether this is a good approach for me.
To my knowledge, optimizations for SD include xformers, attention slicing, and PyTorch 2.0’s scaled dot-product attention (SDPA). (Correct me if I’m mistaken, and I’d also like to know whether TensorRT or NVIDIA’s FasterTransformer could push it further.) These are all available in both diffusers and webui, so I’m not sure what causes the performance gap.
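To check my understanding of attention slicing: as far as I can tell, it processes the queries in chunks so the full attention-score matrix never materializes at once, trading some speed for lower peak memory, while the result stays identical. A toy pure-Python sketch of what I believe the technique does (my own illustrative code, not diffusers’ implementation):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q, k, v):
    """Plain scaled dot-product attention over lists of vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(wi * vj[t] for wi, vj in zip(w, v))
                    for t in range(len(v[0]))])
    return out

def sliced_attention(q, k, v, slice_size):
    """Same result, but only slice_size query rows are scored at a time,
    so the peak size of the score matrix shrinks."""
    out = []
    for i in range(0, len(q), slice_size):
        out.extend(attention(q[i:i + slice_size], k, v))
    return out
```

Since each output row depends only on its own query, slicing over queries cannot change the result, which matches my reading that it is a memory optimization rather than an approximation (unlike, say, reduced-precision tricks).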