Mini Stable Diffusion. How-to guide?

After playing with image generation using Stable Diffusion models in the Web UI, I have some questions:
Is it possible to make it run much faster? I don't mean optimizations for running existing models, but building a completely new, smaller one, for example a UNet with fewer channels. What would the speed difference be if the block channel counts were made 4, 8, or 16 times smaller? Would image quality still be good if the model covered only a very narrow topic (fewer words and objects)? Would it train quickly on a consumer GPU?
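
To get a rough feel for the channel question, here is a back-of-the-envelope sketch. It assumes the UNet cost is dominated by 3x3 convolutions whose input and output widths shrink together; `conv_cost` is a made-up helper, and 320 is the first block width of the SD 1.x UNet. It only counts parameters and multiply-accumulates for one layer, so it is an upper bound on the speedup, not a measurement.

```python
def conv_cost(c_in, c_out, kernel=3, height=64, width=64):
    """Parameters and multiply-accumulates (MACs) of one stride-1, padded 2D conv."""
    params = c_in * c_out * kernel * kernel + c_out  # weights + bias
    macs = c_in * c_out * kernel * kernel * height * width
    return params, macs


base_width = 320  # first block width in the SD 1.x UNet
_, base_macs = conv_cost(base_width, base_width)

for shrink in (1, 4, 8, 16):
    width = base_width // shrink
    params, macs = conv_cost(width, width)
    print(f"channels/{shrink}: width={width}, params={params}, "
          f"MACs reduced {base_macs / macs:.0f}x")
```

So shrinking the channels by a factor s cuts the convolution work by about s squared (16x, 64x, 256x). The real speedup will probably be smaller: the attention score computation scales only linearly with channels, very small layers do not keep a GPU busy, and the VAE decoder and text encoder stay the same size.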

I’m new to this topic and am currently digging through the diffusers docs and source code, but maybe somebody can guide me so I get to my goal faster.
So right now I’m looking into how to replace the UNet in an existing SD model with a smaller UNet of the same kind, check the image generation time, and then try to train it.

Originally the idea was to make something like a VTuber model for a webcam with only one character, but the usual Stable Diffusion models are too slow and resource-hungry to do this on gaming GPUs, even high-end ones (the target is something like 20 images per second while still leaving free performance and memory for the game).