Hi everyone,
We’re building an instant virtual try-on application where users can:
- Upload their own avatar (full-body image)
- Separately upload different outfit items such as t-shirts, jackets, hoodies, etc.
- Then try them on instantly, with proper multi-layering support (for example, a jacket correctly layering over a t-shirt).
We’ve tested several VTO and segmentation models from Hugging Face and other sources, but so far none of them support proper real-time layering between multiple clothing items and the human body.
Our main goals are:
- Realistic and accurate multi-layer garment compositing
- Instant processing speed (ideally within 1–5 seconds per try-on)
- Separate segmentation of the person and of each clothing item
- Natural blending of layers (preserving texture, folds, and depth)
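To make the layering goal concrete, the compositing step we have in mind is roughly back-to-front alpha blending. Below is a minimal sketch, not a working try-on pipeline: it assumes each garment has already been warped to the avatar's pose and is available as an RGBA layer with its mask in the alpha channel. The function name `composite_layers` and the `(z_order, rgba)` layer format are made up for illustration.

```python
import numpy as np


def composite_layers(avatar_rgb, garment_layers):
    """Alpha-composite garment layers over the avatar, back to front.

    avatar_rgb:     float array of shape (H, W, 3), values in [0, 1].
    garment_layers: list of (z_order, rgba) pairs (hypothetical format), where
                    rgba is a float array of shape (H, W, 4) in [0, 1] that is
                    ASSUMED to be already warped onto the avatar.
                    Lower z_order is closer to the body.
    """
    out = np.asarray(avatar_rgb, dtype=np.float64).copy()
    # Sort so inner garments (t-shirt) are painted before outer ones (jacket).
    for _, rgba in sorted(garment_layers, key=lambda layer: layer[0]):
        rgb = rgba[..., :3]
        alpha = rgba[..., 3:4]  # keep the last axis so it broadcasts over RGB
        # Standard "over" operator: the new layer covers what is beneath it.
        out = alpha * rgb + (1.0 - alpha) * out
    return out
```

This only handles ordering and occlusion between layers; it does not model folds, shading, or depth.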
We’d love some guidance on:
- What model architecture or pipeline we should follow for this kind of instant layering try-on system
- Any pretrained models or open-source frameworks that already support multi-layer virtual try-on
- Whether we should combine human parsing + clothing segmentation + warping models, and if so, how to structure that
Any advice, references, or model suggestions would be hugely appreciated.