Layout-to-Image Conditioning

Hello there,

I’m trying to implement the layout-to-image generation from the original Stable Diffusion paper, but I have not been successful so far. Has anyone managed to use a sequence of bounding boxes as conditioning?
My goal is to train the model on sets of labeled bounding boxes so that it generates realistic images matching the given layout.
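
To make the question concrete, here is a rough sketch of one way the boxes could be flattened into a token sequence for conditioning. This is only an assumption about a workable encoding, not necessarily what the paper does: each box becomes three tokens (class, quantized top-left corner, quantized bottom-right corner), and the sequence is padded to a fixed length. The names `encode_layout`, `quantize`, `grid`, and `max_boxes` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def quantize(v: float, bins: int) -> int:
    """Map a coordinate in [0, 1] to an integer bin in [0, bins - 1]."""
    v = min(max(v, 0.0), 1.0)
    return min(int(v * bins), bins - 1)


def encode_layout(
    boxes: List[Box],
    labels: List[int],
    num_classes: int,
    grid: int = 16,
    max_boxes: int = 8,
) -> List[int]:
    """Flatten labeled boxes into a fixed-length token sequence.

    Vocabulary layout (an assumption for this sketch):
      [0, num_classes)                         class tokens
      [num_classes, num_classes + grid * grid) position tokens
      num_classes + grid * grid                padding token
    """
    pad_id = num_classes + grid * grid
    tokens: List[int] = []
    for (x0, y0, x1, y1), label in zip(boxes[:max_boxes], labels[:max_boxes]):
        # Each corner is mapped to a single cell index on a grid x grid lattice.
        top_left = quantize(y0, grid) * grid + quantize(x0, grid)
        bottom_right = quantize(y1, grid) * grid + quantize(x1, grid)
        tokens += [label, num_classes + top_left, num_classes + bottom_right]
    # Pad so every layout yields exactly 3 * max_boxes tokens.
    tokens += [pad_id] * (3 * max_boxes - len(tokens))
    return tokens
```

The resulting integer sequence could then go through an embedding layer and a small transformer encoder, with the output fed to the cross-attention layers of the denoising network, but I am not sure whether that is the right approach.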

Thank you so much for your help.