Class-conditional image generation

Hey everyone,
for a university project I want to experiment with class-conditional DDIM image generation on a custom-built dataset of microscopy images. Basically, I want to generate synthetic microscopy images for a specific biological application, conditioned on a set of features that should be visible in the generated image.

I have a dataset of roughly 3400 images that I tagged manually with 4 different features.
That is, for each image I have a vector of length 4, where a 1 indicates that the corresponding feature is visible in the image. I want this vector to control what kind of images are generated, similar to the prompt in Stable Diffusion but without a full text prompt.
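
To make the label format concrete, here is a minimal sketch of how I think of the conditioning vector. The feature names (`feature_a` to `feature_d`) and the helper functions are placeholders I made up for illustration, not my real tags or any library API.

```python
# Placeholder names for the four manually tagged features (hypothetical).
FEATURES = ["feature_a", "feature_b", "feature_c", "feature_d"]


def encode_features(visible):
    """Turn a set of visible feature names into a multi-hot list of length 4."""
    unknown = set(visible) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    # 1 means the feature is visible in the image, 0 means it is not.
    return [1 if name in visible else 0 for name in FEATURES]


def decode_features(vector):
    """Inverse of encode_features: list the names whose entry is 1."""
    if len(vector) != len(FEATURES):
        raise ValueError("expected a vector of length 4")
    return [name for name, flag in zip(FEATURES, vector) if flag == 1]


# Example: an image showing the first and third feature.
cond = encode_features({"feature_a", "feature_c"})
print(cond)                   # [1, 0, 1, 0]
print(decode_features(cond))  # ['feature_a', 'feature_c']
```

Since several features can be visible at once, this is a multi-label vector rather than a single class index, so I assume it cannot simply be passed as one integer class label.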

I am looking for ideas or implementation hints on how to do this with Hugging Face and accelerate.
I don't know how to add the feature vector to the pipeline for training and inference.

Thanks for any kind of help! :slight_smile: