When training the model on datasets with diverse image sizes (e.g. from 256 up to 1024), I typically resize every image to a specific size (e.g. 512) and then train the model on 512x512.
After training, the model tends to generate pixelated images when CFG is set to a high value. (I think it’s because of the 256x256 images being upscaled to 512.)
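For reference, my current preprocessing is roughly the following (just a sketch; the exact crop and normalization details vary per run):

```python
from torchvision import transforms

# Every image, whether it started at 256 or 1024, ends up as a 512x512 tensor.
preprocess = transforms.Compose([
    transforms.Resize(512, interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
```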
- Is “resizing everything to the same size” how people typically train the model?
- Should we add extra tags (e.g. “pixelated” or “low quality”) to the captions of small images and use them as negative prompts at inference?
- Any best practices for training the model on diverse image sizes using huggingface datasets? How do I batch several images with similar sizes together?
I believe resizing everything to the same size is usually what we do in our training scripts. However, I wouldn’t say it’s always recommended; I do see potential issues with it.
I don’t know about number 2.
You can post-process the images since they’re loadable through standard torch datasets; we have examples in some of the training scripts (controlnet iirc).
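Roughly something like this (a sketch rather than a copy of any particular script; the imagefolder path and the 512 target size are placeholders):

```python
import torch
from datasets import load_dataset
from torchvision import transforms

# "imagefolder" yields an "image" column of PIL images.
dataset = load_dataset("imagefolder", data_dir="path/to/images", split="train")

train_transforms = transforms.Compose([
    transforms.Resize(512, interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])

def preprocess(examples):
    # Applied lazily, per accessed batch, by set_transform.
    examples["pixel_values"] = [
        train_transforms(img.convert("RGB")) for img in examples["image"]
    ]
    return examples

dataset.set_transform(preprocess)

def collate_fn(rows):
    return {"pixel_values": torch.stack([row["pixel_values"] for row in rows])}

loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=True, collate_fn=collate_fn
)
```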
Regarding number 3, I’m also referring to images with different aspect ratios, which people usually handle with Aspect Ratio Bucketing (ARB). How do I do ARB with huggingface datasets? Is it possible?
Yes, that’s a good point; aspect ratio bucketing makes a lot of sense. I don’t think there’s a generic way to load only images of a particular aspect ratio, as it depends on how the dataset is stored. Assuming there’s some set of index files that point to URLs of the images, I’d maybe recommend forking the dataset into multiple datasets such that the index files are filtered on resolution. That seems like the most straightforward way.
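If you do want proper bucketing rather than separate datasets, here’s a minimal sketch of the batching side with a custom batch sampler (the bucket list and the way you gather image sizes are placeholders; this isn’t something the training scripts ship):

```python
import random
from collections import defaultdict

import torch

# Illustrative bucket resolutions, all with roughly the same pixel count.
BUCKETS = [(512, 512), (448, 576), (576, 448), (384, 640), (640, 384)]

def nearest_bucket(width, height):
    ar = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ar))

class BucketBatchSampler(torch.utils.data.Sampler):
    """Yields index batches whose images all share the same aspect-ratio bucket."""

    def __init__(self, sizes, batch_size):
        # `sizes` is a list of (width, height) per example, gathered once from
        # the dataset's metadata (or by reading each image header).
        self.batch_size = batch_size
        self.buckets = defaultdict(list)
        for idx, (w, h) in enumerate(sizes):
            self.buckets[nearest_bucket(w, h)].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.buckets.values():
            random.shuffle(indices)
            for i in range(0, len(indices), self.batch_size):
                batches.append(indices[i : i + self.batch_size])
        random.shuffle(batches)
        yield from batches

    def __len__(self):
        return sum(-(-len(v) // self.batch_size) for v in self.buckets.values())

# Usage: pass it as batch_sampler so the DataLoader never mixes buckets.
# loader = torch.utils.data.DataLoader(dataset, batch_sampler=BucketBatchSampler(sizes, 4), collate_fn=collate_fn)
```

Each yielded batch can then be resized/cropped to its bucket’s resolution inside the collate_fn, so tensors within a batch always share a shape.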