Training HF transformer models on custom (non-text) data

Hi,
Is there a way to train the HF models on data that is not text? For example, if I want to train a transformer on some representation of images and use it to generate new images, how exactly would I accomplish this?
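
To make "some representation of images" concrete, here is a minimal sketch of one possible representation: a grayscale image flattened row by row into a sequence of integer token ids, plus the inverse mapping. It is only an illustration under my own assumptions, not an established recipe. The names `NUM_BINS`, `image_to_tokens`, and `tokens_to_image` are hypothetical, and the choice of 16 intensity bins is arbitrary. In principle a sequence like this could be fed to a causal language model as `input_ids`, with the model's vocabulary size set to the number of bins, but that part is not shown here.

```python
from typing import List

NUM_BINS = 16  # hypothetical vocabulary size: one token id per intensity bin


def _bin_width(num_bins: int) -> int:
    # Assumption: bins must tile the 0-255 range exactly, so num_bins divides 256.
    if num_bins <= 0 or 256 % num_bins != 0:
        raise ValueError("num_bins must be a positive divisor of 256")
    return 256 // num_bins


def image_to_tokens(image: List[List[int]], num_bins: int = NUM_BINS) -> List[int]:
    """Flatten a grayscale image (pixel values 0-255) row by row into token ids."""
    width = _bin_width(num_bins)
    return [pixel // width for row in image for pixel in row]


def tokens_to_image(tokens: List[int], row_length: int,
                    num_bins: int = NUM_BINS) -> List[List[int]]:
    """Map token ids back to approximate pixel values (the centre of each bin)."""
    width = _bin_width(num_bins)
    pixels = [token * width + width // 2 for token in tokens]
    return [pixels[i:i + row_length] for i in range(0, len(pixels), row_length)]


# A tiny 2x4 "image" used purely for illustration.
image = [[0, 64, 128, 255],
         [255, 128, 64, 0]]

tokens = image_to_tokens(image)                  # sequence a model could train on
restored = tokens_to_image(tokens, row_length=4)  # lossy reconstruction
```

The round trip is lossy because each pixel is replaced by the centre of its bin; more bins means a larger vocabulary but a closer reconstruction. Generated token sequences would go through `tokens_to_image` in the same way to become images again.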