I have a semantic segmentation problem with several hundred classes. It appears that SegformerImageProcessor expects masks that can be trivially converted to an 8-bit PIL image, which can only hold label IDs 0 to 255. If I pass in an RGB PIL image instead, I get pixel_values.shape = (3, 512, 512) but labels.shape = (512, 512, 3), which makes me think this isn't an intended usage.
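To illustrate why 8 bits isn't enough here: if a label map with IDs above 255 were squeezed into `uint8` (what a mode "L" PIL image stores), values would wrap modulo 256 and distinct classes would collide. This is a minimal NumPy sketch with a hypothetical mask. It does not show what the processor itself does internally, which may raise an error rather than silently casting.

```python
import numpy as np

# Hypothetical label map containing class IDs from a dataset with several hundred classes.
mask = np.array([[0, 255], [256, 300]], dtype=np.int32)

# A plain cast to uint8 wraps values modulo 256, so class 256 becomes 0
# and class 300 becomes 44, colliding with other labels.
mask_u8 = mask.astype(np.uint8)
print(mask_u8.tolist())
```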
AFAIK the only transform I need to apply to the mask is a resize, so I can easily do that on my own, but it seems like an odd limitation.
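Doing the resize yourself might look like the sketch below: a nearest-neighbour resize of a 2-D integer label map in plain NumPy, so IDs above 255 are preserved. The function name `resize_mask_nearest` is hypothetical, and its pixel mapping may differ slightly at region boundaries from PIL's `NEAREST` filter. The idea would be to pass only the image to SegformerImageProcessor and build the labels from this output.

```python
import numpy as np


def resize_mask_nearest(mask: np.ndarray, size: tuple[int, int]) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D integer label map to (height, width).

    Indexing never blends values, so the dtype and every class ID
    (including IDs > 255) are preserved exactly.
    """
    out_h, out_w = size
    in_h, in_w = mask.shape
    # Map each output pixel centre back to the input pixel that contains it.
    rows = np.floor((np.arange(out_h) + 0.5) * in_h / out_h).astype(np.int64)
    cols = np.floor((np.arange(out_w) + 0.5) * in_w / out_w).astype(np.int64)
    # Guard against floating-point rounding pushing an index out of range.
    rows = np.clip(rows, 0, in_h - 1)
    cols = np.clip(cols, 0, in_w - 1)
    # Broadcast row and column indices to gather the full output grid.
    return mask[rows[:, None], cols[None, :]]


if __name__ == "__main__":
    # Hypothetical mask with ~400 classes, resized to the 512x512 used above.
    mask = np.random.default_rng(0).integers(0, 400, size=(1024, 768), dtype=np.int32)
    labels = resize_mask_nearest(mask, (512, 512))
    print(labels.shape, labels.dtype)
```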
Am I misunderstanding something here?