Passing a 6-channel input through BeitFeatureExtractor

I have two images and one mask image. I want to train by concatenating the two images along the channel axis into a single 6-channel input and comparing the model's prediction with the mask.
I want to feed this 6-channel input to a BeitForSemanticSegmentation model. However, when I pass a 6-channel image to BeitFeatureExtractor, it reports that the image cannot be processed. Is there a way to change the input width, height, and number of channels that BeitFeatureExtractor accepts?
I know how to set the width and height, but I don't know how to change the number of channels.
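One possible workaround is to skip BeitFeatureExtractor for these inputs and do the preprocessing by hand. Below is a minimal NumPy sketch, assuming both images are uint8 arrays of shape (H, W, 3) with the same size and the mask is a (H, W) array of class indices. The helper names (`resize_nearest`, `preprocess_pair`, `preprocess_mask`) are hypothetical, nearest-neighbour resizing is used for simplicity, and the 0.5 per-channel mean/std values are placeholders that you should replace with statistics suited to your data.

```python
import numpy as np


def resize_nearest(img, height, width):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    in_h, in_w = img.shape[:2]
    # Map every output pixel back to the nearest source row/column.
    rows = (np.arange(height) * in_h / height).astype(int)
    cols = (np.arange(width) * in_w / width).astype(int)
    return img[rows[:, None], cols[None, :]]


def preprocess_pair(img_a, img_b, size, mean=None, std=None):
    """Stack two (H, W, 3) uint8 images into a normalized (6, h, w) float32 array."""
    if img_a.shape != img_b.shape:
        raise ValueError("both images must have the same shape")
    stacked = np.concatenate([img_a, img_b], axis=-1)  # (H, W, 6)
    stacked = resize_nearest(stacked, *size)
    x = stacked.astype(np.float32) / 255.0  # scale to [0, 1]
    # Placeholder statistics (assumption): one value per channel, 6 in total.
    mean = np.asarray(mean if mean is not None else [0.5] * 6, dtype=np.float32)
    std = np.asarray(std if std is not None else [0.5] * 6, dtype=np.float32)
    if mean.shape != (6,) or std.shape != (6,):
        raise ValueError("mean and std need exactly 6 values")
    x = (x - mean) / std  # broadcasts over the channel (last) axis
    # Channels-first layout, as expected for pixel_values.
    return np.ascontiguousarray(x.transpose(2, 0, 1))


def preprocess_mask(mask, size):
    """Resize an (H, W) label mask with nearest neighbour so class ids stay intact."""
    return resize_nearest(mask, *size).astype(np.int64)
```

A batch can then be built with `np.stack([...])` into shape (B, 6, h, w) and converted with `torch.from_numpy` before being passed as `pixel_values`, with the stacked masks as `labels`. The model itself also has to expect 6 channels: BeitConfig has a `num_channels` parameter (default 3), so one option, not verified here against every transformers version, is to set `num_channels=6` when creating the model. When starting from 3-channel pretrained weights, the patch-embedding weights no longer match and would have to be re-initialized, for example by passing `ignore_mismatched_sizes=True` to `from_pretrained`.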
