You need a DatasetDict with an image (or pixel_values) column and a label column. Alternatively, you can lay the data out on disk like this and build the dataset from it:
data
|-training
| |-image
| | |-img_id.jpg
| |-mask
|   |-mask_id.jpg
└──validation
   |-image
   | |-img_id.jpg
   |-mask
     |-mask_id.jpg
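
If it helps, here is a rough sketch of how a folder like the one above could be turned into a DatasetDict with image and label columns. It is not the only way to do it. The helper names collect_pairs and load_segmentation_folder are made up for this example, and it assumes that "id" in img_id.jpg and mask_id.jpg is a shared identifier (for example img_0001.jpg goes with mask_0001.jpg). The second function needs the datasets package installed.

```python
from pathlib import Path


def collect_pairs(split_dir):
    """Pair files in <split_dir>/image with files in <split_dir>/mask.

    Assumption: img_<id>.jpg belongs with mask_<id>.jpg, i.e. whatever
    follows the first underscore in the file stem is the shared id.
    """
    split_dir = Path(split_dir)
    images = {p.stem.split("_", 1)[-1]: p for p in (split_dir / "image").glob("*.jpg")}
    masks = {p.stem.split("_", 1)[-1]: p for p in (split_dir / "mask").glob("*.jpg")}

    # Fail early if an image has no mask or a mask has no image.
    unmatched = sorted(images.keys() ^ masks.keys())
    if unmatched:
        raise ValueError(f"ids without a matching image/mask pair: {unmatched}")

    ids = sorted(images)
    return {
        "image": [str(images[i]) for i in ids],
        "label": [str(masks[i]) for i in ids],
    }


def load_segmentation_folder(root):
    """Build a DatasetDict with 'training' and 'validation' splits."""
    # Imported here so collect_pairs stays usable without the package.
    from datasets import Dataset, DatasetDict, Image

    splits = {}
    for split in ("training", "validation"):
        ds = Dataset.from_dict(collect_pairs(Path(root) / split))
        # Decode both columns as images when rows are accessed.
        ds = ds.cast_column("image", Image()).cast_column("label", Image())
        splits[split] = ds
    return DatasetDict(splits)
```

Calling load_segmentation_folder("data") should then give you a DatasetDict whose splits each have an image and a label column. Rename the columns or the split names if your training script expects something different.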