You can disable decoding and apply your own transform to decode the images:
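from datasets import Image

def decode_images(batch):
    # With decode=False, each entry is the raw {"bytes", "path"} dict;
    # decode_image is your own decoding function.
    batch["rawscan"] = [decode_image(raw_data) for raw_data in batch["rawscan"]]
    return batch

ds = ds.cast_column("rawscan", Image(decode=False))
ds = ds.with_transform(decode_images)
ds[0]["rawscan"]  # transformed using decode_images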
You can then iterate over your dataset to list the invalid images (e.g. by wrapping the decoding in a try/except inside decode_images).
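A minimal sketch of that try/except idea, using only the standard library so it runs on its own. The `decode_image` here is a hypothetical stand-in that only checks for the PNG signature, `find_invalid_images` is an illustrative helper name, and the two sample rows are made up; they only mimic the `{"bytes", "path"}` dict you get per image once decoding is disabled.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def decode_image(raw_data):
    # Hypothetical stand-in decoder: swap in your real decoding logic.
    # It raises on anything that does not start with the PNG signature.
    data = raw_data["bytes"]
    if data is None or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a valid PNG")
    return data

def find_invalid_images(rows, column="rawscan"):
    # Try to decode every row and record the ones that fail,
    # instead of letting the first bad image stop the iteration.
    invalid = []
    for index, row in enumerate(rows):
        try:
            decode_image(row[column])
        except Exception as err:
            invalid.append((index, row[column].get("path"), str(err)))
    return invalid

# Made-up rows shaped like undecoded image entries.
rows = [
    {"rawscan": {"bytes": PNG_SIGNATURE + b"...", "path": "ok.png"}},
    {"rawscan": {"bytes": b"garbage", "path": "broken.png"}},
]

invalid = find_invalid_images(rows)
print(invalid)
```

With the real dataset, you would pass `ds` (after `cast_column(..., Image(decode=False))` and before `with_transform`) in place of `rows`, so that each row still holds the raw data rather than the transformed output.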