How to check if image exists at image url?

TheNoob3131 · July 21, 2022, 3:54pm

Every time I try to train my model (google/vit-base-patch16-224-in21k) on my dataset for image classification, it gives a FileNotFound error (see below).

Is there a way to go through the entire ‘Photos’ column and identify or delete the rows whose image URL doesn’t exist? Is there a Python or Hugging Face function for that?

nielsr · July 22, 2022, 7:54am

You can simply skip the images that fail to load, as follows:

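import requests
from datasets import load_dataset
from PIL import Image

def to_pillow(examples):
    # Keep only the rows whose image downloads and decodes. Every column is
    # filtered together so that all columns in the batch keep the same length.
    images, kept = [], []
    for i, url in enumerate(examples['Photo']):
        try:
            response = requests.get(url, stream=True, timeout=10)
            response.raise_for_status()  # treat 404 and other HTTP errors as failures
            images.append(Image.open(response.raw).convert('RGB'))
            kept.append(i)
        except Exception:
            continue  # skip missing or unreadable images

    batch = {name: [values[i] for i in kept] for name, values in examples.items()}
    batch['image'] = images
    return batch

dataset = load_dataset("TheNoob3131/mosquito-data")
dataset = dataset.map(to_pillow, batched=True)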
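
If you would rather identify the broken rows than download every image up front, a rough sketch along these lines might help. It sends a HEAD request with the standard library and treats a URL as usable only if the server answers 200 with an image content type. The helper name `image_url_exists` and its `opener` parameter are made up for illustration and are not part of 🤗 Datasets. Some servers reject HEAD requests, so treat a False result as "could not confirm" rather than proof that the image is gone.

```python
import urllib.error
import urllib.request


def image_url_exists(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if `url` answers a HEAD request with status 200 and an image type.

    `opener` is a hypothetical hook (defaults to urllib's urlopen) that makes
    the function easy to exercise without network access.
    """
    try:
        # Request() raises ValueError for strings that are not valid URLs.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return response.status == 200 and content_type.startswith("image/")
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (404 etc.), timeouts and
        # connection failures; any of these means the image is not usable.
        return False
```

Assuming the column is called 'Photo' as in the snippet above, something like dataset.filter(lambda row: image_url_exists(row['Photo'])) should keep only the rows whose URL responds with an image. This only checks that the URL responds, so a corrupted file can still fail later when it is decoded.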
