How to run image classification on image URLs

My dataset stores all of its photos as JPG URLs, which are plain strings. How can I turn those strings into images so that I can run an image classification model like base ViT or ResNet-50 on them?

I usually use the requests library for that (a third-party package, installable with pip install requests) together with Pillow:

from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
image
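
If it helps, here is a slightly more defensive sketch of the same idea. It is not from the original reply: the helper names `decode_image` and `fetch_image` and the 10-second timeout are my own assumptions. It adds a timeout, raises on HTTP errors, and converts to RGB, which is what most image classification preprocessors expect.

```python
from io import BytesIO

import requests
from PIL import Image


def decode_image(data: bytes) -> Image.Image:
    """Decode raw image bytes into an RGB Pillow image."""
    image = Image.open(BytesIO(data))
    # convert() forces the pixel data to load and normalizes the mode
    return image.convert("RGB")


def fetch_image(url: str, timeout: float = 10.0) -> Image.Image:
    """Download one image URL and decode it (the timeout value is an assumption)."""
    response = requests.get(url, timeout=timeout)
    # Raise on 4xx/5xx responses instead of trying to decode an error page
    response.raise_for_status()
    return decode_image(response.content)
```

Calling `fetch_image(url)` with the COCO URL above should give you the same picture as before, just with explicit error handling.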

So would I have to define a function to go through all of the rows in the 'Image' column? How would I accomplish that?

Or is there a way to convert the column into the feature type for images?

Yes, you can probably map them all to Pillow images and then cast the column to the Image feature:

from datasets import load_dataset
import requests
from PIL import Image

def to_pillow(examples):
    urls = examples['Photo']
    images = []
    for url in urls:
        image = Image.open(requests.get(url, stream=True).raw)
        images.append(image)

    examples['image'] = images

    return examples

dataset = load_dataset("TheNoob3131/mosquito-data")
dataset = dataset.map(to_pillow, batched=True)

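# Import under an alias so it does not shadow PIL's Image used in to_pillow
from datasets import Image as ImageFeature

# cast_column expects a feature instance, not the class itself
dataset = dataset.cast_column('image', ImageFeature())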

I'm going to cc @mariosasko here (map seems to be very slow). Alternatively, you can do:

dataset.set_transform(to_pillow)

to do this on the fly.


You can make it much faster by running it in parallel, by the way:

dataset = dataset.map(to_pillow, batched=True, num_proc=4)
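
Since `num_proc` parallelizes across processes while the downloads themselves are mostly waiting on the network, another option might be to fetch the URLs of each batch with a thread pool. This is only a rough sketch under my own assumptions: `download_bytes`, `download_all`, and the default of 8 workers are hypothetical and not part of the datasets library.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def download_bytes(url, timeout=10.0):
    """Return the raw bytes behind a URL, or None on a network error or malformed URL."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # ValueError covers strings that are not valid URLs
        return None


def download_all(urls, fetch=download_bytes, max_workers=8):
    """Fetch many URLs concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs
        return list(pool.map(fetch, urls))
```

Inside a batched function like `to_pillow`, you could call `download_all(examples['Photo'])` and then decode each non-None result with Pillow, so a few dead links do not abort the whole map.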