Should I use .map(processor) or define tokenizer=processor?

I am working on a vision use case. I have the processor:

processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')

Then I process my dataset as follows:

def apply_processor(example):
    example['pixel_values'] = processor(example['image'].convert("RGB"), return_tensors="pt").pixel_values.squeeze()
    return example

processed_dataset = pet_dataset.map(apply_processor)
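For context, here is a minimal, self-contained sketch of what the map step above does. It uses a hypothetical dummy processor and a plain list of dicts instead of ViTImageProcessor and a real dataset, so it runs without any downloads; only the shape of the pattern (call the processor, take pixel_values, drop the batch dimension) matches the snippet above:

```python
import numpy as np


class DummyProcessorOutput:
    """Stand-in for the object an image processor returns (assumption)."""

    def __init__(self, pixel_values):
        self.pixel_values = pixel_values


def dummy_processor(image, return_tensors="pt"):
    # Stand-in for ViTImageProcessor.__call__: here `image` is already an
    # array; a real processor would resize/rescale/normalize the image.
    arr = np.asarray(image, dtype=np.float32)
    return DummyProcessorOutput(pixel_values=arr[None, ...])  # add batch dim


def apply_processor(example):
    # Same shape as the mapped function above: add pixel_values and
    # squeeze away the batch dimension the processor adds.
    example["pixel_values"] = dummy_processor(example["image"]).pixel_values.squeeze(0)
    return example


dataset = [{"image": np.zeros((3, 4, 4))}, {"image": np.ones((3, 4, 4))}]
processed = [apply_processor(dict(ex)) for ex in dataset]  # stand-in for .map()
print(processed[0]["pixel_values"].shape)  # (3, 4, 4)
```

The real `.map(apply_processor)` call does the same thing row by row, storing the new `pixel_values` column in the dataset.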

Given this, should I also pass tokenizer=processor to transformers.Trainer? If not, which is the better option: preprocessing the dataset with map/transform, or passing tokenizer=processor?

Thanks in advance!
