Is there any downside to using either option? If I remember correctly, lambdas are not picklable, so my assumption is that if you do something like
new_dataset = my_dataset.map(lambda batch: my_processing_func(batch, model, tokenizer), batched=True)
it won’t be cached. Is that correct?
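For reference, here is a minimal sketch of the picklability claim, checked against the standard-library `pickle` module only. The helper `is_picklable` is a hypothetical name made up for this illustration, and `json.dumps` is just a stand-in for any importable named function. It does not show how `datasets` decides whether to reuse a cached result, which may use a different serializer, so it does not settle the caching question by itself.

```python
import json
import pickle


def is_picklable(obj):
    """Return True if the standard-library pickle can serialize obj.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Functions are pickled by reference (module + qualified name).
        # A lambda's name is "<lambda>", which cannot be looked up again.
        return False


# An importable named function is pickled by reference and works.
print("json.dumps picklable:", is_picklable(json.dumps))

# A lambda has no importable name, so the stdlib pickle rejects it.
print("lambda picklable:", is_picklable(lambda batch: batch))
```

So the premise holds for plain `pickle`; whether that translates into a cache miss depends on how the library fingerprints the function passed to `map`.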