Batch processing for a streaming dataset


I am working with the OSCAR dataset and trying to filter some entries based on the results of a zero-shot classification model.

Since the OSCAR dataset is huge and I can't load it all at once, I'm using the streaming mode of Datasets.

To filter the dataset, I use this function:

def inference_batch(examples):

    outputs, scores, texts = [], [], []

    for example in examples["text"]:
        res = classifier_ort(example, classes)
        res_mean = np.mean(np.array(res["scores"]))

        texts.append(example)
        scores.append(res_mean)
        outputs.append(res_mean > 0.5)

    return {"text": texts, "is_class": outputs, "score": scores}

After that, I call the map function to iterate through the dataset:

updated_dataset = dataset.map(inference_batch, batched=True)
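For context on what batched=True changes: map then hands the function a dict of lists (one slice of batch_size rows at a time) instead of a single row, and expects a dict of equal-length lists back. A stdlib-only mimic of that batching behaviour (my own sketch, not the actual datasets implementation):

```python
def batched_map(rows, fn, batch_size=4):
    """Mimic of map(batched=True) over an iterable of row dicts."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from _apply(fn, batch)
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        yield from _apply(fn, batch)

def _apply(fn, batch):
    # Collate rows into a dict of lists, like datasets does:
    # fn sees {"text": [t1, t2, ...]}, not a single example.
    columns = {k: [r[k] for r in batch] for k in batch[0]}
    out = fn(columns)
    # Re-split the returned dict of lists back into rows.
    n = len(next(iter(out.values())))
    for i in range(n):
        yield {k: v[i] for k, v in out.items()}

rows = [{"text": f"doc {i}"} for i in range(6)]
upper = lambda cols: {"text": [t.upper() for t in cols["text"]]}
result = list(batched_map(rows, upper, batch_size=4))
# result is 6 rows with upper-cased text
```

The point of the mimic: if the function inside still processes texts one by one, batching only changes how rows are grouped, not how much work is done per text.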

And to get back the results

for example in updated_dataset.take(100):
    print(example)


To me, this does not look like the most efficient way to process the dataset: it streams the data row by row rather than in batches, and I saw no difference between calling map with batched=True and calling it with batched=False.
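One likely reason batched=True made no difference is that the loop inside the function still invokes the classifier once per text, so the batch is processed sequentially anyway. Transformers pipelines also accept a list of inputs, so the whole batch can be sent in a single call. A minimal sketch of that idea, with a dummy stand-in for classifier_ort and made-up class labels (both are assumptions, since the real ones aren't shown above):

```python
import numpy as np

# Dummy stand-in for the ONNX zero-shot classifier from the post; like a
# transformers pipeline given a list of texts, it returns one
# {"scores": [...]} dict per input.
def classifier_ort(texts, classes):
    return [{"scores": [0.9 for _ in classes]} for _ in texts]

classes = ["news", "sports"]  # made-up labels for the sketch

def inference_batch(examples):
    # One pipeline call for the whole batch instead of one call per text;
    # a real pipeline can then batch the forward passes internally.
    results = classifier_ort(examples["text"], classes)
    scores = [float(np.mean(r["scores"])) for r in results]
    return {
        "text": examples["text"],
        "score": scores,
        "is_class": [s > 0.5 for s in scores],
    }

out = inference_batch({"text": ["first document", "second document"]})
```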

So my question is: is there a way to process the OSCAR dataset faster using streaming mode and batch processing?

Thank you