Progress bar of dataset.map with num_proc>1 hangs

Hello,

I’m currently experiencing an issue with my code. Here’s the code snippet:

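  import pandas as pd
  from datasets import Dataset

  # Inside a method of my class (self.args holds the run configuration)
  df = pd.read_csv(self.args["csv_input"])
  ds = Dataset.from_pandas(df)

  ds = ds.map(
      self.process_data,
      num_proc=self.args["num_workers"],
      with_indices=True,
      batched=True,
      # One batch per worker: the whole dataset divided by the worker count
      batch_size=int(len(df) / self.args["num_workers"]),
      load_from_cache_file=True,
      desc="Processing Data.....",
  )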

However, the progress bar hangs, and the status is not updating:

Processing Data... (num_proc=10):   0%|        | 0/520 [00:00<?, ? examples/s]

I’m using Python 3.10.13 and datasets-2.15.1.dev0. Any suggestions on resolving this issue would be greatly appreciated.

Thanks a million!

It is hard to tell what the problem is without the complete code. If it works in single-process mode, the most likely cause is that you don’t have enough RAM for num_proc=10. If that’s the case, try reducing writer_batch_size and batch_size in the map call.
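
To make the batch_size suggestion a bit more concrete, here is a rough sketch of the arithmetic. The helper batches_per_worker is hypothetical (it is not part of datasets), and it assumes that map splits the data into num_proc shards of nearly equal size and that the progress bar can only advance once a whole batch has finished. Under those assumptions, batch_size=int(len(df) / num_workers) leaves each worker with a single large batch, so the bar would sit at 0% until a worker finishes its entire shard.

```python
import math


def batches_per_worker(num_examples: int, num_proc: int, batch_size: int) -> list[int]:
    """Count the batches each worker handles.

    Assumption: the dataset is split into num_proc near-equal shards,
    and each shard is processed in chunks of batch_size examples.
    """
    if num_proc < 1 or batch_size < 1:
        raise ValueError("num_proc and batch_size must be at least 1")
    base, extra = divmod(num_examples, num_proc)
    # The first `extra` shards get one additional example each.
    shard_sizes = [base + (1 if rank < extra else 0) for rank in range(num_proc)]
    return [math.ceil(size / batch_size) for size in shard_sizes]


# Numbers from the post: 520 examples, num_proc=10, batch_size=int(520 / 10)
print(batches_per_worker(520, 10, int(520 / 10)))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# A smaller batch size gives each worker several batches to report on
print(batches_per_worker(520, 10, 8))  # [7, 7, 7, 7, 7, 7, 7, 7, 7, 7]
```

If this is what is happening, passing a smaller batch_size (and a smaller writer_batch_size) to map should both lower peak memory per worker and let the bar update more often. If the bar still never moves, the time is probably being spent inside process_data itself.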

Thank you for your reply.

I tried running the code with a single process (num_proc=1) but still see the same issue.

I think I have enough RAM, but I will try reducing the writer_batch_size and batch_size parameters to see how it goes.

Thank you again.