remove_columns option for .map

Like many people, I’m working through the tutorial on fine-tuning wav2vec2, adapting it in various ways. I’m considering the following line of code:

timit = timit.map(prepare_dataset, remove_columns=timit.column_names["train"], num_proc=4)

My understanding is that the columns listed in remove_columns are still passed to the function for processing and are removed only afterwards. I also think that, for a DatasetDict, the function prepare_dataset will be applied to every split in the dictionary (e.g. ‘train’ and ‘test’). So my question: is remove_columns strictly necessary here, and what exactly is its purpose? Is it just removing unnecessary information from memory, or will the function be applied incorrectly if it is not included?
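A rough way to picture the behaviour described above is a toy, pure-Python model of map over a dict of splits. This is not the datasets implementation: the helpers map_split, map_dict and prepare_example, and the column names, are all hypothetical. It assumes (following the datasets documentation for remove_columns) that the function sees every original column, that the listed columns are dropped from each input row before the function's output is merged in, and that the function returns only its new columns.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def map_split(rows: List[Row], fn: Callable[[Row], Row],
              remove_columns: Sequence[str] = ()) -> List[Row]:
    """Toy model of per-example map semantics (hypothetical, not library code).

    fn receives a copy of the full row, so it can read any column, including
    ones listed in remove_columns. Those columns are then dropped from the
    original row and fn's output is merged on top, so a column that fn itself
    returns under a removed name is kept.
    """
    out = []
    for row in rows:
        result = fn(dict(row))  # fn sees every original column
        kept = {k: v for k, v in row.items() if k not in remove_columns}
        kept.update(result)     # new columns are merged in after removal
        out.append(kept)
    return out


def map_dict(splits: Dict[str, List[Row]], fn: Callable[[Row], Row],
             remove_columns: Sequence[str] = ()) -> Dict[str, List[Row]]:
    """Apply the same function and the same removal list to every split."""
    return {name: map_split(rows, fn, remove_columns) for name, rows in splits.items()}


def prepare_example(row: Row) -> Row:
    # Hypothetical stand-in for prepare_dataset: reads the original
    # columns and returns only the new ones.
    return {"input_values": [x * 0.5 for x in row["audio"]],
            "labels": row["text"].upper()}


splits = {
    "train": [{"audio": [0.2, 0.4], "text": "hi", "speaker_id": "a1"}],
    "test": [{"audio": [1.0], "text": "ok", "speaker_id": "b2"}],
}
# Mimics DatasetDict.column_names: a dict mapping split name -> column list.
column_names = {name: list(rows[0].keys()) for name, rows in splits.items()}

kept_all = map_dict(splits, prepare_example)
cleaned = map_dict(splits, prepare_example, remove_columns=column_names["train"])
print(sorted(kept_all["test"][0]))  # ['audio', 'input_values', 'labels', 'speaker_id', 'text']
print(sorted(cleaned["test"][0]))   # ['input_values', 'labels']
```

In this toy model, leaving out remove_columns does not change what the function computes; the original columns simply stay alongside the new ones. Note that passing column_names["train"] to every split only makes sense if all splits share the same columns.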

thanks
Jonathan
