Like many people, I’m working through the tutorial on fine-tuning wav2vec2, adapting it in various ways. I’m considering the following line of code:
timit = timit.map(prepare_dataset, remove_columns=timit.column_names["train"], num_proc=4)
My understanding is that the columns listed in remove_columns are still passed to prepare_dataset and processed first, and are only removed afterwards. I also think that, for a DatasetDict, prepare_dataset is applied to every split (i.e. 'train', 'test', etc.). So my question is: is remove_columns strictly necessary here, and what exactly is its purpose? Does it only free memory by discarding information that is no longer needed, or will the function be applied incorrectly if it is omitted?
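To make the question concrete, here is a toy, pure-Python model of the behaviour I am assuming. It is not the datasets library's implementation. `toy_map`, `toy_map_dict`, the hypothetical `prepare_dataset`, and the column names `audio`, `text`, `speaker_id`, `input_values` and `labels` are all made up for illustration. The model assumes that the function sees every original column, that the listed columns are dropped afterwards, and that the function's outputs are then merged into each row. As with `timit.column_names["train"]`, one column list taken from the train split is applied to every split.

```python
from typing import Any, Callable, Dict, List, Optional

Example = Dict[str, Any]
Splits = Dict[str, List[Example]]


def toy_map(rows: List[Example], fn: Callable[[Example], Example],
            remove_columns: Optional[List[str]] = None) -> List[Example]:
    """Hypothetical, simplified model of per-example (non-batched) mapping."""
    remove = set(remove_columns or [])
    out = []
    for row in rows:
        produced = fn(row)  # fn still sees every original column
        kept = {k: v for k, v in row.items() if k not in remove}  # drop listed columns
        kept.update(produced)  # merge fn's outputs into what is left
        out.append(kept)
    return out


def toy_map_dict(splits: Splits, fn: Callable[[Example], Example],
                 remove_columns: Optional[List[str]] = None) -> Splits:
    # Same fn and same column list applied to every split, like a DatasetDict
    return {name: toy_map(rows, fn, remove_columns) for name, rows in splits.items()}


def prepare_dataset(example: Example) -> Example:
    # Hypothetical stand-in: "feature extraction" doubles the samples and
    # "tokenization" maps each character to its code point.
    return {
        "input_values": [x * 2 for x in example["audio"]],
        "labels": [ord(c) for c in example["text"]],
    }


timit = {
    "train": [{"audio": [0.5, -0.5], "text": "hi", "speaker_id": "s1"}],
    "test": [{"audio": [1.0], "text": "a", "speaker_id": "s2"}],
}
train_columns = list(timit["train"][0].keys())  # plays the role of column_names["train"]

with_removal = toy_map_dict(timit, prepare_dataset, remove_columns=train_columns)
without_removal = toy_map_dict(timit, prepare_dataset)

print(sorted(with_removal["train"][0]))     # ['input_values', 'labels']
print(sorted(without_removal["train"][0]))  # ['audio', 'input_values', 'labels', 'speaker_id', 'text']
```

Under these assumptions, the processed `input_values` and `labels` are identical with or without remove_columns. The only difference is whether the original columns are carried along next to them.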