Can dataset.map accept multiple arguments like Python's map?

In Python, map works as follows:

map(func, arg_1, arg_2)
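
For reference, the built-in map treats each extra argument as an iterable and walks them in parallel, calling func once per group of elements. A minimal sketch, where add is a hypothetical function used only for illustration:

```python
def add(x, y):
    # Called once per pair of elements drawn from the two iterables.
    return x + y

# map consumes both lists in parallel: add(1, 10), add(2, 20), add(3, 30)
sums = list(map(add, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]
```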

With datasets.map, we have to pass a callable that expects objects of the form dataset[idx]. This means that things like the tokenizer, along with any other parameters we want to pass, have to be defined and accessible within the scope of that function. Can we pass arguments as in a normal function call, as shown above? I'm asking because I have two preprocess_func functions, one for the train split and one for the validation split, and I end up writing nearly the same function twice, which is repetitive (there are only slight differences between them).

Hi @prajjwal1, as described in the docs, you can pass a dict called fn_kwargs to map; its entries are forwarded to your map function as extra keyword arguments.


Okay, thanks. I had a look at the docs before posting, but missed it.
