Hello. I received the following error while processing (tokenizing) my custom dataset:
'i' format requires -2147483648 <= number <= 2147483647
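As far as I can tell, the wording of this message comes from Python's struct module, which raises it when a number that does not fit in a signed 32-bit integer is packed with the 'i' format. My assumption (not confirmed) is that something on the multiprocessing path packs a size larger than 2 GiB this way. The sketch below only reproduces the same kind of error on its own; the helper name pack_length is made up for illustration and is not part of datasets or multiprocessing.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value the 'i' (signed 32-bit) format can hold


def pack_length(n):
    """Pack n as a 4-byte signed integer in network byte order.

    Hypothetical helper, used here only to trigger the error.
    """
    return struct.pack("!i", n)


# The largest 32-bit value packs without a problem.
packed = pack_length(INT32_MAX)

# One past the limit raises struct.error.
try:
    pack_length(INT32_MAX + 1)
    overflowed = False
except struct.error as exc:
    overflowed = True
    print(exc)
```

If that assumption is right, the limit would apply to the size of whatever is passed between worker processes, not to the row count itself.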
I used the following code:
self.tokenized_datasets_training = dataset_training.map(
    tokenize_function,
    batched=True,
    batch_size=6000,
    remove_columns=["codes"],
    load_from_cache_file=self.config.cache,
    num_proc=4,
    fn_kwargs={"tokenizer": self.tokenizer},
)
dataset_training has 2301617 rows. When I run it with num_proc=1, it works fine, but it raises the error when num_proc >= 2. I am running it on a server that has 23GB, and my data is about 7GB.
I can process it with num_proc=1, but what should I do if I want to use a num_proc of 2 or more?