@yjernite Problem while training Amharic language BERT on the OSCAR dataset
You need to remove the id column from the dataset:
tokenized_datasets = datasets.map(tokenize_function, batched=True, num_proc=4, remove_columns=["id", "text"])
That solved it, thank you.