I’ve updated the transformers package, as well as the datasets package, but I still get the same error.
Looking at the error report, the error seems to be in the arrow_dataset.py module, in the remove_columns() function. The fix you mentioned was in the transformers package, so I didn’t expect it to solve the issue.
I’m not an expert, but reading the error message:
TypeError: cannot pickle '_thread.lock' object
It seems that it is trying to pickle a thread lock object.
But I can’t understand where this lock comes from and why it “belongs” to the dataset object it is trying to deepcopy.
This Reddit thread is not directly related to my scenario, but it seems to explain the root cause of the problem (serializing a thread object to disk).
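As far as I understand it, the same kind of error can be reproduced outside of datasets with any object that holds a lock as an attribute. The following is only a minimal sketch under that assumption: the `Holder` class and both helper functions are hypothetical names I made up for illustration, not anything from datasets or transformers, and I don’t know whether this is what actually happens inside remove_columns().

```python
import copy
import pickle
import threading


class Holder:
    """Hypothetical stand-in for any object that keeps a lock as an attribute."""

    def __init__(self):
        self.data = [1, 2, 3]
        self.lock = threading.Lock()  # locks cannot be pickled or deep-copied


def try_deepcopy(obj):
    """Return the TypeError message raised by deepcopy, or None if it succeeds."""
    try:
        copy.deepcopy(obj)
    except TypeError as exc:
        return str(exc)
    return None


def find_unpicklable_attrs(obj):
    """List the attribute names whose values cannot be pickled on their own."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:
            bad.append(name)
    return bad


if __name__ == "__main__":
    holder = Holder()
    print(try_deepcopy(holder))            # the deepcopy error message, if any
    print(find_unpicklable_attrs(holder))  # names of the offending attributes
```

Something like `find_unpicklable_attrs` might help narrow down which attribute of the dataset (or of an object attached to it) carries the lock, assuming the object exposes its attributes through `vars()`.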
Do you have any idea what could be the source of this issue?