Pickling issue using map

I am mapping my dataset with the following compute_metrics function, which gives me a pickling issue.

    metric_cfg_list = config["metric_list"]
    metrics = [evaluate.load(metric_cfg["path"]) for metric_cfg in metric_cfg_list]

    # Placeholder for a tokenizer or normalizer class if needed.
    tokenizer = None

    def compute_metrics(sample):
        for metric in metrics:
            sample[metric.name] = metric.compute(
                predictions=[sample["clean_prediction"]],
                references=[sample["clean_label"]]
            )
        return sample

The following is the error message:

Parameter 'function'=<function main.<locals>.compute_metrics at 0x7aa60a95f0a0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
Map (num_proc=16):   0%|          | 0/2116 [00:00<?, ? examples/s]
Traceback (most recent call last):                                                                                                                                                                                                                                                                                                                                                     
  File "/ws/ifp-54_2/hasegawa/haolong2/AI4EE/CSR4RSR/evaluation.py", line 207, in <module>  
...
    StockPickler.save(self, obj, save_persistent_id)                                         
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 578, in save                                                                                
    rv = reduce(self.proto)                                                                  
TypeError: cannot pickle 'ThreadLocalFileContext' object 

I saw a relevant post about a non-picklable tokenizer, where people solved the issue by implementing the __getstate__ method or something similar. In my case, the object comes from the evaluate package. I wonder how I should modify it to avoid this error.
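For reference, the __getstate__ approach mentioned above can be sketched as follows. This is a minimal illustration using only the standard library; MetricWrapper and its _local attribute are hypothetical names, not part of evaluate or any tokenizer library. The idea is to drop the unpicklable member when pickling and rebuild it when unpickling.

```python
import pickle
import threading


class MetricWrapper:
    """Hypothetical class that holds an unpicklable member (a thread-local object)."""

    def __init__(self, name):
        self.name = name
        self._local = threading.local()  # cannot be pickled as-is

    def __getstate__(self):
        # Copy the instance state and leave out the unpicklable attribute.
        state = self.__dict__.copy()
        del state["_local"]
        return state

    def __setstate__(self, state):
        # Restore the picklable state and recreate the dropped attribute.
        self.__dict__.update(state)
        self._local = threading.local()


restored = pickle.loads(pickle.dumps(MetricWrapper("wer")))
print(restored.name)
```

This only helps when you control the class definition, which is not the case for objects created inside a third-party package.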


Hmm… unless it’s a problem with dill, multiprocessing, or the cache, it’s better to call lhoestq…

You can also provide your own unique hash in map if you want, with the new_fingerprint argument.
Or disable caching using

    import datasets
    datasets.disable_caching()

I tried both new_fingerprint and disable_caching(), but both still gave the same error.

The complete error is as follows:

Map (num_proc=16):   0%|          | 0/2116 [00:00<?, ? examples/s]
Traceback (most recent call last):                                                                                                                                                                                                                                                                                                                                                     
  File "/ws/ifp-54_2/hasegawa/haolong2/AI4EE/CSR4RSR/evaluation.py", line 213, in <module>                                                                                                                                                                                                                                                                                             
    main()                                                                                                                                                                                                                                                                                                                                                                             
  File "/ws/ifp-54_2/hasegawa/haolong2/AI4EE/CSR4RSR/evaluation.py", line 178, in main                                                                                                                                                                                                                                                                                                 
    ds[split] = ds[split].map(                                                                                                                                                                                                                                                                                                                                                         
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 557, in wrapper                                                                                                                                                                                                                                           
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)                                                                                                                                                                                                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3166, in map                                                                                                                                                                                                                                              
    for rank, done, content in iflatmap_unordered(                                                                                                                                                                                                                                                                                                                                     
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 720, in iflatmap_unordered                                                                                                                                                                                                                               
    [async_result.get(timeout=0.05) for async_result in async_results]                                                                                                                                                                                                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 720, in <listcomp>                                                                                                                                                                                                                                       
    [async_result.get(timeout=0.05) for async_result in async_results]                                                                                                                                                                                                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/multiprocess/pool.py", line 774, in get                                                                                                                                                                                                                                                    
    raise self._value                                                                                                                                                                                                                                                                                                                                                                  
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/multiprocess/pool.py", line 540, in _handle_tasks                                                                                                                                                                                                                                          
    put(task)                                                                                                                                                                                                                                                                                                                                                                          
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/multiprocess/connection.py", line 209, in send                                                                                                                                                                                                                                             
    self._send_bytes(_ForkingPickler.dumps(obj))                                                                                                                                                                                                                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/multiprocess/reduction.py", line 54, in dumps                                                                                                                                                                                                                                              
    cls(buf, protocol, *args, **kwds).dump(obj)                                                                                                                                                                                                                                                                                                                                        
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 420, in dump                                                                                                                                                                                                                                                          
    StockPickler.dump(self, obj)                                                                                                                                                                                                                                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 487, in dump                                                                                                                                                                                                                                                                            
    self.save(obj)                                                                                                                                                                                                                                                                                                                                                                     
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                                                                                                                                                                                                                          
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                                                                                                                                                                                                                   
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save                                                                                                                                                                                                                                                                            
    f(self, obj)  # Call unbound method with explicit self                                                                                                                                                                                                                                                                                                                             
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 902, in save_tuple                                                                                                                                                                                                                                                                      
    save(element)
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 887, in save_tuple
    save(element)
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 902, in save_tuple
    save(element)                                                                                                                                                                          
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                              
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save                                                                                
    f(self, obj)  # Call unbound method with explicit self                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 887, in save_tuple                                                                          
    save(element)                                                                                                                                                                          
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                              
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save                                                                                
    f(self, obj)  # Call unbound method with explicit self                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict                                                 
    StockPickler.save_dict(pickler, obj)                                                                                                                                                   
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 972, in save_dict                                                                           
    self._batch_setitems(obj.items())                                                                                                                                                      
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 998, in _batch_setitems                                                                     
    save(v)                                                                                                                                                                                
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                              
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save                                                                                
    f(self, obj)  # Call unbound method with explicit self                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 1985, in save_function                                                    
    _save_with_postproc(pickler, (_create_function, (                                                                                                                                      
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 1117, in _save_with_postproc                                              
    pickler.save_reduce(*reduction)                                                                                                                                                        
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 692, in save_reduce                                                                         
    save(args)                                                                                                                                                                             
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                              
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save                                                                                
    f(self, obj)  # Call unbound method with explicit self                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 887, in save_tuple                                                                          
    save(element)                                                                                                                                                                          
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                              
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save                                                                                
    f(self, obj)  # Call unbound method with explicit self                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 932, in save_list                                                                           
    self._batch_appends(obj)                                                                                                                                                               
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 956, in _batch_appends                                                                      
    save(x)                                                                                                                                                                                
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                              
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 603, in save                                                                                
    self.save_reduce(obj=obj, *rv)                                                                                                                                                         
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 717, in save_reduce                                                                         
    save(state)                                                                                                                                                                            
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                              
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save                                                                                
    f(self, obj)  # Call unbound method with explicit self                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict                                                 
    StockPickler.save_dict(pickler, obj)                                                                                                                                                   
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 972, in save_dict                                                                           
    self._batch_setitems(obj.items())                                                                                                                                                      
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 998, in _batch_setitems                                                                     
    save(v)                                                                                                                                                                                
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                              
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save                                                                                
    f(self, obj)  # Call unbound method with explicit self                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 932, in save_list                                                                           
    self._batch_appends(obj)                                                                                                                                                               
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 959, in _batch_appends                                                                      
    save(tmp[0])                                                                                                                                                                           
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                              
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                                                                                                                                                                                                                   
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 603, in save                                                                                
    self.save_reduce(obj=obj, *rv)                                                                                                                                                         
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 717, in save_reduce                                                                                                                                                                                                                                                                     
    save(state)                                                                                                                                                                            
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                                                                                                                                                                                                                   
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self                                                                                                                                 
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)                                                                                                                                                   
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())                                                                                                                                                      
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)                                                                                                                                                                                
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/dill/_dill.py", line 414, in save                                                                                                                                                                                                                                                          
    StockPickler.save(self, obj, save_persistent_id)                                                                                                                                       
  File "/ws/ifp-53_2/hasegawa/haolong2/miniconda3/envs/csr4rsr/lib/python3.10/pickle.py", line 578, in save                                                                                                                                                                                                                                                                            
    rv = reduce(self.proto)                             
TypeError: cannot pickle 'ThreadLocalFileContext' object 


Hmm… @lhoestq map function or PyArrow issue…?

It looks like the ThreadLocalFileContext from filelock is not picklable, and therefore can’t be used with .map() with num_proc=...
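The failure mode can be reproduced with the standard library alone. The sketch below uses a hypothetical ToyLock class, not filelock's real implementation, and only assumes that a lock keeps its state in a context object that is thread-local by default. Any object that holds a threading.local instance is rejected by pickle, while a plain context object is accepted.

```python
import pickle
import threading
from types import SimpleNamespace


class ToyLock:
    """Hypothetical stand-in for a lock that stores its state in a context object."""

    def __init__(self, path, thread_local=True):
        self.path = path
        # A thread-local context cannot be pickled; a plain namespace can.
        self._context = threading.local() if thread_local else SimpleNamespace()
        self._context.counter = 0


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


print(can_pickle(ToyLock("a.lock")))                      # False
print(can_pickle(ToyLock("a.lock", thread_local=False)))  # True
```

With num_proc set, the mapped function and everything it captures must be serialized and sent to the worker processes, so a single such object anywhere in the captured state is enough to make the call fail.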

Apparently this can be fixed using thread_local=False; see the filelock docs.

Can you modify evaluate to pass thread_local=False to all FileLock objects and try again to see if it works?


I am not sure if I did it right.

I modified the function get_from_cache in file_utils, located at
…/miniconda3/envs/csr4rsr/lib/python3.10/site-packages/evaluate/utils/file_utils.py
from

    with FileLock(lock_path): # Original

to

    with FileLock(lock_path, thread_local=False): # Modified

but the problem persists.


Adding this code chunk before importing evaluate seems to have solved the problem.

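    import filelock
    from filelock import FileLock as OriginalFileLock

    class PatchedFileLock(OriginalFileLock):
        """FileLock subclass that always passes thread_local=False."""
        def __init__(self, *args, **kwargs):
            kwargs["thread_local"] = False  # Force it every time
            super().__init__(*args, **kwargs)

    # Replace the class on the module so that later imports get the patched version.
    filelock.FileLock = PatchedFileLock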
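A likely reason the patch has to run before evaluate is imported: a module that does a from-import binds the class at import time, so replacing the attribute afterwards does not affect it. Whether evaluate imports FileLock this way is an assumption here. The sketch below shows the effect with a made-up module called fakelock and a hypothetical Lock class; none of these names come from filelock or evaluate.

```python
import sys
import types


class Lock:
    """Hypothetical lock class; stands in for a library class such as FileLock."""

    def __init__(self, path, thread_local=True):
        self.path = path
        self.thread_local = thread_local


# Register a fake library module so that `from fakelock import Lock` works.
fakelock = types.ModuleType("fakelock")
fakelock.Lock = Lock
sys.modules["fakelock"] = fakelock

CONSUMER_SOURCE = """
from fakelock import Lock

def make(path):
    return Lock(path)
"""


def import_consumer():
    # Simulates importing a downstream module that binds Lock at import time.
    namespace = {}
    exec(CONSUMER_SOURCE, namespace)
    return namespace["make"]


early = import_consumer()  # "imported" before the patch


class PatchedLock(Lock):
    def __init__(self, *args, **kwargs):
        kwargs["thread_local"] = False
        super().__init__(*args, **kwargs)


fakelock.Lock = PatchedLock  # the monkeypatch

late = import_consumer()  # "imported" after the patch

print(early("a.lock").thread_local)  # True: still uses the original class
print(late("a.lock").thread_local)   # False: picks up the patched class
```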

Thanks for the insight @lhoestq.
Would you mind telling me where you found the clue for the error, if it’s not too much trouble?
That way, I might be able to fix similar issues myself in the future.


Great! Let me know if you think we should make this the default in datasets and evaluate; apparently this logic appears with Python >= 3.11.

Would you mind telling me where you found the clue for the error, if it’s not too much trouble?
That way, I might be able to fix similar issues myself in the future.

The dill error says “TypeError: cannot pickle ‘ThreadLocalFileContext’ object”, so it means that in the function you pass to map() there is an object that contains a ThreadLocalFileContext that is not supported by dill for multiprocessing.

I searched Google for ThreadLocalFileContext on github.com to look for packages that have such objects, and figured it came from filelock, which is a dependency of evaluate. Finally, the filelock changelog mentions ThreadLocalFileContext as a recent addition to FileLock.
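To find which captured object is responsible before searching for its origin, a small helper along these lines may help. It is a rough sketch: find_unpicklable_closure_vars is a hypothetical name, and it uses the standard pickle module as a proxy, whereas datasets serializes with dill, which accepts more types. It can therefore report objects that dill would handle fine. The threading.local instance stands in for a metric object holding a thread-local lock context.

```python
import inspect
import pickle
import threading


def find_unpicklable_closure_vars(func):
    """Return {name: error message} for captured variables that pickle rejects."""
    problems = {}
    for name, value in inspect.getclosurevars(func).nonlocals.items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # report any serialization failure
            problems[name] = f"{type(exc).__name__}: {exc}"
    return problems


def make_compute_metrics():
    # Stand-in for a list of metric objects that hold a thread-local context.
    metrics = [threading.local()]
    scale = 2

    def compute_metrics(sample):
        sample["n_metrics"] = len(metrics) * scale
        return sample

    return compute_metrics


print(find_unpicklable_closure_vars(make_compute_metrics()))
```

Here only the metrics variable is reported, which points at the captured list rather than at the function itself.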


Thanks for the explanation!

I think it would be great to make it the default. In my case, several metrics need to be computed for a dataset, and I just want to avoid running multiple rounds of map. Or maybe there is a better way to do it that I haven’t figured out.
