Hi everyone,
I am having a hard time trying to understand some underlying mechanisms regarding the cache.
For some reason, the same operations are re-executed when the script is run twice; I have checked that each Dataset in my DatasetDict has the right cache_files, but it looks like they are not saved when I terminate the script. A few questions that might help me understand the issue:
- When are the files written to the cache?
- Is the cache configuration stateful across different runs? E.g., if I disable the cache in one script, is it still disabled in other scripts, or in another run of the same script with that line commented out?
I already checked the documentation, but I didn’t find much in this regard.
Thanks a lot,
D.
              Hi!
Every method that has cache_file_name as a parameter in its signature writes a cache file to disk, and the cache configuration is not stateful: disabling the cache in one script (or one run) does not carry over to other scripts or runs.
Regarding repeating a cached operation, this can happen if:
- caching is disabled, in which case the cache files are written to a temporary directory and deleted at the end of the session, or
- the function passed to the method cannot be hashed deterministically, so the computed fingerprint changes between runs.
In both these scenarios, the solution is to specify a cache_file_name to make the cache file permanent or avoid “non-deterministic” hashing.
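For example (the dataset name and file path below are just placeholders), something along these lines pins the processed data to a file you control, and disabling caching only affects the current session:
  from datasets import load_dataset, disable_caching

  ds = load_dataset("imdb", split="train")  # placeholder dataset

  # An explicit cache_file_name makes the processed Arrow file permanent,
  # so the next run reuses it regardless of how the function is hashed.
  ds = ds.map(
      lambda x: {"text_len": len(x["text"])},
      cache_file_name="imdb_text_len.arrow",  # placeholder path
  )

  # disable_caching() only applies to the current session; a new run
  # (or another script) starts with caching enabled again.
  disable_caching()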
              Hello,
thanks for your reply.
The operation can be cached: the Hasher returns a code for the function, and the dataset has its cache_files attribute correctly set. Debugging further, I found that the function returns a different hash between runs.
This is the snippet causing the problem:
  map_params = {
      "function": lambda x: {"x": self.transform_func(x["img"])},
      "writer_batch_size": 100,
      "num_proc": 1,
  }
  self.data[f"task_{self.task_ind}"] = self.data[f"task_{self.task_ind}"].map(
      **map_params
  )
and transform_func is instantiated from the following config:
transform_func:
  _target_: torchvision.transforms.Compose
  transforms:
    - _target_: torchvision.transforms.ToTensor
    - _target_: torchvision.transforms.Normalize
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
What may be happening? The runs have the same fixed seed.
EDIT: I can’t reproduce the problem in a minimal setting; if I just load the same function twice and hash it, the hashes match. I am not sure where the problem originates at this point.
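For reference, this is roughly the minimal check I tried (rebuilding the transform directly, without the rest of my class); as far as I understand, datasets.fingerprint.Hasher is what map uses to fingerprint the function:
  from datasets.fingerprint import Hasher
  from torchvision import transforms

  transform_func = transforms.Compose([
      transforms.ToTensor(),
      transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
  ])

  # Printed in two separate runs, this gives the same hash, so the
  # transform itself does not seem to be the culprit.
  print(Hasher.hash(lambda x: {"x": transform_func(x["img"])}))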
              Found the issue:
there was an attribute of type set in the class referred to by self in the snippet above. Because a set is pickled in iteration order, and that order can change between runs (string hashing is randomized), the hash computed for the whole object differed across runs, and therefore the hash of self.transform_func was different as well.
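In case it helps someone else, here is a minimal sketch of the pattern and of the fix (the class and attribute names are made up, not my actual code):
  from datasets.fingerprint import Hasher

  class TaskState:  # hypothetical stand-in for the class behind `self`
      def __init__(self, task_names):
          # A set pickles in iteration order, and that order can change
          # between runs (string hashing is randomized), so any hash
          # computed from this object is not stable across runs.
          # self.task_names = set(task_names)          # before: non-deterministic hash
          self.task_names = tuple(sorted(task_names))  # after: deterministic hash

  state = TaskState(["task_0", "task_1"])
  print(Hasher.hash(state))  # now identical across runs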
Thanks for your help.