Hi everyone,
I'm having a hard time understanding some of the underlying mechanisms of the cache.
For some reason, the same operations are re-executed when the script is run twice. I have checked that each Dataset in my DatasetDict has the right cache_files, but it looks like they are not saved when I terminate the script. A few questions that might help me understand the issue:
- When are the files written to the cache?
- Is the cache configuration stateful across runs? For example, if I disable the cache in one script, is it still disabled in other scripts, or in another run of the same script with that line commented out?
I already checked the documentation, but I didn't find much in this regard.
Thanks a lot,
D.
Hi!
Every method with cache_file_name as a parameter in its signature writes a cache file to disk, and the cache configuration is not stateful: disabling the cache only affects the current session, not other scripts or later runs.
Regarding repeating a cached operation, this can happen if:
- caching is disabled, in which case the cache files are temporary and deleted when the session ends, or
- the transform cannot be hashed deterministically, so its fingerprint changes between runs.
In both these scenarios, the solution is to specify a cache_file_name to make the cache file permanent or to avoid the "non-deterministic" hashing.
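For example, here is a rough sketch of pinning the cache file for a map call (the dataset name and the output path are placeholders):

from datasets import load_dataset

ds = load_dataset("imdb", split="train")  # placeholder dataset

# With an explicit cache_file_name, the result of map is written to this
# exact path and reloaded from there on the next run, independently of
# the computed fingerprint.
ds = ds.map(
    lambda x: {"n_chars": len(x["text"])},
    cache_file_name="/path/to/mapped_train.arrow",
)

And note that datasets.disable_caching() only affects the current session, so commenting it out restores the default behavior on the next run.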
Hello,
thanks for your reply.
The operation can be cached: the Hasher returns a code for the function, and the dataset has the cache_files attribute correctly set. Debugging some more, I found that the function returns different hashes between runs.
This is the snippet causing the problem:
map_params = {
    # the lambda closes over self, so hashing the function also pulls in
    # self.transform_func and everything else reachable from self
    "function": lambda x: {"x": self.transform_func(x["img"])},
    "writer_batch_size": 100,
    "num_proc": 1,
}
self.data[f"task_{self.task_ind}"] = self.data[f"task_{self.task_ind}"].map(
    **map_params
)
and transform_func is instantiated from the following Hydra-style config:
transform_func:
  _target_: torchvision.transforms.Compose
  transforms:
    - _target_: torchvision.transforms.ToTensor
    - _target_: torchvision.transforms.Normalize
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
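For context, this is roughly how the config becomes a callable, assuming Hydra-style instantiation (the config path is a placeholder):

from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.load("conf/transforms.yaml")  # placeholder path
transform_func = instantiate(cfg.transform_func)  # a torchvision Compose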
What may be happening? The runs have the same fixed seed.
EDIT: I can't reproduce the problem in a minimal setting; if I just load the same function twice and get its hash, it works fine. I'm not sure where the problem originates at this point.
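This is roughly the kind of minimal check I mean (Hasher is datasets.fingerprint.Hasher, and the transform mirrors the config above):

from datasets.fingerprint import Hasher
import torchvision.transforms as T

transform_func = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# printing this in two separate runs gives the same value here
print(Hasher.hash(transform_func))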
Found the issue:
there was an attribute of type set in the class referred to by self in the snippet above. Since Python's string hashing is salted per process, the iteration order of a set can differ between runs; this made the hash of the whole object differ across runs, and therefore the hash of self.transform_func differed as well.
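For anyone hitting the same thing, a small sketch of the failure mode and the fix (Holder and tags are made-up names):

from datasets.fingerprint import Hasher

class Holder:
    def __init__(self):
        # a set's iteration order depends on Python's per-process hash salt,
        # so serializing/hashing an object that contains one is not stable
        # across runs
        self.tags = {"cat", "dog", "bird"}
        # fix: use a deterministically ordered container instead
        # self.tags = tuple(sorted({"cat", "dog", "bird"}))

# may print different values in two separate Python processes
print(Hasher.hash(Holder()))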
Thanks for your help.