Can you check `ds.cache_files`? Since you loaded the dataset from memory using `.from_pandas`, the dataset has no associated cache directory in which to save intermediate results.
To fix this, you can specify `cache_file_name` in `.map()`. That way it will write the results to disk instead of keeping them in memory.