OS: Ubuntu 20 LTS
When I used Hugging Face dataset.map() to process a large dataset, throughput degraded quickly, my disk filled up, and the process crashed. I tried deleting ~/.cache/huggingface, but that reclaimed only a small fraction of my disk space (3 GB). I searched online but could not find a relevant answer.
I had used the map() function to process an image dataset:
processed_dataset = dataset.map(
    function=image_feature_extraction_text_tokenization,
    batched=True,
    fn_kwargs={"max_target_length": 256},
    batch_size=1024,
    num_proc=4,
)
Now the computer is unusable, but I need it for other jobs later, so I would appreciate any help freeing up the space.
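To narrow down where the space actually went, a minimal sketch like the one below could list the largest entries under a few candidate directories. It uses only the Python standard library. The helper names dir_size and largest_children are hypothetical, and the candidate paths are assumptions: ~/.cache/huggingface is the directory mentioned above, and the system temp directory is only a guess at where leftover intermediate files might be. If the dataset was loaded from a custom location, that directory would be worth adding to the list.

```python
import os
import tempfile
from pathlib import Path


def dir_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    # os.walk does not follow directory symlinks by default;
    # unreadable directories are skipped silently.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable
    return total


def largest_children(root, top=10):
    """Return (size_in_bytes, path) pairs for the largest direct children of root."""
    root = Path(root)
    if not root.is_dir():
        return []
    sizes = []
    for child in root.iterdir():
        try:
            if child.is_symlink():
                continue
            size = dir_size(child) if child.is_dir() else child.stat().st_size
        except OSError:
            continue
        sizes.append((size, str(child)))
    sizes.sort(reverse=True)  # biggest first
    return sizes[:top]


if __name__ == "__main__":
    # Assumed locations; adjust to wherever the dataset and its cache live.
    candidates = [
        Path.home() / ".cache" / "huggingface",
        Path(tempfile.gettempdir()),
    ]
    for root in candidates:
        print(root)
        for size, path in largest_children(root):
            print(f"  {size / 1e9:8.2f} GB  {path}")
```

Running it prints the ten largest entries per directory, which should show whether the missing space is in leftover cache files or somewhere else entirely.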