Dataloader time problem on custom dataset based on huggingface

mariosasko · June 13, 2022, 3:18pm

Hi! Each call to self.encoded_context[col_name][idx] first loads the entire column into memory, hence the bad performance (we plan to make this faster; see Add some iteration method on a dataset column (specific for inference) · Issue #4180 · huggingface/datasets · GitHub). Instead, you should use self.encoded_context[idx][col_name] to access the data: indexing by row first reads only that one row.
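To make the row-first pattern concrete, here is a minimal sketch of what the `__getitem__` of such a custom dataset could look like. The class name `EncodedContextDataset`, the `columns` argument, and the column names `input_ids` and `attention_mask` are assumptions for illustration, not taken from the original question. A plain list of dicts stands in for the 🤗 Datasets object so the snippet runs without the library; it supports the same `data[idx][col_name]` access but does not reproduce the performance difference.

```python
class EncodedContextDataset:
    """Map-style dataset wrapper (hypothetical name) using row-first access."""

    def __init__(self, encoded_context, columns):
        # encoded_context: anything that supports encoded_context[idx][col_name],
        # e.g. a tokenized datasets.Dataset (assumed here, not required).
        self.encoded_context = encoded_context
        self.columns = list(columns)

    def __len__(self):
        return len(self.encoded_context)

    def __getitem__(self, idx):
        # Row first: fetch a single row once, then pick the columns from it.
        # Avoid self.encoded_context[col][idx], which indexes the column first.
        row = self.encoded_context[idx]
        return {col: row[col] for col in self.columns}


# Stand-in data (illustrative values only).
rows = [
    {"input_ids": [101, 7592, 102], "attention_mask": [1, 1, 1]},
    {"input_ids": [101, 2088, 102], "attention_mask": [1, 1, 1]},
]
dataset = EncodedContextDataset(rows, ["input_ids", "attention_mask"])
sample = dataset[1]
```

Fetching the row once and reading every needed column from it also avoids repeating the row lookup for each column.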
