DataLoader time problem on a custom dataset based on Hugging Face datasets

Hi! Each call to `self.encoded_context[col_name][idx]` loads the entire column into memory first, hence the poor performance (we plan to make this faster; see Add some iteration method on a dataset column (specific for inference) · Issue #4180 · huggingface/datasets · GitHub). Instead, use `self.encoded_context[idx][col_name]`, which fetches only the requested row and then selects the column from it.
