Hi! Each call to `self.encoded_context[col_name][idx]` first loads the entire column into memory, hence the bad performance. We plan to make this faster; see "Add some iteration method on a dataset column (specific for inference)", Issue #4180 in huggingface/datasets on GitHub. Instead, you should use `self.encoded_context[idx][col_name]` to access the data, which reads only the requested row.
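To make the difference concrete, here is a rough sketch of why the order of indexing matters. `ToyTable` is a hypothetical stand-in I made up for illustration, not the actual `datasets` implementation; it only counts how many cells each access pattern has to materialize.

```python
# Toy stand-in for a dataset table. This is an assumption for illustration,
# NOT how the datasets library is implemented internally.
class ToyTable:
    def __init__(self, columns):
        self._columns = columns  # dict: column name -> list of values
        self._num_rows = len(next(iter(columns.values())))
        self.cells_read = 0      # how many cells have been materialized

    def __getitem__(self, key):
        if isinstance(key, str):
            # Column access: every row of that column is loaded.
            self.cells_read += self._num_rows
            return list(self._columns[key])
        # Row access: only one cell per column is loaded.
        self.cells_read += len(self._columns)
        return {name: values[key] for name, values in self._columns.items()}


table = ToyTable({
    "input_ids": [[i, i + 1] for i in range(10_000)],
    "attention_mask": [[1, 1] for _ in range(10_000)],
})

# Slow pattern: column first, then index.
slow_value = table["input_ids"][42]
slow_cost = table.cells_read

table.cells_read = 0

# Fast pattern: row first, then column.
fast_value = table[42]["input_ids"]
fast_cost = table.cells_read

print(slow_value == fast_value, slow_cost, fast_cost)
```

Both patterns return the same value, but in this toy model the column-first access touches every row while the row-first access touches one cell per column. Inside a `__getitem__(self, idx)` that is called once per sample, that cost is paid on every call.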