Dataset select function: retrieving the examples not selected

Hi,

Is there a good way to retrieve the examples that were filtered out when using the DatasetDict.filter() function?

For now, I’m calling filter() on a DatasetDict like this:

datasets = datasets.filter(lambda example: example['label_kept'] not in labels_to_remove)

At the moment I compute the list of removed examples for each split before filtering, but I was wondering whether there’s a better way to do that. I need the ids of these removed examples to compare with the original dev/test file at the end.
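To make the "compute the list per split before filtering" workaround concrete, here is a minimal sketch. It assumes each example carries an `id` column (a hypothetical name, adjust to the real one) and that labels are hashable. The helper `collect_removed_ids` is purely illustrative and not part of the datasets library. It only relies on the input being a mapping from split name to an iterable of example dicts, which is the shape a DatasetDict has, so a plain dict stands in for it here.

```python
def collect_removed_ids(splits, labels_to_remove, label_key="label_kept", id_key="id"):
    """Return {split_name: [ids of examples whose label is in labels_to_remove]}.

    `splits` is any mapping of split name -> iterable of example dicts.
    `label_key` and `id_key` are assumed column names.
    """
    labels_to_remove = set(labels_to_remove)  # assumes hashable labels
    return {
        name: [ex[id_key] for ex in split if ex[label_key] in labels_to_remove]
        for name, split in splits.items()
    }


# Toy stand-in for a DatasetDict, so the sketch runs without the library.
toy_splits = {
    "dev": [{"id": 0, "label_kept": "a"}, {"id": 1, "label_kept": "b"}],
    "test": [{"id": 2, "label_kept": "b"}, {"id": 3, "label_kept": "c"}],
}

removed = collect_removed_ids(toy_splits, labels_to_remove=["b"])
print(removed)  # {'dev': [1], 'test': [2]}
```

With a real DatasetDict, another option would be to call filter() a second time on the original, unfiltered object with the inverted condition (example['label_kept'] in labels_to_remove), which should give the removed examples per split as datasets rather than as lists of ids.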

Thanks
