Get all unique label values in sorted order

Hello, I am new to the Hugging Face Datasets package. I'm currently getting familiar with loading datasets from Hugging Face, and I'm having trouble getting the unique label values. Calling dataset['label'] returns the whole column, but what if I want a list of the distinct label values? I tried dataset.unique() on the label column, but this feels unorthodox because it only returns an unsorted list like [2, 3, 4, 0, 1]. Is calling sort() on that list the way to do it, or does the package have a method explicitly for this purpose that I'm not aware of?


Yes, that would be the easiest: call sorted() on the list that unique() returns.
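A minimal sketch of that suggestion, assuming the column is named "label" and that the object is a single split (a Dataset, not a DatasetDict). The helper name sorted_unique_labels is made up for illustration and is not part of the library; the dataset name in the usage comment is a placeholder.

```python
def sorted_unique_labels(dataset, column="label"):
    """Return the distinct values of `column` in ascending order.

    `dataset` is assumed to be a single split (a `datasets.Dataset`),
    whose `unique(column)` returns a plain Python list in no guaranteed order.
    """
    # sorted() returns a new list and leaves the result of unique() untouched.
    return sorted(dataset.unique(column))


# Usage (requires the `datasets` package; "some/dataset" is a placeholder):
#   from datasets import load_dataset
#   train = load_dataset("some/dataset", split="train")
#   print(sorted_unique_labels(train))
```

If the column happens to be a ClassLabel feature, dataset.features['label'].names should also list the class names in id order, which may be closer to what you want than the raw integers.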


I am also exploring Hugging Face Datasets and ran into the same issue. dataset.unique() does return an unsorted list, but sorting the result is a good way to organize your labels. It gets more complicated with multilingual datasets. I am currently working on a project for a client that offers web translation services, so I have to make sure the unique labels are correct, especially when managing content in different languages. I would appreciate guidance from anyone who could take a look at my client's use case.