Datasets Viewer: Searching for text

It seems that the Dataset Viewer search function returns no hits for terms such as “what”, “can”, “which”, and so on. Does the indexing step remove stopwords like these? The rows are returned if one uses the SQL console, but for a dataset that includes audio files, the rows returned in the SQL console don’t give access to the audio column.
Is there a way to search for stopwords like these in the default Dataset Viewer?
It would be really useful if all of the textual content in a column could be searchable.


The Dataset Viewer is built into HF, so we users can’t do anything about it directly, but it’s being developed openly on GitHub, so if you raise an issue there, it might get picked up.

We rely on the DuckDB FTS extension.

We create the index here:

As you can see, we don’t specify the stopwords parameter, so it defaults to a pre-defined list of 571 English stopwords (which might contradict the language used in the stemmer param, btw).
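To illustrate why a query made up only of stopwords comes back empty, here is a toy inverted index in plain Python. It is only a sketch of the general idea, not DuckDB's or the Dataset Viewer's actual implementation: the tiny STOPWORDS set, the sample rows, and the function names are all made up for the example.

```python
import re

# Tiny illustrative stopword set (hypothetical); the real default list is much longer.
STOPWORDS = {"what", "can", "which", "the", "a", "is"}

def tokenize(text, stopwords):
    """Lowercase, split on non-letters, and drop stopwords."""
    tokens = re.split(r"[^a-z]+", text.lower())
    return [t for t in tokens if t and t not in stopwords]

def build_index(rows, stopwords):
    """Map each kept token to the set of row ids that contain it."""
    index = {}
    for row_id, text in rows.items():
        for token in tokenize(text, stopwords):
            index.setdefault(token, set()).add(row_id)
    return index

def search(index, query, stopwords):
    """Return ids of rows matching any query token that survives filtering."""
    hits = set()
    for token in tokenize(query, stopwords):
        hits |= index.get(token, set())
    return sorted(hits)

# Made-up rows standing in for a text column.
rows = {
    0: "What can I do with this dataset?",
    1: "Which audio clip is the longest?",
    2: "A dog barking in the park",
}

with_stopwords = build_index(rows, STOPWORDS)
without_stopwords = build_index(rows, set())

print(search(with_stopwords, "what", STOPWORDS))   # [] - query token is filtered out
print(search(without_stopwords, "what", set()))    # [0]
print(search(with_stopwords, "audio", STOPWORDS))  # [1]
```

In DuckDB itself, the create_fts_index pragma takes a stopwords parameter, and passing 'none' there disables stopword removal, so that would be the knob to change if the index were built differently.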

cc @lhoestq @asoria for visibility.
