Datasets map issues

Hi,

I am applying a map function to my dataset, e.g.:
df, _ = df.map(preprocessor, batched=True, num_proc=self.num_cores)

I run the preprocessor with map in batched mode and with num_proc. When a column has empty values and all of them fall into a single batch, the feature type for that column is inferred as null in that batch, and the call fails with ValueError: Features must match for all datasets

Is there a way to force the feature type, or to tell ‘map’ to ignore the feature type via a parameter, so that it won’t check it during concatenation? @lhoestq

Hi! Yes, you can specify the features yourself by passing features=Features({...}) to map :wink:

Will it apply the features to all batches irrespective of the actual types?

Yes.

Note that features are nullable, so you can specify a Value("string") type for a column that contains both strings and null values.