Hi, I was using the Amazon datasets to build a small language detector, but ran into a numpy.str_ error during the training phase. You can view my Colab notebook here: Google Colab.
I used the review body field as the text and the language field as the label, and dropped the other fields.
I found that the ‘language’ field is of type datasets.Value, not datasets.ClassLabel. My guess is that this is what causes the numpy.str_ error during training.
Question: how do I convert a datasets.Value to a datasets.ClassLabel? One way I can think of is calling str2int inside the preprocess_function/tokenize method, but I am curious whether there is an existing conversion method for this.
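For reference, here is a rough sketch of the workaround I have in mind, where the string labels are mapped to integer ids inside the preprocessing step. It is plain Python so it runs without any library: the `LABEL2ID` dict is only a stand-in for what str2int would do, the language list is a hypothetical label set, and the column names `review_body` and `language` are my assumptions about the fields. The tokenizer call is left out.

```python
# Hypothetical label set; the real one would come from the dataset's language column.
LANGUAGES = ["de", "en", "es", "fr", "ja", "zh"]

# Stand-in for a str2int-style lookup: label string -> integer id.
LABEL2ID = {name: idx for idx, name in enumerate(LANGUAGES)}


def preprocess_function(examples):
    """Batched preprocessing: keep the text, replace string labels with integer ids."""
    # The tokenizer call would normally go here; omitted to keep the sketch self-contained.
    return {
        "text": examples["review_body"],  # assumed name of the review body column
        "label": [LABEL2ID[lang] for lang in examples["language"]],  # assumed label column
    }


# Tiny made-up batch in the column-oriented layout that a batched map function receives.
batch = {"review_body": ["Great product", "Sehr gut"], "language": ["en", "de"]}
encoded = preprocess_function(batch)
print(encoded["label"])
```

This works for a fixed label set, but it keeps the mapping outside the dataset's own feature definition, which is why I would prefer a built-in conversion if one exists.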