Hello,
I am working on a Named Entity Recognition project. The data I am working with is the Named Entity Recognition (NER) Corpus on Kaggle.
When I try to map the tokenize_and_align_labels function over the dataset, I get the following error: ArrowInvalid: Could not convert '[' with type str: tried to convert to int64. I am pretty sure this happens because all of the columns have a dtype of string.
That is fine for the sentence column, but the two tag columns (POS and tag) should be lists of strings (or perhaps sequences of strings).
How do I convert just those two columns to lists (or sequences) of strings?
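A minimal sketch of one possible conversion, using only the standard library. It assumes each POS and tag cell holds a Python-style list literal such as "['O', 'B-geo']", and that the columns are named "POS" and "Tag" (both are assumptions about the CSV). It parses those strings with `ast.literal_eval`, then uses a hypothetical `build_label2id` helper to turn tag strings into integer ids, since the error suggests the labels are expected as int64. The example row is illustrative, not copied from the dataset.

```python
import ast

# Assumed column names in the Kaggle CSV; adjust to match the actual headers.
TAG_COLUMNS = ("POS", "Tag")


def parse_tag_columns(example):
    """Turn stringified lists like "['NNP', 'VBZ']" into real lists of strings.

    Takes one example (a dict) and returns only the updated columns, so it
    could be passed to Dataset.map one example at a time.
    """
    return {col: ast.literal_eval(example[col]) for col in TAG_COLUMNS}


def build_label2id(tag_lists):
    """Hypothetical helper: assign an integer id to every distinct tag."""
    labels = sorted({tag for tags in tag_lists for tag in tags})
    return {label: i for i, label in enumerate(labels)}


# Illustrative row in the assumed CSV layout: every cell is a plain string.
row = {
    "Sentence": "Thousands of demonstrators have marched through London",
    "POS": "['NNS', 'IN', 'NNS', 'VBP', 'VBN', 'IN', 'NNP']",
    "Tag": "['O', 'O', 'O', 'O', 'O', 'O', 'B-geo']",
}

parsed = parse_tag_columns(row)
print(parsed["Tag"])  # ['O', 'O', 'O', 'O', 'O', 'O', 'B-geo']

# Map tag strings to integer ids.
label2id = build_label2id([parsed["Tag"]])
label_ids = [label2id[tag] for tag in parsed["Tag"]]
print(label_ids)  # [1, 1, 1, 1, 1, 1, 0]
```

With a Hugging Face `Dataset`, the same function would likely be applied as `dataset.map(parse_tag_columns)`, which should overwrite the two string columns with list columns. In practice `build_label2id` would be run over every row's tags rather than a single one.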
Thanks,
Brian
P.S. If you need any additional code to answer, let me know. This is my first post here!