Up until yesterday my NER model was working fine using the pipeline with aggregation_strategy="max". Now I get the error message: TypeError: Can't convert [' In'] to PyString, where ' In' is the first word of my sentence.
I noticed that it works fine with "simple", but "max" was the strategy that gave the best results for my system.
It seems to me that now the tokenizer is putting words in a list whereas before it didn’t.
Did something change in the use of the tokenizer or aggregation_strategy?
I can’t figure out why this is happening.
This is my model and code:
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

tokenizer_mbert_mul = AutoTokenizer.from_pretrained("StivenLancheros/roberta-base-biomedical-clinical-es-finetuned-ner-CRAFT_AugmentedTransfer_ES")
model_mbert_mul = AutoModelForTokenClassification.from_pretrained("StivenLancheros/roberta-base-biomedical-clinical-es-finetuned-ner-CRAFT_AugmentedTransfer_ES")
ner = pipeline('ner', aggregation_strategy="max", model=model_mbert_mul, tokenizer=tokenizer_mbert_mul)
This is a sentence from my data:
"In biology, a gene (from genos (Greek) meaning generation or birth or gender) is a basic unit of heredity and a sequence of nucleotides in DNA that encodes the synthesis of a gene product"
This is the error message:
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in convert_tokens_to_string(self, tokens)
    533
    534     def convert_tokens_to_string(self, tokens: List[str]) -> str:
--> 535         return self.backend_tokenizer.decoder.decode(tokens)
    536
    537     def _decode(

TypeError: Can't convert [' In'] to PyString
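To make my suspicion concrete: the signature in the traceback expects a flat list of strings, but the error mentions [' In'], which looks like a one-element list sitting where a string should be. Below is a minimal sketch in plain Python of that mismatch. The helper names check_tokens and flatten_tokens are my own, not part of transformers or tokenizers, and the nested input is only my assumption about what the pipeline is passing along.

```python
from typing import List, Sequence, Union

# Assumed shape of a token: either a string or a (wrongly) nested list of strings.
Token = Union[str, Sequence[str]]


def check_tokens(tokens: Sequence[Token]) -> None:
    """Hypothetical check: raise if any element is not a plain string."""
    for i, tok in enumerate(tokens):
        if not isinstance(tok, str):
            raise TypeError(
                f"token {i} is {type(tok).__name__}, expected str: {tok!r}"
            )


def flatten_tokens(tokens: Sequence[Token]) -> List[str]:
    """Hypothetical helper: turn [[' In'], [' biology']] into [' In', ' biology']."""
    flat: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            flat.append(tok)
        else:
            flat.extend(tok)
    return flat


# Assumed input, mirroring the [' In'] seen in the error message.
nested = [[" In"], [" biology"]]
flat = flatten_tokens(nested)
check_tokens(flat)  # passes: every element is now a plain string
```

Calling check_tokens(nested) raises a TypeError on the first element, which is the same kind of complaint I am getting from the decoder. This does not explain why the tokens would arrive nested with "max" but not with "simple".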
Any help is appreciated.