Error with aggregation_strategy="max", TypeError: Can't convert [' In'] to PyString

StivenLancheros · April 1, 2022, 10:27am

Good day,

Up until yesterday, my NER model was working fine using the pipeline with aggregation_strategy="max". Now I get the error message TypeError: Can't convert [' In'] to PyString, where ' In' is the first word of my sentence.
I noticed that it works fine with "simple", but "max" was the strategy that worked best for my system.
It seems to me that the tokenizer is now putting words in a list, whereas before it didn't.
Did something change in the tokenizer or in aggregation_strategy?
I can’t figure out why this is happening.
This is my model and code:

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

tokenizer_mbert_mul = AutoTokenizer.from_pretrained("StivenLancheros/roberta-base-biomedical-clinical-es-finetuned-ner-CRAFT_AugmentedTransfer_ES")

model_mbert_mul = AutoModelForTokenClassification.from_pretrained("StivenLancheros/roberta-base-biomedical-clinical-es-finetuned-ner-CRAFT_AugmentedTransfer_ES")
ner = pipeline('ner', aggregation_strategy="max", model=model_mbert_mul, tokenizer=tokenizer_mbert_mul)
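
For context on why "max" and "simple" can behave differently: "max" works at the word level, taking the label of the highest-scoring sub-token within each word, and the word's sub-tokens are joined back into one string along the way. The following is only a rough pure-Python sketch of that idea, not the library's code. The function name `aggregate_word_max`, the dict layout, and the "GENE" label are made up for illustration.

```python
from typing import Dict, List


def aggregate_word_max(tokens: List[Dict]) -> Dict:
    """Merge the sub-tokens of one word using a "max"-style rule.

    Hypothetical input layout: each token is a dict with
      "word":   the sub-token text (str)
      "scores": a mapping from label to score for that sub-token
    """
    # Pick the sub-token whose best label score is the highest.
    best = max(tokens, key=lambda t: max(t["scores"].values()))
    # The word takes that sub-token's top label and score.
    label = max(best["scores"], key=best["scores"].get)
    # Joining requires every "word" entry to be a plain string.
    word = "".join(t["word"] for t in tokens)
    return {"word": word, "entity": label, "score": best["scores"][label]}


sub_tokens = [
    {"word": " gen", "scores": {"O": 0.6, "GENE": 0.4}},
    {"word": "os", "scores": {"O": 0.1, "GENE": 0.9}},
]
merged = aggregate_word_max(sub_tokens)
print(merged)
```

Because the word-level strategies have to rebuild each word from its sub-tokens, they go through a token-to-string step that "simple" may not need in the same way, which would be consistent with only "max" failing here.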

This is a sentence of my data:

"In biology, a gene (from genos (Greek) meaning generation or birth or gender) is a basic unit of heredity and a sequence of nucleotides in DNA that encodes the synthesis of a gene product"

This is the error message:

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in convert_tokens_to_string(self, tokens)
    533 
    534     def convert_tokens_to_string(self, tokens: List[str]) -> str:
--> 535         return self.backend_tokenizer.decoder.decode(tokens)
    536 
    537     def _decode(

TypeError: Can't convert [' In'] to PyString
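
The signature in the traceback shows that convert_tokens_to_string expects a flat List[str], while the message suggests one element was itself a list ([' In']). A minimal sketch of a defensive check that could be used while debugging, assuming that nesting is the cause; `flatten_tokens` is a hypothetical helper, not part of transformers:

```python
from typing import List


def flatten_tokens(tokens) -> List[str]:
    """Flatten nested lists/tuples of tokens into a flat list of strings.

    Hypothetical debugging helper: raises TypeError for anything that is
    neither a string nor a list/tuple, so unexpected values surface early.
    """
    flat: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            flat.append(tok)
        elif isinstance(tok, (list, tuple)):
            # Recurse into nested containers such as [' In'].
            flat.extend(flatten_tokens(tok))
        else:
            raise TypeError(f"Unexpected token type: {type(tok).__name__}")
    return flat


nested = [[" In"], " biology"]
flat = flatten_tokens(nested)
print(flat)
```

This only works around the symptom. If the nesting comes from a version mismatch between the installed libraries, comparing the installed versions with the ones that worked the day before is probably the more useful first step.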

Any help is appreciated.
