Is there a trained language model for classifying text with special characters?

Omri · June 16, 2023, 1:59pm

I want to do binary text classification, and of course I would prefer to take a transformer model that has already been trained on English and fine-tune it on my data.
But in my classification task it makes a big difference whether special characters are present. For example, for the text “i like dogs” the label is 0, and for the text “i like “dogs”” the label is 1.
The problem is that every pretrained language model I have found so far ignores special characters in one way or another in its tokenizer (they are either stripped during cleaning or mapped to the unknown token, etc.).
Do you know of a suitable model?
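
To make the problem concrete, here is a minimal sketch of how one might check whether a tokenizer keeps the characters that matter. The names `special_chars_lost` and `toy_tokenize` are hypothetical helpers, and `toy_tokenize` is only a stand-in with a tiny made-up vocabulary, not any real model's tokenizer. With Hugging Face Transformers, the `tokenize` method and `unk_token` attribute of a tokenizer loaded via `AutoTokenizer.from_pretrained(...)` could presumably be passed in instead.

```python
import re
from typing import Callable, List, Tuple


def special_chars_lost(
    text: str,
    tokenize: Callable[[str], List[str]],
    unk_token: str = "[UNK]",
) -> Tuple[List[str], bool]:
    """Return (special characters missing from the tokens, whether an unknown token appeared)."""
    tokens = tokenize(text)
    # Concatenate all tokens except the unknown token, so its own
    # characters (e.g. the brackets in "[UNK]") are not counted as preserved.
    kept = "".join(t for t in tokens if t != unk_token)
    lost: List[str] = []
    for ch in text:
        if ch.isalnum() or ch.isspace():
            continue  # only punctuation and symbols are of interest
        if ch not in kept and ch not in lost:
            lost.append(ch)
    return lost, unk_token in tokens


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: tiny vocabulary, everything else becomes [UNK]."""
    vocab = {"i", "like", "dogs", '"'}
    pieces = re.findall(r"\w+|[^\w\s]", text.lower())
    return [p if p in vocab else "[UNK]" for p in pieces]


# ASCII quotes are in the toy vocabulary, curly quotes are not.
print(special_chars_lost('i like "dogs"', toy_tokenize))
print(special_chars_lost('i like “dogs”', toy_tokenize))
```

If the first element of the result is non-empty for the characters the labels depend on, that tokenizer discards exactly the signal the classifier needs.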
