Character level attention with Longformer for sequence classification

smillar · February 25, 2021, 8:18pm

Hey guys, I am trying to figure out how to use Longformer at the character level, which is also mentioned in the paper. I looked at the docs but couldn't find what I was looking for.

Can I just adjust my pre-processing so instead of tokenising:

“Hello, I like cake!”

the input to be tokenised is something like:

“H” “e” “l” “l” “o” “,” “I” “l” “i” “k” “e” “c” “a” “k” “e” “!”

so that the tokeniser assigns an id to every character?

Thanks.
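For illustration, here is a minimal sketch of what character-level pre-processing could look like in plain Python. It is one possible approach under stated assumptions, not the method from the paper or the library's own tokenizer: the helper names `build_char_vocab` and `encode_chars`, the `<pad>`/`<unk>` tokens, and the `max_length` value are all hypothetical. Unlike the split shown above, it keeps spaces as characters.

```python
from typing import Dict, List

# Hypothetical special tokens; a real setup may need more (e.g. a classification token).
PAD, UNK = "<pad>", "<unk>"


def build_char_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every distinct character in the corpus to an integer id."""
    vocab = {PAD: 0, UNK: 1}
    for ch in sorted({c for text in texts for c in text}):
        vocab[ch] = len(vocab)
    return vocab


def encode_chars(text: str, vocab: Dict[str, int], max_length: int) -> Dict[str, List[int]]:
    """Turn a string into per-character ids, truncated or padded to max_length."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in text[:max_length]]
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [vocab[PAD]] * n_pad,
        "attention_mask": mask + [0] * n_pad,
    }


vocab = build_char_vocab(["Hello, I like cake!"])
encoded = encode_chars("Hello, I like cake!", vocab, max_length=24)
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

Ids produced this way do not correspond to the vocabulary of a pretrained Longformer checkpoint, so a sketch like this would presumably go together with a model whose embeddings are trained from scratch with `vocab_size=len(vocab)`.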
