Hi, I have some questions about using pretrained BERT.
Can I pack a chunk of words into one input token? For example, split “hi my name is Linda and today i will~” into “hi my name is Linda” and “and today i will”, turn each split into a single embedding vector (e.g. by averaging its word2vec vectors), and treat each split vector as one input token. Is it okay to feed such inputs to the existing pretrained models?
Actually, I’m forced to use phrase-wise tokens in my task, so models built for long sequences are not an option.
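To make the idea concrete, here is a minimal sketch of what I mean by collapsing each phrase into one averaged vector. The word vectors here are random stand-ins (in practice they would come from a trained word2vec model), and the 768 dimension is just BERT-base's hidden size:

```python
import numpy as np

# Hypothetical word vectors: random stand-ins for real word2vec embeddings.
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=768)
         for w in "hi my name is Linda and today i will".split()}

def phrase_embedding(phrase):
    """Average the word vectors of a phrase into one 768-d vector."""
    vecs = [vocab[w] for w in phrase.split()]
    return np.mean(vecs, axis=0)

phrases = ["hi my name is Linda", "and today i will"]
# One "input token" embedding per phrase, shape (2, 768)
inputs_embeds = np.stack([phrase_embedding(p) for p in phrases])
print(inputs_embeds.shape)  # (2, 768)
```

With Hugging Face `transformers`, such vectors could be passed to `BertModel` via the `inputs_embeds=` argument (bypassing `input_ids`), but the pretrained model was never trained on averaged phrase vectors, which is exactly what I'm unsure about.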