Bypassing tokenizers

Hi everyone,
Is it possible to bypass the tokenizer and directly provide the input embeddings to train the BERT model?
Thanks!

You can create a subclass of the model you want and modify its forward pass.

You can also feed inputs_embeds instead of input_ids to your model.
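A minimal sketch of that second approach: Hugging Face BERT models accept an inputs_embeds argument in place of input_ids, so you can pass precomputed embeddings directly. This example uses a tiny randomly initialized config (the small hidden sizes are just an assumption so it runs without downloading pretrained weights):

```python
import torch
from transformers import BertConfig, BertModel

# Tiny randomly initialized BERT; sizes chosen only to keep the example light
config = BertConfig(
    hidden_size=32, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=64,
)
model = BertModel(config)

# Precomputed embeddings of shape (batch, seq_len, hidden_size)
embeds = torch.randn(2, 8, config.hidden_size)

# Pass inputs_embeds instead of input_ids; the model skips its
# token-embedding lookup and uses these vectors directly
outputs = model(inputs_embeds=embeds)
print(outputs.last_hidden_state.shape)  # torch.Size([2, 8, 32])
```

The same keyword works during training, so you can backpropagate through embeds if it requires gradients.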