Is it possible to bypass the tokenizer and directly provide the input embeddings to train the BERT model?
You can create a subclass of the model you want and modify its forward pass.
You can also feed inputs_embeds instead of input_ids to your model (note the parameter is named inputs_embeds, not input_embeds).
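A minimal sketch of the second approach, assuming the Hugging Face transformers library. A tiny randomly initialized BertConfig is used here to keep the example self-contained; the same call works with a pretrained model loaded via BertModel.from_pretrained:

```python
import torch
from transformers import BertConfig, BertModel

# Tiny randomly initialized BERT (no download needed); sizes are
# illustrative assumptions, not real bert-base dimensions.
config = BertConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)

batch_size, seq_len = 2, 8

# Any float tensor of shape (batch, seq_len, hidden_size) works here,
# e.g. embeddings you computed or modified yourself.
inputs_embeds = torch.randn(batch_size, seq_len, config.hidden_size)
attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)

# Pass inputs_embeds in place of input_ids; BERT skips its own
# word-embedding lookup and uses your vectors directly.
outputs = model(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
print(outputs.last_hidden_state.shape)  # torch.Size([2, 8, 32])
```

Because the model still computes gradients through inputs_embeds, you can train on such inputs exactly as you would with token IDs.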