Can we use a tokenizer from one architecture with a model from another?

I have a BERT tokenizer that is pre-trained on some dataset. Now I want to fine-tune a RoBERTa model on the task at hand. So in this scenario:

  1. Can I use the BERT tokenizer's output as input to a RoBERTa model?
  2. Does such a setup make sense between autoregressive and non-autoregressive models, e.g. using a BERT tokenizer with an XLNet model?
  3. Do these kinds of setups make sense at all?

From what I understand, this can be implemented but doesn't make much sense. Still, I could use some experience or clarification in this direction.

Hi sps,

I think it would be possible to use a BERT tokenizer with a RoBERTa model, but you would have to train the RoBERTa model from scratch. You wouldn't be able to take advantage of transfer learning by starting from a pre-trained RoBERTa checkpoint.
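As a minimal sketch of what that could look like with the transformers library (assuming the public "bert-base-uncased" tokenizer stands in for your own pre-trained tokenizer; the RoBERTa model is built from a fresh config with random weights, not a pre-trained checkpoint):

```python
from transformers import BertTokenizerFast, RobertaConfig, RobertaForMaskedLM

# Stand-in for your own pre-trained BERT tokenizer
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Build a RoBERTa config that matches the tokenizer, then initialize the
# model from scratch (random weights, so no transfer learning)
config = RobertaConfig(
    vocab_size=tokenizer.vocab_size,        # embedding table must match the tokenizer's vocab
    pad_token_id=tokenizer.pad_token_id,    # BERT's [PAD] id, not RoBERTa's default of 1
    bos_token_id=tokenizer.cls_token_id,
    eos_token_id=tokenizer.sep_token_id,
    max_position_embeddings=514,            # leave room for RoBERTa's position-id offset
)
model = RobertaForMaskedLM(config)

# The tokenizer's ids now line up with the (untrained) embedding layer
batch = tokenizer("Some text from your dataset", return_tensors="pt")
outputs = model(**batch)
```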

Why would you want to do that?

You might run into problems with things like the [SEP] and [CLS] tokens, which follow different conventions in BERT and RoBERTa, though I expect you could write some code to deal with that.
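For example, using the public checkpoints just to show the conventions (the exact ids are whatever those vocabularies define):

```python
from transformers import BertTokenizerFast, RobertaTokenizerFast

bert_tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
roberta_tok = RobertaTokenizerFast.from_pretrained("roberta-base")

# BERT wraps a sentence as  [CLS] ... [SEP]  and pads with [PAD];
# RoBERTa wraps it as       <s> ... </s>     and pads with <pad>.
print(bert_tok.cls_token, bert_tok.sep_token, bert_tok.pad_token_id)
print(roberta_tok.bos_token, roberta_tok.eos_token, roberta_tok.pad_token_id)

# The simplest fix is to tell the model config which ids the tokenizer
# actually uses (as in the sketch above), rather than remapping the tokenizer.
```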

A tokenizer splits your text up into chunks and replaces each chunk with a numerical id. BERT and RoBERTa do this in different ways (WordPiece vs. byte-level BPE), but that shouldn't make the systems incompatible: any embedding layer should be able to learn to use the ids that come out of a WordPiece, byte-pair encoding or SentencePiece tokenizer.
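You can see the different splitting behaviour directly. The exact pieces depend on the vocabularies, but the point is that both tokenizers end up emitting integer ids:

```python
from transformers import BertTokenizerFast, RobertaTokenizerFast

text = "Tokenizers split text into subword pieces."
bert_tok = BertTokenizerFast.from_pretrained("bert-base-uncased")     # WordPiece
roberta_tok = RobertaTokenizerFast.from_pretrained("roberta-base")    # byte-level BPE

# WordPiece marks word-internal pieces with "##"; byte-level BPE marks
# word-initial pieces with a leading "Ġ" (an encoded space).
print(bert_tok.tokenize(text))
print(roberta_tok.tokenize(text))

# Either way, the model only ever sees integer ids that index an embedding table.
print(bert_tok(text)["input_ids"])
print(roberta_tok(text)["input_ids"])
```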

Have you seen this intro to tokenizers? [Summary of the tokenizers — transformers 4.11.1 documentation]


Yes, I agree with you. This type of setup largely doesn’t make sense.

Actually, I wanted to use an autoregressive model such as XLNet, but I don't have an XLNet model for my specific data, so I don't have an appropriate tokenizer either. I had this wild thought: could I take some pre-existing tokenizer (trained on data similar to mine) and feed its output to an XLNet model? The argument being (as you mentioned) that the layers would learn something anyhow; in the worst case they would learn from scratch, similar to a fine-tuning setting.

But yeah, I also think this can't/shouldn't be done, as we wouldn't know how much worse the results have become.