How do I know which models will produce token_type_ids and which won't?

So this section, Processing the data - Hugging Face Course, mentions that certain checkpoints don't have token_type_ids.

Are there certain features of a model that indicate whether or not it supports token_type_ids? If so, what are those qualities?

Thanks :hugs:

Hi Edward!

The token_type_ids are returned if the model has seen them in pre-training and knows what to do with them. So it all depends on how the model was pre-trained.
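To make that a bit more concrete, here is a rough sketch of what token_type_ids look like for a BERT-style sentence pair laid out as [CLS] first [SEP] second [SEP]. The function name `bert_style_token_type_ids` is just something I made up for illustration, it is not part of the library, and other models may lay out their inputs differently.

```python
def bert_style_token_type_ids(first_tokens, second_tokens):
    """Segment ids for a pair laid out as [CLS] first [SEP] second [SEP]."""
    # [CLS], the first sentence and its closing [SEP] belong to segment 0.
    first_segment = [0] * (len(first_tokens) + 2)
    # The second sentence and the final [SEP] belong to segment 1.
    second_segment = [1] * (len(second_tokens) + 1)
    return first_segment + second_segment


ids = bert_style_token_type_ids(["this", "is", "first"], ["second", "one"])
print(ids)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

A model that was never pre-trained with this kind of segment information has no use for the list, which is why its tokenizer doesn't return it.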

But as the course also mentions, you usually don’t have to worry about the token_type_ids - as long as you use the same checkpoint for the tokenizer and the model, everything will be fine as the tokenizer knows what to provide to its model.
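If you still want to check a given checkpoint yourself, one way that should work is to look at the tokenizer's `model_input_names` attribute, which lists the inputs the tokenizer prepares for its model. A minimal sketch follows; the helper names `returns_token_type_ids` and `check_checkpoint` are my own, and `check_checkpoint` assumes transformers is installed and the checkpoint can be downloaded.

```python
def returns_token_type_ids(tokenizer):
    """True if the tokenizer lists token_type_ids among its model inputs."""
    return "token_type_ids" in tokenizer.model_input_names


def check_checkpoint(checkpoint):
    """Load a checkpoint's tokenizer and report whether it uses token_type_ids.

    Needs the transformers library and network access (or a local cache).
    """
    from transformers import AutoTokenizer  # imported lazily on purpose

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    return returns_token_type_ids(tokenizer)
```

With this I would expect `check_checkpoint("bert-base-cased")` to return True and `check_checkpoint("distilbert-base-cased")` to return False, since DistilBERT is not pre-trained with segment embeddings.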

Hope that helps,

