How do I know which models will produce token_type_ids and which won't?

The "Processing the data - Hugging Face Course" section mentions that certain checkpoints don't return token_type_ids.

Are there certain features of a model that indicate whether or not it supports token_type_ids? If so, what are they?

Thanks 🤗

Hi Edward!

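The token_type_ids are returned only if the model saw them during pre-training and knows what to do with them, so it all depends on how the model was pre-trained. For example, BERT uses them to tell apart the two sentences in its next-sentence-prediction objective, while DistilBERT drops the token-type embeddings entirely.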
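
If you do want to check a particular checkpoint, one way is to look at the tokenizer's `model_input_names` attribute, which lists the inputs the tokenizer returns by default for its model. This is a minimal sketch, assuming the `transformers` library is installed; the helper name `returns_token_type_ids` is made up for this example.

```python
def returns_token_type_ids(tokenizer) -> bool:
    """Return True if the tokenizer includes token_type_ids in its default outputs.

    Hugging Face tokenizers expose `model_input_names`, the list of keys they
    return by default; token_type_ids only appears there for models that use it.
    """
    return "token_type_ids" in getattr(tokenizer, "model_input_names", [])


if __name__ == "__main__":
    # Needs `transformers` and network access to download the tokenizers.
    from transformers import AutoTokenizer

    for checkpoint in ["bert-base-uncased", "distilbert-base-uncased", "roberta-base"]:
        tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        encoded = tokenizer("How are you?", "I am fine.")
        # Print the flag alongside the keys the tokenizer actually returned.
        print(checkpoint, returns_token_type_ids(tokenizer), sorted(encoded.keys()))
```

For `bert-base-uncased` this should report `True`, while `distilbert-base-uncased` and `roberta-base` should report `False`, in line with how those models were pre-trained.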

But as the course also mentions, you usually don't have to worry about token_type_ids: as long as you use the same checkpoint for the tokenizer and the model, everything will work, because the tokenizer knows which inputs its model expects.
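
To see why a matching checkpoint matters, here is a rough sketch that compares the keys a tokenizer returns with the arguments the model's `forward` method accepts. It assumes the `transformers` library, and `unexpected_inputs` is a hypothetical helper, not part of the library. A BERT tokenizer paired with a DistilBERT model, for example, would hand over `token_type_ids` that DistilBERT has no use for. If `forward` accepts `**kwargs` (which may be the case in some `transformers` versions), the helper cannot tell and returns an empty list.

```python
import inspect


def unexpected_inputs(encoded, model):
    """Return the keys in `encoded` that `model.forward` has no parameter for.

    A non-empty result suggests `model(**encoded)` would fail with a TypeError
    about an unexpected keyword argument.
    """
    params = inspect.signature(model.forward).parameters
    # If forward takes **kwargs, any key is accepted, so nothing is flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(key for key in encoded if key not in params)


if __name__ == "__main__":
    # Needs `transformers` and network access to download the checkpoints.
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained("distilbert-base-uncased")
    matching = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    mismatched = AutoTokenizer.from_pretrained("bert-base-uncased")
    print("matching:", unexpected_inputs(matching("Hello!", "Hi!"), model))
    print("mismatched:", unexpected_inputs(mismatched("Hello!", "Hi!"), model))
```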

Hope that helps,

Cheers
Heiko
