Configure RobertaTokenizer

Hi, I would like to configure RobertaTokenizer so that it outputs token_type_ids, which it doesn’t do by default. Is there a way to do that?

I have changed the model configuration and updated its type_vocab_size to 2, like so:

from torch import nn
from transformers import RobertaModel

model = RobertaModel.from_pretrained('roberta-base')

# Update the config so the token type embeddings can be fine-tuned
model.config.type_vocab_size = 2

# Create a new embedding layer with 2 possible segment IDs instead of 1
model.embeddings.token_type_embeddings = nn.Embedding(2, model.config.hidden_size)

# Initialize it
model.embeddings.token_type_embeddings.weight.data.normal_(mean=0.0, std=model.config.initializer_range)

I want to input token_type_ids to the model instance like so:

model(input_ids=token_ids, attention_mask=attn_masks, token_type_ids=token_type_ids)
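
To make the goal concrete, here is a rough sketch of how the segment IDs could be built by hand from the input IDs if the tokenizer cannot be configured to produce them. It is only an illustration: make_token_type_ids is a hypothetical helper, not part of transformers, and the default IDs (2 for </s>, 1 for <pad>) are assumptions that should be read from tokenizer.sep_token_id and tokenizer.pad_token_id rather than hard-coded. It also assumes sentence pairs are encoded as <s> A </s></s> B </s>.

```python
from typing import List


def make_token_type_ids(input_ids: List[int], sep_id: int = 2, pad_id: int = 1) -> List[int]:
    """Hypothetical helper: segment 0 up to and including the first
    separator, segment 1 for everything after it, and 0 for padding."""
    segment = 0
    token_type_ids = []
    for tok in input_ids:
        if tok == pad_id:
            # Padding positions are masked out anyway; give them segment 0
            token_type_ids.append(0)
            continue
        token_type_ids.append(segment)
        # Switch to the second segment once the first separator has been seen
        if tok == sep_id and segment == 0:
            segment = 1
    return token_type_ids


# Toy IDs shaped like: <s> A A </s> </s> B B </s> <pad>
example = [0, 100, 101, 2, 2, 200, 201, 2, 1]
print(make_token_type_ids(example))
```

The resulting list would still need to be converted to a tensor with the same shape as token_ids before being passed to the model.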