I am trying to use code that was written for transformers version 2.5.1, as follows:
Step 1: Add special tokens and update the model
model → GPT2DoubleHeadsModel
tokenizer → GPT2Tokenizer
ATTR_TO_SPECIAL_TOKEN = {
    'bos_token': '<bos>', 'eos_token': '<eos>', 'pad_token': '<pad>',
    'additional_special_tokens': ['<speaker1>', '<speaker2>'],
}
orig_num_tokens = len(tokenizer.encoder)  # GPT-2 vocab size before adding tokens
num_added_tokens = tokenizer.add_special_tokens(ATTR_TO_SPECIAL_TOKEN)
if num_added_tokens > 0:
    # Grow the embedding matrix so the newly added token ids have embeddings
    model.resize_token_embeddings(new_num_tokens=orig_num_tokens + num_added_tokens)
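The bookkeeping in the step above is simple: count how many of the special tokens are genuinely new, then grow the embedding matrix by that amount. A minimal stdlib sketch of that logic (a toy vocab dict standing in for the real tokenizer, not the transformers implementation):

```python
def add_special_tokens(vocab, special_tokens):
    """Toy version of tokenizer.add_special_tokens: append only unseen
    tokens and report how many were added, so the caller knows whether
    the embedding matrix needs to be resized."""
    added = 0
    for tok in special_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # next free token id
            added += 1
    return added

vocab = {'hello': 0, 'world': 1}
num_added = add_special_tokens(
    vocab, ['<bos>', '<eos>', '<pad>', '<speaker1>', '<speaker2>'])
new_size = len(vocab)  # orig_num_tokens + num_added
```

Re-running the same call adds nothing, which is why the `if num_added_tokens > 0` guard above skips the resize when every token already existed.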
And I train using the following code:
lm_loss, mc_loss, *_ = model(
    input_ids, token_type_ids=token_type_ids, mc_token_ids=mc_token_ids,
    mc_labels=mc_labels, lm_labels=lm_labels
)
Now my questions are:
- In the current documentation, the lm_labels parameter appears to have been renamed to labels, and the way the GPT-2 model returns its losses has also changed.
- Is the API for adding new tokens still valid in 3.5.1?
- Where can I find these changes documented?
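If code has to run under both library generations, one common pattern is to branch on the installed version at runtime. A small sketch using stdlib-only version parsing; the hard-coded string is a stand-in for `transformers.__version__`:

```python
def version_tuple(v):
    """Parse a 'major.minor.patch' string into a comparable int tuple."""
    return tuple(int(x) for x in v.split('.')[:3])

installed = '3.5.1'  # stand-in for transformers.__version__
USE_NEW_API = version_tuple(installed) >= (3, 0, 0)  # e.g. labels vs lm_labels
```

Tuple comparison avoids the classic string-comparison pitfall where '10.0.0' < '2.5.1' lexicographically.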