Hi all,
One quick question about the maximum sequence length of the RoBERTa tokenizer and model.
I notice that the model_max_length of the 'roberta-base' tokenizer is 512, while max_position_embeddings of the roberta-base model is set to 514. May I know the reason behind this?
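For reference, this is how I checked the two values (using the standard transformers loading API):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

print(tokenizer.model_max_length)            # 512
print(model.config.max_position_embeddings)  # 514
```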
I think the bos and eos tokens have already been added by the tokenizer inside map(preprocess_function, batched=True).
If I set both to the same value, an error is raised (IndexError: index out of range in self).
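Continuing from the snippet above, here is a minimal sketch of how I can trigger the error; the repeated "word " string is just a hypothetical stand-in for my actual data:

```python
import torch

# Force the tokenizer to allow sequences as long as max_position_embeddings.
tokenizer.model_max_length = 514

# Any input long enough to be truncated to the full 514 tokens triggers it.
inputs = tokenizer("word " * 600, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)  # IndexError: index out of range in self
```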
Thanks.