Token indices sequence length is longer (Python)

Hello,

I am trying to extract pretrained GPT2 vectors for text of arbitrary length, so I tried setting the n_positions argument in the model's config to a value higher than the default. But I'm still getting warnings as if I hadn't set it at all. Any idea what I'm doing wrong?

My code:

from transformers import GPT2Tokenizer, GPT2Model, GPT2Config
import torch

max_len = 10000

config = GPT2Config(n_positions=max_len)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2', ignore_mismatched_sizes=True, config=config)
model = GPT2Model.from_pretrained('gpt2', ignore_mismatched_sizes=True, config=config)
text = " ".join(["a"] * 2000)

encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
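
For what it's worth, here is a quick sanity check I can run right after the code above to see which limit the warning is actually coming from. I'm assuming tokenizer.model_max_length is the value the tokenizer compares against when it prints the warning, but I'm not certain that's the right attribute:

print(config.n_positions)          # the value I passed in (10000)
print(model.config.n_positions)    # the value the loaded model actually ended up with
print(tokenizer.model_max_length)  # the limit I assume the tokenizer warning is based on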