Token indices sequence length is longer (Python)


I am trying to extract GPT2 pretrained vectors for text of arbitrary length. So I tried setting the n_positions argument in the model's config to a value higher than the default. But I'm still getting warnings as if I hadn't set it at all. Any idea what I'm doing wrong?

My code:

from transformers import GPT2Tokenizer, GPT2Model, GPT2Config
import torch

max_len = 10000

config = GPT2Config(n_positions=max_len)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2', ignore_mismatched_sizes=True, config=config)
model = GPT2Model.from_pretrained('gpt2', ignore_mismatched_sizes=True, config=config)
text = " ".join(["a"] * 2000)

encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)