I’m getting this warning, and its meaning is clear:

```
Token indices sequence length is longer than the specified maximum sequence length for this model (2215 > 2048)
```
But I don’t understand why I’m getting it, since I’m explicitly passing a tokenizer to the pipeline:

```python
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm import tqdm

pp = pipeline(task="text-generation", model="awesome_model", tokenizer="awesome_model")
results = [out for out in tqdm(pp(KeyDataset(self.ds['test'], "text")))]
```
Earlier, during training, I tokenize the dataset like this:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(self.pretrained_model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenized_ds = self.ds.map(
    lambda x: tokenizer(x['text'], max_length=700, truncation=True),
    batched=True,
)
```
How should I run predictions on the test set so that the inputs are truncated the same way they were during training?
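For reference, here is the workaround I’m considering: tokenizing manually with the same `truncation`/`max_length` settings I used in training and calling `generate` directly, instead of relying on the pipeline to truncate (which, as far as I can tell, it does not do by default). This is only a sketch; it uses `sshleifer/tiny-gpt2`, a tiny public model, as a stand-in for my `awesome_model`, and a repeated string in place of my dataset’s `text` column:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "sshleifer/tiny-gpt2"  # stand-in for "awesome_model"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
tok.pad_token = tok.eos_token  # same pad-token setup as in training

# A prompt far longer than the model's context window, standing in for ds['test']['text'].
text = "hello world " * 2000

# Truncate exactly as during training: max_length=700, truncation=True.
enc = tok(text, truncation=True, max_length=700, return_tensors="pt")

with torch.no_grad():
    gen = model.generate(**enc, max_new_tokens=20, pad_token_id=tok.eos_token_id)

# Decode only the newly generated tokens, dropping the (truncated) prompt.
completion = tok.decode(gen[0][enc["input_ids"].shape[1]:])
```

This guarantees the model never sees more than 700 prompt tokens, at the cost of bypassing the pipeline’s batching conveniences.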