Hello, I am trying to create a pipeline from a trained model. From what I understand, I need to provide a tokenizer so that my new input will be tokenised. I guess it should look like this:
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer
model_name = "TestModel"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer, return_all_scores=True)
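If my understanding of the pipeline output is right, I would then call it like this and get a score for every label (the label names here are just placeholders):

results = classifier("example input text")
# with return_all_scores=True I expect something like:
# [[{'label': 'LABEL_0', 'score': 0.1}, {'label': 'LABEL_1', 'score': 0.9}]]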
My question is: where do the other steps of the tokenisation process, like padding and truncation, take place? During training, my sequences were processed as follows:
train_encodings = tokenizer(seq_train, truncation=True, padding=True,
                            max_length=1024, return_tensors="pt")
Is that no longer needed?
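Or do I pass those arguments when I actually call the pipeline? I am assuming (not certain) that the text-classification pipeline forwards extra keyword arguments to its tokenizer, so something like this would be the equivalent of what I did during training:

# assumption on my part: these kwargs get passed through to the tokenizer
results = classifier("some long input text",
                     truncation=True, padding=True, max_length=1024)

Is that the right way to do it, or does the pipeline handle padding and truncation on its own?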