```python
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitTrainer
from transformers import AutoTokenizer

# Load the base model's tokenizer: pad on the left and add BOS/EOS tokens
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding=True,
    padding_side="left",
    pad_token="[PAD]",
    add_eos_token=True,
    add_bos_token=True,
)

# Set the padding token in the tokenizer (reuse the EOS token)
tokenizer.pad_token = tokenizer.eos_token

# Now initialize the trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_val_dataset,
    loss_class=CosineSimilarityLoss,
    batch_size=16,
    num_iterations=20,  # The number of text pairs to generate for contrastive learning
    column_mapping={"text": "text", "label": "label"},  # Map dataset columns to the text/label names the trainer expects
)
```
```
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).
```
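A likely cause, offered as an assumption rather than a confirmed diagnosis: the `tokenizer` configured above is never passed to `SetFitTrainer`. A SetFit model tokenizes with the tokenizer inside its own Sentence Transformers body (`model.model_body.tokenizer`), so setting `pad_token` on a separately loaded tokenizer leaves that internal tokenizer without one. The sketch below uses a hypothetical helper, `ensure_pad_token` (not part of SetFit or transformers), that applies the two remedies the error message itself suggests: reuse the EOS token, or add a new `[PAD]` special token.

```python
from typing import Any


def ensure_pad_token(tokenizer: Any) -> Any:
    """Hypothetical helper: give a tokenizer a pad token if it has none.

    Works on any object exposing `pad_token`, `eos_token` and
    `add_special_tokens`, such as a Hugging Face tokenizer.
    """
    if getattr(tokenizer, "pad_token", None) is not None:
        return tokenizer  # Already has a pad token; leave it unchanged.
    if getattr(tokenizer, "eos_token", None) is not None:
        # Remedy 1 from the error message: reuse EOS as the pad token.
        tokenizer.pad_token = tokenizer.eos_token
    else:
        # Remedy 2: add a brand-new "[PAD]" token. The model's embedding
        # matrix must then be resized to cover the enlarged vocabulary.
        tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    return tokenizer
```

Assuming `model` is the `SetFitModel` passed to the trainer, the call would be `ensure_pad_token(model.model_body.tokenizer)`, made before constructing `SetFitTrainer`. Reusing EOS is the lighter option; if the `[PAD]` branch runs instead, the underlying transformer's embeddings would also need resizing (for example with `resize_token_embeddings`).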