Hey all, I have a fine-tuning question. I’m following this tutorial to fine-tune a sentence transformer: Train and Fine-Tune Sentence Transformers Models.
In the dataset preparation section, they’ve got this example code:
```python
from datasets import load_dataset
from sentence_transformers import InputExample

# Dataset loaded earlier in the tutorial; each entry in 'set' has 'query', 'pos', and 'neg' fields
dataset = load_dataset("embedding-data/QQP_triplets")

train_examples = []
train_data = dataset['train']['set']
# For agility we only use 1/2 of our available data
n_examples = dataset['train'].num_rows // 2
for i in range(n_examples):
    example = train_data[i]
    train_examples.append(InputExample(texts=[example['query'], example['pos'][0], example['neg'][0]]))
```
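For context, the tutorial then feeds `train_examples` into a `DataLoader` and a loss for `model.fit`. Here’s roughly that step as I understand it (my own sketch; the base model and the triplet-style loss are picked for illustration and may differ from the tutorial’s exact choices):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative base model, not necessarily the tutorial's
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=model)  # each InputExample supplies (anchor, positive, negative)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```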
The tutorial says:

> You can obtain much better results by increasing the number of examples.
I’m wondering whether this refers to the length of `train_examples` or to the length of `texts` when initializing each `InputExample` object.
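In other words, these are the two quantities I mean, continuing from the snippet above:

```python
print(len(train_examples))           # the number of InputExample objects
print(len(train_examples[0].texts))  # the number of texts inside one InputExample (3 here)
```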
My question is: say I have 100 sentences that I’ve declared as similar. Would it be better to have one `InputExample` with `len(texts) == 100`, or 50 `InputExample`s with `len(texts) == 2`?
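To make the two options concrete, here’s a sketch of what I mean, where `sentences` is a hypothetical stand-in for my 100 similar sentences:

```python
from sentence_transformers import InputExample

sentences = [f"similar sentence {i}" for i in range(100)]  # placeholder data

# Option A: one InputExample holding all 100 texts
option_a = [InputExample(texts=sentences)]

# Option B: 50 InputExamples, each holding a pair of texts
option_b = [InputExample(texts=[sentences[i], sentences[i + 1]])
            for i in range(0, 100, 2)]

print(len(option_a), len(option_a[0].texts))  # 1 100
print(len(option_b), len(option_b[0].texts))  # 50 2
```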