Hey,
I’m trying to fine-tune GPT-2 (small) on a custom dataset.
The idea is that I pass in a list of objects and GPT-2 writes a story that includes them.
Example:
prompt:
house, cat, table
completion:
Once upon a time, there was a cozy house on the edge of a small town. In that house lived a mischievous cat who loved to jump onto the table and knock things over.
My Code Snippet:
for i in range(epochs):
    for X, y, a in chatData:
        optim.zero_grad()
        loss = model(X, attention_mask=a, labels=y).loss  # this is where I struggle
        loss.backward()
        optim.step()
Variables:
X is the prompt, e.g. “[BOS] house, cat, table [STORY START]”
y is the completion, e.g. “the story about the objects. [EOS]”
Of course, I tokenized X and y and applied truncation and padding where necessary.
But is this the right approach? The more I search online, the more I see people setting the labels equal to the input. They would probably do something like this (but I’m not sure):
X = [BOS] house, cat, table [STORY START] the story about the objects. [EOS]
loss = model(X, attention_mask=a, labels=X).loss
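A minimal sketch of how I understand that second approach, assuming the Hugging Face GPT2LMHeadModel convention that labels are shifted inside the model and that positions labeled -100 are ignored by the loss. The helper build_example, the token ids, and pad_id=0 are made up for illustration. Masking the prompt is optional: with plain labels = input, the model would also be trained to reproduce the prompt.

```python
# Sketch only: build one training example for a causal LM from already-tokenized ids.
IGNORE_INDEX = -100  # label value that Hugging Face causal-LM losses skip


def build_example(prompt_ids, completion_ids, max_len, pad_id):
    """Concatenate prompt and completion, then truncate or pad to max_len.

    Hypothetical helper, not part of any library.
    """
    input_ids = (prompt_ids + completion_ids)[:max_len]
    # Score only the completion: prompt positions get IGNORE_INDEX.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids)[:max_len]
    attention_mask = [1] * len(input_ids)

    n_pad = max_len - len(input_ids)
    input_ids = input_ids + [pad_id] * n_pad
    attention_mask = attention_mask + [0] * n_pad
    labels = labels + [IGNORE_INDEX] * n_pad  # never train on padding
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }


# Made-up token ids standing in for the prompt and the completion.
example = build_example(
    prompt_ids=[1, 11, 12, 13, 2],
    completion_ids=[21, 22, 23, 3],
    max_len=12,
    pad_id=0,
)
print(example)
```

The three lists would then be converted to tensors of the same shape and passed as model(input_ids, attention_mask=..., labels=...).loss, so input and labels line up position by position.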
But is this correct for my specific use case as well?
I’m honestly a little confused right now :D