I’m fine-tuning CodeLlama with PEFT/SFT. I split my dataset into train and test sets and set the compute_metrics training parameter to the function below. My train and test data share the same format, which is “an input sequence ->: an output sequence ”.

```
import evaluate
import numpy as np

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred[0] holds the logits, eval_pred[1] the reference token IDs
    predictions = np.argmax(eval_pred[0], axis=-1).astype("int32")
    accs = []
    # Compute accuracy per sequence, then average over the eval set
    for i in range(len(eval_pred[0])):
        acc = metric.compute(predictions=predictions[i], references=eval_pred[1][i].astype("int32"))
        accs.append(acc["accuracy"])
    return {"accuracy": np.mean(np.array(accs))}
```

Basically, I am comparing the predicted token IDs against the reference token IDs for each sequence and averaging the per-sequence accuracies. However, after several epochs of training, eval accuracy is still 0.00102. Does that seem right? Thanks for your insight, all.
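
One thing worth checking, sketched here under assumptions rather than as a confirmed diagnosis: with a causal LM, the logit at position t is a prediction for the token at position t+1, and padded or masked label positions are commonly set to -100. If compute_metrics receives unshifted logits and labels (my understanding of the usual Trainer setup, but an assumption about this configuration), a position-by-position comparison would score near zero even for a good model. The helper name `shifted_token_accuracy` and the `ignore_index` default are hypothetical, chosen for illustration.

```python
import numpy as np

def shifted_token_accuracy(logits, labels, ignore_index=-100):
    """Token accuracy for next-token prediction (hypothetical helper).

    Assumes logits has shape (batch, seq_len, vocab) and labels has
    shape (batch, seq_len), with ignored positions set to ignore_index.
    """
    preds = np.argmax(logits, axis=-1)
    # Align: the prediction made at position t is scored against the label at t+1
    preds = preds[:, :-1]
    targets = labels[:, 1:]
    # Score only positions that carry a real label
    mask = targets != ignore_index
    if mask.sum() == 0:
        return 0.0
    return float((preds[mask] == targets[mask]).mean())

def compute_metrics(eval_pred):
    # Same indexing as the original function: [0] is logits, [1] is labels
    logits = np.asarray(eval_pred[0])
    labels = np.asarray(eval_pred[1])
    return {"accuracy": shifted_token_accuracy(logits, labels)}
```

If the accuracy jumps after this change, the shift and the -100 positions were likely the cause; if it stays near zero, the problem is probably elsewhere (for example in how the labels are built).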