Am I using evaluate correctly? eval_accuracy is super low

I’m fine-tuning CodeLlama with PEFT/SFT. I split my dataset into train and test sets and passed the function below as the compute_metrics argument to my trainer. The train and test data share the same format, which is “an input sequence ->: an output sequence”.

import evaluate
import numpy as np

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred.predictions holds the logits, eval_pred.label_ids the reference token IDs
    predictions = np.argmax(eval_pred.predictions, axis=-1).astype("int32")
    labels = eval_pred.label_ids.astype("int32")

    # token-level accuracy per example, averaged over the whole eval set
    accs = []
    for i in range(len(predictions)):
        acc = metric.compute(predictions=predictions[i], references=labels[i])
        accs.append(acc["accuracy"])

    return {"accuracy": np.mean(accs)}
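For context, here is roughly how I wire this into the trainer. The data file, field name, and hyperparameters below are placeholders rather than my exact setup, and the SFTTrainer argument names may vary a bit between trl versions:

from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# placeholder file; each record has a "text" field like "input sequence ->: output sequence"
dataset = load_dataset("json", data_files="pairs.jsonl")["train"].train_test_split(test_size=0.1)

training_args = TrainingArguments(
    output_dir="codellama-sft",
    evaluation_strategy="epoch",
    num_train_epochs=3,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=LoraConfig(task_type="CAUSAL_LM"),
    dataset_text_field="text",
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # the function defined above
)
trainer.train()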

Basically, I take the argmax of the logits, compare it token by token against the references (the input token IDs), and average the per-example accuracies. However, after several epochs of training, eval accuracy is still only 0.00102. Does that seem right? Thanks for your insight, all!