How to interpret metrics for a Seq2Seq task?

yannbane · April 8, 2023, 10:27am

I’m fine-tuning distilgpt2 to translate English sentences into regex (a specific type I implemented).

I am unsure how to interpret accuracy in this scenario and how exactly to evaluate model performance. The accuracy usually goes from around 60% at step 50 to around 70% at step 700, and then slowly plateaus.

But how is it computed exactly, and what does it mean when doing Seq2Seq tasks such as this? Is it simply the proportion of correctly predicted labels for a given sequence? If so, is that even informative for this task?
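To make the question concrete, here is a minimal sketch of what a token-level accuracy would look like next to a sequence-level exact match. It assumes the reported number is token-level accuracy over label positions, which the post does not confirm; the function names and the toy token IDs are made up for illustration, and `-100` is the label value conventionally used to mask positions out of the loss.

```python
IGNORE_INDEX = -100  # assumed masking value for positions excluded from the loss


def token_accuracy(predictions, labels):
    """Fraction of non-masked label positions where the predicted token matches."""
    correct = 0
    total = 0
    for pred_seq, label_seq in zip(predictions, labels):
        for pred, label in zip(pred_seq, label_seq):
            if label == IGNORE_INDEX:
                continue  # skip masked positions (e.g. prompt or padding)
            total += 1
            correct += int(pred == label)
    return correct / total if total else 0.0


def exact_match(predictions, labels):
    """Fraction of sequences where every non-masked position is correct."""
    if not predictions:
        return 0.0
    matches = 0
    for pred_seq, label_seq in zip(predictions, labels):
        kept = [(p, l) for p, l in zip(pred_seq, label_seq) if l != IGNORE_INDEX]
        matches += int(all(p == l for p, l in kept))
    return matches / len(predictions)


# Toy example: the first sequence has one wrong token, the second is fully correct.
preds = [[5, 6, 7, 8], [1, 2, 3, 4]]
labels = [[5, 6, 7, 9], [IGNORE_INDEX, 2, 3, 4]]

print(token_accuracy(preds, labels))  # 6 of 7 scored tokens are correct
print(exact_match(preds, labels))     # 1 of 2 sequences is fully correct
```

If the metric is computed this way, a single wrong token barely moves the token-level number but still produces a different regex, which is why a 70% accuracy can coexist with poor outputs at inference time.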

If not, what options do I have to track performance?
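One option, sketched here under stated assumptions, is to score generated patterns by behaviour instead of by tokens: compile the predicted and reference patterns and check how often they agree on a set of test strings. This assumes the target regex dialect can be compiled with Python's `re` module, which may not hold for a custom regex type; `regex_agreement` and the test strings are hypothetical names and data for illustration only.

```python
import re


def regex_agreement(predicted, reference, test_strings):
    """Fraction of test strings on which both patterns give the same full-match verdict.

    Returns 0.0 if the predicted pattern does not compile.
    """
    try:
        pred_re = re.compile(predicted)
    except re.error:
        return 0.0  # an invalid pattern counts as a complete miss
    ref_re = re.compile(reference)
    if not test_strings:
        return 0.0
    agree = sum(
        (pred_re.fullmatch(s) is not None) == (ref_re.fullmatch(s) is not None)
        for s in test_strings
    )
    return agree / len(test_strings)


# Hypothetical test strings; in practice these would come from the dataset.
samples = ["123", "abc", "4a", ""]

print(regex_agreement(r"[0-9]+", r"\d+", samples))   # different text, same behaviour on these samples
print(regex_agreement(r"\d", r"\d+", ["1", "12"]))   # disagrees on "12"
print(regex_agreement(r"(", r"\d+", samples))        # predicted pattern fails to compile
```

A check like this rewards patterns that are written differently from the reference but match the same strings, which token-level accuracy and exact match both penalize. Its usefulness depends on how well the test strings cover the intended behaviour.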

When I run inference on examples I come up with myself, the model doesn’t seem very good. I’ve seen other regex-generating models do far better than mine. Most likely the main issue is the small dataset.
