Treating punctuation restoration as a seq2seq task

Punctuation restoration has typically been approached as a multi-class classification problem: for each position between words, predict a punctuation label from {O, “,”, “!”, “:”}.
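As a concrete sketch of that framing (the label set is the one above; the example sentence is made up):

```python
# Sketch of the classification framing: strip each word's trailing
# punctuation and record it as that word's label ("O" = no punctuation).

PUNCT = {",", "!", ":"}

def text_to_labels(text):
    """Return (words, labels): labels[i] is the punctuation mark that
    follows words[i], or "O" if there is none."""
    words, labels = [], []
    for tok in text.split():
        if tok[-1] in PUNCT:
            words.append(tok[:-1])
            labels.append(tok[-1])
        else:
            words.append(tok)
            labels.append("O")
    return words, labels

print(text_to_labels("hello, how are you!"))
# (['hello', 'how', 'are', 'you'], [',', 'O', 'O', '!'])
```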

I tried a seq2seq approach instead, and the trained BART model seems really good at it. To compare it to previous methods, though, I need to cast its output into that classification format.

What would be the best way to get precision, recall, and F1 scores for a seq2seq task?

  • I tried using the tokenizer to separate out the punctuation marks, but it usually merges the punctuation with adjacent tokens and assigns a different id.
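One way I'm considering is to sidestep the subword tokenizer entirely: split the reference and the model output on whitespace, read off each word's trailing punctuation as its label, and score the two label sequences position by position. This assumes the model copies the input words faithfully so positions line up; the example strings and the `micro_prf` helper are my own, not from any library:

```python
# Score a seq2seq prediction against the reference by whitespace-splitting
# (avoiding the subword tokenizer, which merges punctuation into other ids)
# and comparing per-word punctuation labels by position.

PUNCT = {",", "!", ":"}

def labels_of(text):
    """Per-word label: the word's trailing punctuation mark, or 'O'."""
    return [tok[-1] if tok[-1] in PUNCT else "O" for tok in text.split()]

def micro_prf(y_true, y_pred):
    """Micro-averaged precision/recall/F1 over punctuation labels only
    (the 'O' class is excluded, as is standard for this task)."""
    tp = sum(t == p != "O" for t, p in zip(y_true, y_pred))
    pred_pos = sum(p != "O" for p in y_pred)
    true_pos = sum(t != "O" for t in y_true)
    prec = tp / pred_pos if pred_pos else 0.0
    rec = tp / true_pos if true_pos else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

ref = "hello, world how are you!"
hyp = "hello, world how are you"   # hypothetical model output: missed the "!"

prec, rec, f1 = micro_prf(labels_of(ref), labels_of(hyp))
print(f"P={prec:.2f} R={rec:.2f} F1={f1:.2f}")  # P=1.00 R=0.50 F1=0.67
```

If the model paraphrases or drops words, the positions no longer align and an edit-distance alignment of the two word sequences would be needed first.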