ROUGE score problem

The ROUGE metric does not work for a non-English (Arabic) language:

!pip install rouge_score
from datasets import load_metric
metric = load_metric("rouge")

pred_str = ['السلام عليكم كيف حالك']
label_str = ['السلام عليكم صديقي كيف حالك']
metric.add_batch(predictions=pred_str, references=label_str)
metric.compute()

Output:

{'rouge1': AggregateScore(low=Score(precision=0.0, recall=0.0, fmeasure=0.0), mid=Score(precision=0.0, recall=0.0, fmeasure=0.0), high=Score(precision=0.0, recall=0.0, fmeasure=0.0)),
'rouge2': AggregateScore(low=Score(precision=0.0, recall=0.0, fmeasure=0.0), mid=Score(precision=0.0, recall=0.0, fmeasure=0.0), high=Score(precision=0.0, recall=0.0, fmeasure=0.0)),
'rougeL': AggregateScore(low=Score(precision=0.0, recall=0.0, fmeasure=0.0), mid=Score(precision=0.0, recall=0.0, fmeasure=0.0), high=Score(precision=0.0, recall=0.0, fmeasure=0.0)),
'rougeLsum': AggregateScore(low=Score(precision=0.0, recall=0.0, fmeasure=0.0), mid=Score(precision=0.0, recall=0.0, fmeasure=0.0), high=Score(precision=0.0, recall=0.0, fmeasure=0.0))}

This problem occurs because the rouge_score tokenizer strips out all non-English characters. We can make it accept Arabic, Kurdish, and Farsi by replacing 'a-z0-9' with 'a-z0-9\u0600-\u06ff\u0750-\u077f\ufb50-\ufbc1\ufbd3-\ufd3f\ufd50-\ufd8f\ufd50-\ufd8f\ufe70-\ufefc\uFDF0-\uFDFD.0-9' in the tokenizer's regular expressions.
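To see the root cause in isolation, here is a minimal sketch (a hypothetical re-implementation, not the library's exact code) of a tokenizer that keeps only a-z0-9. With such a pattern, both the prediction and the reference tokenize to empty lists, so every overlap count is zero:

import re

def ascii_only_tokenize(text):
    #hypothetical approximation of the default ROUGE tokenization:
    #lower-case the text, then keep only ASCII letters and digits
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return [tok for tok in text.split() if tok]

print(ascii_only_tokenize("السلام عليكم كيف حالك"))  #[] -- every Arabic character is dropped
print(ascii_only_tokenize("hello my friend"))        #['hello', 'my', 'friend']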
To apply the fix, you can simply run this code once before calling the ROUGE metric.

#path to the rouge_score tokenizer file (see below for how to find it on your system)
tokenize_path = "/opt/conda/lib/python3.7/site-packages/rouge_score/tokenize.py"

#read the tokenizer source into a string
with open(tokenize_path, "rt") as fin:
    data = fin.read()

#replace all occurrences of the character class so Arabic-script characters are kept
data = data.replace('a-z0-9', 'a-z0-9\\u0600-\\u06ff\\u0750-\\u077f\\ufb50-\\ufbc1\\ufbd3-\\ufd3f\\ufd50-\\ufd8f\\ufd50-\\ufd8f\\ufe70-\\ufefc\\uFDF0-\\uFDFD.0-9')

#overwrite the file with the patched source
with open(tokenize_path, "wt") as fout:
    fout.write(data)

To find the rouge_score tokenizer path on your system, you can run:

from rouge_score import tokenize
print(tokenize.__file__)
#output: the path of the tokenize.py file

Hi, I tried your approach, but it still doesn't work with my dataset in Persian. I get all zeros for my ROUGE scores. I now think it is a bug in the Hugging Face ROUGE metric, because I got the correct ROUGE scores by using the rouge library in Python directly. I hope Hugging Face fixes it soon :slight_smile:
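For anyone who wants to try the same workaround, here is a minimal sketch of scoring directly with the standalone rouge package mentioned above; the get_scores call is its commonly documented usage, but check the package's documentation for your version, and the example strings are just the ones from the original post:

!pip install rouge
from rouge import Rouge

pred_str = ['السلام عليكم كيف حالك']
label_str = ['السلام عليكم صديقي كيف حالك']

rouge = Rouge()
#get_scores takes parallel lists of hypothesis and reference strings
scores = rouge.get_scores(pred_str, label_str)
print(scores)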