I am using ROUGE-1 (rouge1) as an evaluation metric for my generation model. The mid score I get looks like this:
mid=Score(precision=0.2203567834437479, recall=0.28703975940955084, fmeasure=0.21778941370227095)
However, the fmeasure does not equal the harmonic mean of the precision and recall, which is what I would expect it to be. Is there a reason for this behaviour?
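Here is a quick check of that claim. It takes the three values from the mid score above and computes the harmonic mean of precision and recall, 2PR / (P + R), which is the usual definition of the F-measure (F1):

```python
# Values copied from the mid aggregate score shown above.
precision = 0.2203567834437479
recall = 0.28703975940955084
reported_fmeasure = 0.21778941370227095

# F1 is the harmonic mean of precision and recall: 2PR / (P + R).
harmonic_mean = 2 * precision * recall / (precision + recall)

print(f"harmonic mean of P and R: {harmonic_mean:.4f}")
print(f"reported fmeasure:        {reported_fmeasure:.4f}")
```

This gives roughly 0.249 for the harmonic mean, compared with the reported fmeasure of about 0.218.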