NLP eval metrics

I want to compute metrics such as BLEU, METEOR, ROUGE-L, SPICE, and CIDEr on my test set, and report both the mean and the variance of each metric across the test examples. How can I achieve this?
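
To make the goal concrete, here is a minimal sketch of the kind of thing I am after. It assumes each metric can be scored per test example, so that the mean and variance are simply taken over the list of per-example scores. The functions `lcs_length`, `rouge_l_f1`, and `summarize` are hypothetical helpers written only for illustration: `rouge_l_f1` is a simplified ROUGE-L (plain F1 over whitespace tokens, single reference), not the official implementation, and the other metrics would need their own per-example scorers.

```python
from statistics import mean, pvariance


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L: F1 of LCS-based precision and recall.

    Assumption: whitespace tokenization and a single reference.
    """
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def summarize(scores):
    """Return (mean, population variance) of per-example scores."""
    return mean(scores), pvariance(scores)


if __name__ == "__main__":
    # Toy data, made up purely to show the call pattern.
    candidates = ["a man rides a horse", "two dogs play in the park"]
    references = ["a man is riding a horse", "two dogs are playing outside"]
    scores = [rouge_l_f1(c, r) for c, r in zip(candidates, references)]
    m, v = summarize(scores)
    print(f"ROUGE-L mean={m:.4f} variance={v:.4f}")
```

`pvariance` treats the test set as the whole population; `statistics.variance` would give the sample variance instead. Note that this per-example averaging may not match how corpus-level metrics such as BLEU or CIDEr are usually reported, which is part of what I am unsure about.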