Evaluating creative NLG

I am looking for a way to evaluate a model's creative writing ability given a prompt. I am aware of ROUGE and BLEU, but both measure n-gram overlap with a reference text, so they don't accurately capture the quality of creative output, where many very different responses can be equally good.
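To illustrate the problem, here is a minimal sketch of clipped unigram precision, the building block of BLEU-1 (without the brevity penalty). The function name and the example sentences are made up for illustration, and this is not a full BLEU or ROUGE implementation.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: the share of candidate tokens that also
    appear in the reference, counting each reference token at most as often
    as it occurs there. Tokenization is a plain whitespace split."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    cand_counts = Counter(cand_tokens)
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return overlap / len(cand_tokens)


# Hypothetical reference and two hypothetical model outputs.
reference = "the old lighthouse keeper watched the storm roll in"
near_copy = "the old lighthouse keeper watched the storm roll out"
creative = "waves hammered the rocks while a lonely man lit his lamp"

score_copy = unigram_precision(near_copy, reference)
score_creative = unigram_precision(creative, reference)

print(f"near copy: {score_copy:.3f}")
print(f"creative:  {score_creative:.3f}")
```

The near copy scores far higher than the original take on the same scene, even though the second output is arguably the more creative one. That is the gap I would like a metric to close.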

If anyone has any suggestions or links, they would be much appreciated.

