What are the best metrics for evaluating code generation?

I am trying the StableCode Instruct Alpha 3B model on the CodeChef submissions dataset and want to figure out which metrics are appropriate for evaluating code generation, and how to use them. Should I use perplexity, ROUGE/METEOR, or CodeBLEU?
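
For context, by perplexity I mean the exponential of the average negative log-likelihood per token. Here is a minimal sketch of that definition. The function name `perplexity` and the log-probability values are made up for illustration and do not come from StableCode or the CodeChef data:

```python
import math


def perplexity(token_logprobs):
    """Return exp(-mean(log p)) for a list of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities assigned by a model to a reference solution.
example_logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
ppl = perplexity(example_logprobs)
print(f"perplexity = {ppl:.4f}")
```

As far as I understand, this only measures how well the model predicts the reference tokens, not whether the generated code is correct, which is part of why I am unsure it is the right choice.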