Mixtral bad FP16 performance

I am currently testing Mixtral (mistralai/Mixtral-8x7B-v0.1).
In FP32 I get a wikitext perplexity of ~4 (only tested on 10% of the dataset), which matches expectations for this model.
If I enable FP16 in DeepSpeed, the perplexity jumps to 258 :scream: I can't find other reports of this performance drop. Has anyone else seen this, or run Mixtral in FP16 successfully?