fp16, bf16, tf32 in TrainingArguments vs BitsAndBytesConfig

Hello, I was going through this excellent article on performance tuning: Efficient Training on a Single GPU. I have three questions.

  1. In TrainingArguments, are fp16, bf16, and tf32 mutually exclusive? That is, would you only ever set one of them to True?
  2. What is the order from best to worst? I understand bf16 is better than fp16, but where does tf32 fall?
  3. BitsAndBytesConfig also has a field bnb_4bit_compute_dtype. Is it the same thing as the flags above (meaning that if I set the flags in TrainingArguments I don't need to set bnb_4bit_compute_dtype, and vice versa), or is it something different? And why?
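
For context on question 2, here is a rough sketch that compares the formats purely by their bit layouts. It is plain Python, not the transformers API: the FloatFormat class and the describe helper are hypothetical names made up for illustration. The bit counts are the commonly documented ones, and as far as I understand tf32 is a compute mode for fp32 matmuls on NVIDIA Ampere and newer GPUs rather than a storage dtype, so it may not belong on the same scale as fp16 and bf16.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    """Hypothetical helper describing a binary floating-point layout."""
    name: str
    exponent_bits: int
    mantissa_bits: int  # explicit fraction bits, excluding the implicit leading 1

    @property
    def max_value(self) -> float:
        # IEEE-style layout: the top exponent code is reserved for inf/NaN,
        # so the largest usable unbiased exponent equals the bias.
        bias = 2 ** (self.exponent_bits - 1) - 1
        return (2 - 2.0 ** -self.mantissa_bits) * 2.0 ** bias

    @property
    def epsilon(self) -> float:
        # Gap between 1.0 and the next representable value.
        return 2.0 ** -self.mantissa_bits


# Commonly documented layouts (sign bit omitted).
FORMATS = [
    FloatFormat("fp32", exponent_bits=8, mantissa_bits=23),
    FloatFormat("tf32", exponent_bits=8, mantissa_bits=10),
    FloatFormat("bf16", exponent_bits=8, mantissa_bits=7),
    FloatFormat("fp16", exponent_bits=5, mantissa_bits=10),
]


def describe(formats=FORMATS):
    """Map each format name to (largest finite value, machine epsilon)."""
    return {f.name: (f.max_value, f.epsilon) for f in formats}


if __name__ == "__main__":
    for name, (max_value, eps) in describe().items():
        print(f"{name}: max ~ {max_value:.3e}, epsilon = {eps:.3e}")
```

If this reading is right, bf16 and tf32 share the fp32 exponent range while fp16 and tf32 share the same 10-bit precision, which is part of why I am unsure how to rank them.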

Thanks.