Hello, I was going through this excellent article on perf tuning: Efficient Training on a Single GPU
- First, with respect to TrainingArguments: are fp16, bf16, and tf32 mutually exclusive? That is, would you only ever set one of them to True?
- Second, how do they rank from best to worst? I understand bf16 is better than fp16, but where does tf32 fall?
- Third, if I am using BitsAndBytesConfig, it also has a field bnb_4bit_compute_dtype. Is this the same setting as the flags above (meaning that if I set the flags in TrainingArguments I don’t need to set bnb_4bit_compute_dtype, and vice versa), or is it something different, and why?
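
To make the questions concrete, here is a minimal sketch of the kind of setup I mean. The particular combination of flags, the "out" directory, and the helper name build_configs are my own placeholders, not a recommendation. Whether this combination is valid or redundant is exactly what I am asking. Note that the compute dtype field on BitsAndBytesConfig is spelled bnb_4bit_compute_dtype.

```python
# Hypothetical settings, written as plain dicts so the snippet runs
# without a GPU. The values are placeholders, not recommendations.
training_kwargs = {
    "output_dir": "out",  # placeholder path
    "fp16": False,        # half-precision mixed-precision flag
    "bf16": True,         # bfloat16 mixed-precision flag
    "tf32": True,         # TF32 matmul mode flag
}

quant_kwargs = {
    "load_in_4bit": True,
    # Name of a torch dtype; resolved to torch.bfloat16 below.
    "bnb_4bit_compute_dtype": "bfloat16",
}


def build_configs():
    """Build the real config objects (hypothetical helper).

    Assumes torch and transformers are installed. The bf16 and tf32
    flags are checked against the hardware when TrainingArguments is
    created, so this is expected to need a recent NVIDIA GPU.
    """
    import torch
    from transformers import BitsAndBytesConfig, TrainingArguments

    args = TrainingArguments(**training_kwargs)
    bnb = BitsAndBytesConfig(
        load_in_4bit=quant_kwargs["load_in_4bit"],
        bnb_4bit_compute_dtype=getattr(
            torch, quant_kwargs["bnb_4bit_compute_dtype"]
        ),
    )
    return args, bnb
```

Here the precision choice appears in two places (the TrainingArguments flags and the compute dtype on the quantization config), which is why I am unsure whether one implies the other.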