I am trying to fine-tune some large language models, and I was wondering if there is a configuration for pure bf16 training, i.e. parameters, gradients, and optimizer states all in bf16?
I already know about `--bf16 True`, which uses torch AMP, but I don't want to use mixed precision at all if possible.
I know there are stability risks involved, but getting rid of mixed precision would reduce the memory footprint (no fp32 weights or fp32 optimizer states to keep around), and having everything in bf16 should keep things fast.
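To be concrete, this is what I mean by pure bf16, written as a minimal plain-PyTorch sketch (the model and data here are just placeholders):

```python
import torch
from torch import nn

# Toy stand-in model; casting it to bf16 up front means the parameters,
# and therefore the gradients and optimizer states, all live in bf16.
model = nn.Linear(1024, 1024).to(device="cuda", dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)
y = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    loss = nn.functional.mse_loss(model(x), y)  # no autocast, no GradScaler
    loss.backward()   # grads come out in bf16 because the params are bf16
    optimizer.step()  # AdamW allocates its exp_avg/exp_avg_sq states in bf16 too
```

Nothing here ever touches fp32, which is exactly the setup I'd like to get out of the Trainer.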
I am currently exploring how Lightning does this with their `bf16-true` precision mode, so that I can replicate it with the HF Trainer.
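As far as I can tell, Lightning's true-precision mode just casts the module's weights to bf16 and runs without autocast or a GradScaler (unlike `precision="bf16-mixed"`):

```python
import lightning as L

# Lightning 2.x: "bf16-true" casts module weights to bf16 and skips
# AMP entirely, as opposed to "bf16-mixed" which keeps fp32 params.
trainer = L.Trainer(precision="bf16-true", max_steps=1000)
```

The closest thing I've come up with for transformers is loading the model in bf16 myself and leaving `bf16=False` so AMP never kicks in (model name and arguments below are just placeholders, this is my own experiment, not a documented pure-bf16 switch):

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# Load weights directly in bf16 and do NOT enable --bf16, so torch AMP
# is never used; optimizer states are then created in bf16 from the params.
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)
args = TrainingArguments(output_dir="out", bf16=False)
# ...then build Trainer(model=model, args=args, train_dataset=...) as usual.
```

Is this the right approach, or is there an official configuration for it?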