Hi,
I tried to use Trainer.train() to train my own model, and I set bf16: bool = True like this:
from dataclasses import dataclass
from transformers import TrainingArguments

@dataclass
class MyTrainingArgs(TrainingArguments):
    # fsdp: str = "full_shard auto_wrap"  # TODO
    bf16: bool = True
    bf16_full_eval: bool = True
    ...
half_precision_backend is left at its default, so it is "auto".
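For completeness, I construct and run the Trainer roughly like this (a minimal sketch; my_model and train_ds are placeholders for my own model and dataset):

from transformers import Trainer

args = MyTrainingArgs(output_dir="out")  # bf16 / bf16_full_eval default to True above
trainer = Trainer(model=my_model, args=args, train_dataset=train_ds)  # placeholders
trainer.train()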
But while debugging, I printed some intermediate outputs in the forward pass, and they are fp32 instead of bf16. For example:
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd, bias=config.bias)
        self.gelu = nn.GELU()
        self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd, bias=config.bias)
        self.dropout = nn.Dropout(config.dropout)

    def forward(self, x):
        x = self.c_fc(x)
        print(x.dtype)  # prints torch.float32 during training, not torch.bfloat16
        x = self.gelu(x)
        x = self.c_proj(x)
        x = self.dropout(x)
        return x
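As a sanity check outside the Trainer, plain torch.autocast behaves as I would expect. A minimal sketch (not my actual training code), where the linear output comes out as bf16 inside the context:

import torch
import torch.nn as nn

lin = nn.Linear(8, 8).cuda()              # parameters stay fp32
x = torch.randn(2, 8, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    print(lin(x).dtype)                   # torch.bfloat16 (autocast casts per-op)
print(lin(x).dtype)                       # torch.float32 outside the context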
I checked the Trainer source, and I think the AMP context is set up in this part:
...  # this code is from transformers/trainer.py
with cp_context():
    model.train()
    if hasattr(self.optimizer, "train") and callable(self.optimizer.train):
        self.optimizer.train()

    inputs = self._prepare_inputs(inputs)
    if is_sagemaker_mp_enabled():
        loss_mb = smp_forward_backward(model, inputs, self.args.gradient_accumulation_steps)
        return loss_mb.reduce_mean().detach().to(self.args.device)

    # the only autocast entry point I can see on this path:
    with self.compute_loss_context_manager():
        loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
where compute_loss_context_manager is defined as:

def compute_loss_context_manager(self):
    """
    A helper wrapper to group together context managers.
    """
    ctx_stack = contextlib.ExitStack()
    autocast_ctx = self.autocast_smart_context_manager()
    if not isinstance(autocast_ctx, contextlib.nullcontext):
        ctx_stack.enter_context(autocast_ctx)
    return ctx_stack
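To narrow it down, I was going to probe the Trainer instance directly. A rough check, assuming the constructed instance is trainer and that the bf16 flag is supposed to flow into Accelerate's mixed_precision setting:

import contextlib

print(trainer.args.bf16)                        # True in my run
print(trainer.accelerator.mixed_precision)      # I'd expect "bf16" if AMP is wired up
ctx = trainer.autocast_smart_context_manager()
print(isinstance(ctx, contextlib.nullcontext))  # True would mean no autocast from this path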
It looks to me like compute_loss_context_manager does not actually enable CUDA AMP here, so based on the printed dtypes I think bf16 never took effect. How does Transformers enable CUDA AMP?
My Transformers version is 4.56.2, and my GPU supports bf16:
>>> torch.cuda.is_bf16_supported()
True
Thank you!