Using quantized optimizer from bitsandbytes with transformers

Hello, I'm referring to this snippet from the docs:

```python
import bitsandbytes as bnb
from torch import nn
from transformers import TrainingArguments
from transformers.trainer_pt_utils import get_parameter_names

training_args = TrainingArguments(per_device_train_batch_size=4, **default_args)

# Apply weight decay to everything except biases and LayerNorm weights
decay_parameters = get_parameter_names(model, [nn.LayerNorm])
decay_parameters = [name for name in decay_parameters if "bias" not in name]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters() if n in decay_parameters],
        "weight_decay": training_args.weight_decay,
    },
    {
        "params": [p for n, p in model.named_parameters() if n not in decay_parameters],
        "weight_decay": 0.0,
    },
]

optimizer_kwargs = {
    "betas": (training_args.adam_beta1, training_args.adam_beta2),
    "eps": training_args.adam_epsilon,
}
optimizer_kwargs["lr"] = training_args.learning_rate
adam_bnb_optim = bnb.optim.Adam8bit(
    optimizer_grouped_parameters,
    betas=(training_args.adam_beta1, training_args.adam_beta2),
    eps=training_args.adam_epsilon,
    lr=training_args.learning_rate,
)
```
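For context, the docs then hand this optimizer to the Trainer via the `optimizers` argument, roughly like so (`train_dataset` is a placeholder name here):

```python
from transformers import Trainer

# The custom 8-bit optimizer is passed in directly; the second tuple
# element (the LR scheduler) is left as None so Trainer creates one.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder dataset
    optimizers=(adam_bnb_optim, None),
)
```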

Is the above code still needed when using the latest version of transformers (4.31, from main)? Can't we just do:


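I'm guessing something like this, using the built-in `optim` flag (a sketch; I'm assuming the `"adamw_bnb_8bit"` value that recent `TrainingArguments` versions accept):

```python
from transformers import TrainingArguments

# Sketch, assuming recent transformers: select the bitsandbytes
# 8-bit AdamW directly instead of building the optimizer by hand.
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    optim="adamw_bnb_8bit",
    **default_args,
)
```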
And as a user, do I still need to do anything (what exactly? an example would be nice) regarding this note in the docs:

> Note that in order to use the 8-bit optimizer with an existing pretrained model a change to the embedding layer is needed. Read this issue for more information