Using quantized optimizer from bitsandbytes with transformers

Hello, I'm referring to this snippet from https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one:

import bitsandbytes as bnb
from torch import nn
from transformers import TrainingArguments
from transformers.trainer_pt_utils import get_parameter_names

training_args = TrainingArguments(per_device_train_batch_size=4, **default_args)

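# Apply weight decay to all parameters except biases and LayerNorm weights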
decay_parameters = get_parameter_names(model, [nn.LayerNorm])
decay_parameters = [name for name in decay_parameters if "bias" not in name]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters() if n in decay_parameters],
        "weight_decay": training_args.weight_decay,
    },
    {
        "params": [p for n, p in model.named_parameters() if n not in decay_parameters],
        "weight_decay": 0.0,
    },
]

optimizer_kwargs = {
    "betas": (training_args.adam_beta1, training_args.adam_beta2),
    "eps": training_args.adam_epsilon,
    "lr": training_args.learning_rate,
}
adam_bnb_optim = bnb.optim.Adam8bit(optimizer_grouped_parameters, **optimizer_kwargs)

Is the above code still needed when using the latest version of transformers (4.31) from main? Can't we just do:

args = transformers.TrainingArguments(
    ...
    optim='paged_adamw_8bit',
)
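i.e. something like this minimal sketch (output_dir, model, and train_dataset are just placeholders I made up, and the optim names are what I found in the TrainingArguments docs):

import transformers

args = transformers.TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    optim="paged_adamw_8bit",  # or "adamw_bnb_8bit" for the non-paged variant, if I understand correctly
)
trainer = transformers.Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()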

And do I, as a user, still need to do something (what exactly? an example would be nice) with respect to this note in the docs:

Note that in order to use the 8-bit optimizer with an existing pretrained model a change to the embedding layer is needed. Read this issue for more information
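For reference, my understanding is that the workaround from that issue / the bitsandbytes README looks something like the following, registering a 32-bit optimizer-state override for the embedding weights via GlobalOptimManager (I believe it has to run before the 8-bit optimizer is constructed), but I'm not sure whether it's still required with optim='paged_adamw_8bit':

import torch
import bitsandbytes as bnb

# Keep 32-bit optimizer state for the embedding layer(s) while the rest
# of the model uses the 8-bit optimizer
manager = bnb.optim.GlobalOptimManager.get_instance()
for module in model.modules():
    if isinstance(module, torch.nn.Embedding):
        manager.register_module_override(module, "weight", {"optim_bits": 32})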