Hi,
The LLM.int8() algorithm, as explained in the blog post, is meant for inference.
However, bitsandbytes also provides functionality for training models more efficiently, namely 8-bit optimizers. See the repository for more info: TimDettmers/bitsandbytes on GitHub (8-bit CUDA functions for PyTorch).
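
To give a rough idea of why an 8-bit optimizer saves memory, here is a minimal sketch in plain Python of storing an optimizer state as 8-bit codes plus one scale per block. This is not the bitsandbytes implementation: as far as I know the library uses a non-linear (dynamic) 8-bit format and CUDA kernels, while this sketch assumes simple linear absmax quantization. The function names and the tiny block size are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: linear absmax quantization per block.
# Assumption: this simplifies what an 8-bit optimizer does internally;
# it is not the bitsandbytes code or API.

BLOCK_SIZE = 4  # hypothetical value, kept tiny for readability


def quantize_blockwise(values, block_size=BLOCK_SIZE):
    """Return (8-bit integer codes, per-block scales) for a list of floats."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Scale each block by its own largest magnitude (1.0 if all zeros).
        absmax = max(abs(v) for v in block) or 1.0
        scales.append(absmax)
        # Map [-absmax, absmax] onto the integer range [-127, 127].
        codes.extend(round(v / absmax * 127) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=BLOCK_SIZE):
    """Recover approximate floats from the codes and per-block scales."""
    return [c / 127 * scales[i // block_size] for i, c in enumerate(codes)]


# An optimizer state (e.g. a running average) whose second block has outliers.
state = [0.01, -0.02, 0.03, 0.005, 5.0, -4.0, 3.0, 1.0]
codes, scales = quantize_blockwise(state)
restored = dequantize_blockwise(codes, scales)
```

Because every block has its own scale, the large values in the second block do not wipe out the precision of the small values in the first one. The state is kept as one byte per value plus a scale per block, instead of four bytes per value in 32-bit.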