Fine-tuning with load_in_8bit and inference without load_in_8bit possible?

Hi,

The LLM.int8() algorithm as explained in the blog post is meant for inference.
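To give an intuition for what 8-bit inference does, here is a minimal numpy sketch of absmax int8 quantization, which is the basic building block LLM.int8() applies to matrix multiplications (the real algorithm additionally keeps outlier feature dimensions in fp16; the function names below are illustrative, not the library's API):

```python
import numpy as np

def quantize_absmax(x):
    """Quantize a float vector to int8 by scaling the largest magnitude to 127."""
    scale = 127.0 / np.max(np.abs(x))
    q = np.round(x * scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float vector from the int8 values."""
    return q.astype(np.float32) / scale

x = np.array([0.1, -0.5, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_absmax(x)
x_hat = dequantize(q, scale)  # close to x, but stored in 1 byte per value
```

The weights are stored as int8 plus one scale per row, which is where the memory savings at inference time come from.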

However, bitsandbytes also provides functionality to train models more memory-efficiently, namely 8-bit optimizers. See here for more info: GitHub - TimDettmers/bitsandbytes: 8-bit CUDA functions for PyTorch
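The idea behind those 8-bit optimizers is to keep the optimizer *state* (e.g. momentum) quantized to int8 and only dequantize it briefly during each update. A toy numpy sketch of that pattern, using plain SGD with momentum for simplicity (bitsandbytes actually uses blockwise dynamic quantization and drop-in classes like `bnb.optim.Adam8bit`; everything below is an illustrative assumption, not the library's implementation):

```python
import numpy as np

def quantize_block(x):
    """Absmax-quantize a float block to int8, returning values and scale."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12  # epsilon avoids division by zero
    return np.round(x / scale).astype(np.int8), scale

def sgd_momentum_step(param, grad, q_state, scale, lr=0.01, beta=0.9):
    # Dequantize the stored momentum, update it in float, then re-quantize,
    # so only 1 byte per parameter is kept between steps.
    m = q_state.astype(np.float32) * scale
    m = beta * m + grad
    param = param - lr * m
    q_state, scale = quantize_block(m)
    return param, q_state, scale

param = np.ones(4, dtype=np.float32)
grad = np.array([0.1, -0.2, 0.3, -0.4], dtype=np.float32)
q_state, scale = quantize_block(np.zeros_like(param))  # start with zero momentum
param, q_state, scale = sgd_momentum_step(param, grad, q_state, scale)
```

So fine-tuning with an 8-bit optimizer and then running inference with `load_in_8bit` are two separate uses of 8-bit arithmetic: one compresses optimizer state during training, the other compresses the weights for inference.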