Fine-tuning with load_in_8bit and inference without load_in_8bit possible?

Hi,

The LLM.int8() algorithm as explained in the blog post is meant for inference.
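To give an intuition for what 8-bit inference does, here is a minimal numpy sketch of absmax int8 quantization, which is the basic building block LLM.int8() applies to matrix multiplications (the real algorithm additionally keeps outlier feature dimensions in fp16; the function names below are illustrative, not the library's API):

```python
import numpy as np

def quantize_absmax(x):
    """Quantize a float vector to int8 by scaling the largest magnitude to 127."""
    scale = 127.0 / np.max(np.abs(x))
    q = np.round(x * scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float vector from the int8 values."""
    return q.astype(np.float32) / scale

x = np.array([0.1, -0.5, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_absmax(x)
x_hat = dequantize(q, scale)  # close to x, but stored in 1 byte per value
```

The weights are stored as int8 plus one scale per row, which is where the memory savings at inference time come from.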

However, bitsandbytes also provides functionality to train models more memory-efficiently, namely 8-bit optimizers. See here for more info: GitHub - TimDettmers/bitsandbytes: 8-bit CUDA functions for PyTorch
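The idea behind those 8-bit optimizers is to keep the optimizer *state* (e.g. momentum) quantized to int8 and only dequantize it briefly during each update. A toy numpy sketch of that pattern, using plain SGD with momentum for simplicity (bitsandbytes actually uses blockwise dynamic quantization and drop-in classes like `bnb.optim.Adam8bit`; everything below is an illustrative assumption, not the library's implementation):

```python
import numpy as np

def quantize_block(x):
    """Absmax-quantize a float block to int8, returning values and scale."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12  # epsilon avoids division by zero
    return np.round(x / scale).astype(np.int8), scale

def sgd_momentum_step(param, grad, q_state, scale, lr=0.01, beta=0.9):
    # Dequantize the stored momentum, update it in float, then re-quantize,
    # so only 1 byte per parameter is kept between steps.
    m = q_state.astype(np.float32) * scale
    m = beta * m + grad
    param = param - lr * m
    q_state, scale = quantize_block(m)
    return param, q_state, scale

param = np.ones(4, dtype=np.float32)
grad = np.array([0.1, -0.2, 0.3, -0.4], dtype=np.float32)
q_state, scale = quantize_block(np.zeros_like(param))  # start with zero momentum
param, q_state, scale = sgd_momentum_step(param, grad, q_state, scale)
```

So fine-tuning with an 8-bit optimizer and then running inference with `load_in_8bit` are two separate uses of 8-bit arithmetic: one compresses optimizer state during training, the other compresses the weights for inference.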