Hugging Face Forums
Enabling load_in_8bit makes inference much slower
🤗 Transformers
chaochaoli
September 9, 2023, 7:48am
Me too! And training is slower too.
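The slowdowns reported in this thread are consistent with how `load_in_8bit` works: weights are stored as int8 and must be scaled back to floating point around every matmul, so memory goes down but each forward pass does extra work. Below is a minimal numpy sketch of per-column absmax int8 quantization and dequantization; it is illustrative only, not the actual bitsandbytes LLM.int8() kernel.

```python
import numpy as np

def quantize_absmax(w):
    # Per-column absmax scaling into the int8 range [-127, 127].
    # Illustrative sketch, not the bitsandbytes implementation.
    scale = np.abs(w).max(axis=0) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # This extra scaling step runs around every quantized matmul,
    # which is one source of the slowdown people observe.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32)).astype(np.float32)

q, scale = quantize_absmax(w)
w_hat = dequantize(q, scale)

print("int8 storage:", q.nbytes, "bytes vs fp32:", w.nbytes, "bytes")
print("max reconstruction error:", np.abs(w - w_hat).max())
```

The reconstruction error stays small, but unless the hardware and kernels are tuned for int8, the quantize/dequantize round trips can easily cost more than the fp16 matmuls they replace, which is why 8-bit loading often trades speed for memory rather than improving both.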