I’m using bitsandbytes int8 and PEFT with transformers. Is there any advantage to loading the model in bf16 together with int8, even though the weights get cast to fp16 during quantization? Or should this combination be strictly avoided?
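For reference, here is a minimal sketch of the kind of setup I mean (not my exact code; the checkpoint name and LoRA hyperparameters are just placeholders): the model is loaded with bitsandbytes int8 quantization while `torch_dtype=torch.bfloat16` is passed for the non-quantized modules, and a PEFT LoRA adapter is attached on top.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load in int8 via bitsandbytes, but request bf16 for the remaining fp modules.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.bfloat16,  # the dtype choice in question
    device_map="auto",
)

# Prepare the quantized model for training and attach a LoRA adapter with PEFT.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"),
)
model.print_trainable_parameters()
```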