I am running a RAG pipeline with LlamaIndex and a quantized Llama-3-8B-Instruct model. I just installed these libraries:
!pip install --upgrade huggingface_hub
!pip install --upgrade peft
!pip install llama-index bitsandbytes accelerate llama-index-llms-huggingface llama-index-embeddings-huggingface
!pip install --upgrade transformers
!pip install --upgrade sentence-transformers
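To pin down which of these upgrades changed things, I print the installed versions afterwards (a small diagnostic sketch; the package names are the PyPI distribution names, and llama-index sub-packages would be checked the same way):

```python
import importlib.metadata

# Print the installed version of each relevant distribution,
# or note that it is missing from the environment.
for pkg in ("huggingface_hub", "transformers", "llama-index"):
    try:
        print(pkg, importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(pkg, "not installed")
```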
Then I tried to set up the quantization config like this:
import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and fp16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
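For context, this config is then passed into the LLM wrapper roughly like this (a sketch of my setup; the model ID is the gated Meta repo, and the other arguments are illustrative, not my exact values):

```python
# Assumes quantization_config from above; downloads the gated
# meta-llama model, so this needs an authenticated HF token and a GPU.
llm = HuggingFaceLLM(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_kwargs={"quantization_config": quantization_config},
    device_map="auto",
)
```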
However, I got this error: ModuleNotFoundError: No module named 'huggingface_hub.inference._types'. The last time I worked with this pipeline, two months ago, the code ran fine, so I think something changed on the LlamaIndex side; especially since, when I clicked through the traceback, it pointed to the line: from huggingface_hub.inference._types import ConversationalOutput. But ConversationalOutput no longer appears anywhere in the huggingface_hub docs.
So, what should I do to fix this error and be able to run this RAG pipeline?