ModuleNotFoundError: No module named 'huggingface_hub.inference._types'

I am running a RAG pipeline with LlamaIndex and a quantized Llama-3-8B-Instruct. I just installed these libraries:

!pip install --upgrade huggingface_hub 
!pip install --upgrade peft
!pip install llama-index bitsandbytes accelerate llama-index-llms-huggingface llama-index-embeddings-huggingface
!pip install --upgrade transformers
!pip install --upgrade sentence-transformers

Then I tried to set up the quantization config like this:

import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
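For context, I then pass this config into the LLM roughly as follows (sketched from memory; the model id and generation settings here are just the ones from my setup, adjust as needed):

```python
# Sketch of how the config is wired into HuggingFaceLLM.
# Loading the model requires a GPU and access to the gated Llama 3 weights.
llm = HuggingFaceLLM(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    context_window=8192,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    device_map="auto",
)
```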

However, the import itself fails with: ModuleNotFoundError: No module named 'huggingface_hub.inference._types'. When I last ran this pipeline two months ago the code worked, so I think LlamaIndex has changed something, especially since clicking through the traceback leads to from huggingface_hub.inference._types import ConversationalOutput, and ConversationalOutput doesn't exist anywhere in the HuggingFace docs.

So, what should I do to fix this error and be able to run this RAG pipeline?
