Using a paid inference endpoint to query a LlamaIndex knowledge graph gives worse results than the free Inference API

Hi

I have successfully followed this: https://medium.aiplanet.com/implement-rag-with-knowledge-graph-and-llama-index-6a3370e93cdd and used it to read a 2-page PDF. When I submit queries, the results are very good (well, as good as I need).
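For anyone landing here, the core of that article's pipeline looks roughly like this (a minimal sketch assuming the current llama-index package layout; the PDF path, token handling, and query text are placeholders, not values from the article):

import os

from llama_index.core import (
    KnowledgeGraphIndex,
    Settings,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.core.graph_stores import SimpleGraphStore
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

hf_token = os.getenv("HF_TOKEN")  # placeholder: your Hugging Face token

# Free serverless Inference API, as used in the article.
Settings.llm = HuggingFaceInferenceAPI(
    model_name="HuggingFaceH4/zephyr-7b-beta", token=hf_token
)

documents = SimpleDirectoryReader(input_files=["my_2_page.pdf"]).load_data()
storage_context = StorageContext.from_defaults(graph_store=SimpleGraphStore())

# The LLM extracts (subject, relation, object) triplets per chunk to build the graph.
index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=2,  # illustrative value
)

query_engine = index.as_query_engine(include_text=True)
print(query_engine.query("What is this document about?"))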

To speed things up so I can use a larger document, I have created an inference endpoint based on HuggingFaceH4/zephyr-7b-beta and access it via:

from llama_index.llms.huggingface import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(
    model_name="https://my_endpoint_ref.aws.endpoints.huggingface.cloud",
    token=hf_token,
)
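One quick sanity check is to query the endpoint directly, outside the knowledge graph (a sketch using the llm object above; the prompt is just an example):

# Direct completion against the endpoint, bypassing the graph entirely,
# to confirm the deployed model itself responds sensibly.
print(llm.complete("What is a knowledge graph?"))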

If I now provide a 20-page PDF, submitting the same query gives poor results, often responses stating that there is no relevant data in the document, even though the 2 pages I used originally are contained in the larger document.
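One way to narrow this down is to inspect what the query engine actually retrieves before generation (a sketch assuming a query_engine built as in the article; node access uses the standard response object):

# If the relevant 2 pages never appear among the retrieved chunks, the issue
# is in triplet extraction/retrieval rather than the endpoint's generation.
response = query_engine.query("<the query that worked on the 2-page PDF>")
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:200])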

I have tried using locally hosted LLMs, but as you can imagine they are too slow on my machine.

Can anyone give me a clue as to why the same model, served from a paid endpoint, should give worse results?
Thanks.

Hi,
I am also trying to follow the same Medium article and I am facing this error: "ModuleNotFoundError: No module named 'HuggingFaceInferenceAPI'". How did you solve this?

Thanks

Hi

A recent update to LlamaIndex moved the integrations into separate pip packages, so you now need to install the Hugging Face LLM package and import from its namespace:

pip install llama-index-llms-huggingface

from llama_index.llms.huggingface import HuggingFaceInferenceAPI
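With the package installed, the same class covers both the free serverless Inference API (pass a model id) and a dedicated endpoint (pass its URL). A sketch, with hf_token standing in for your Hugging Face token:

from llama_index.llms.huggingface import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(
    model_name="HuggingFaceH4/zephyr-7b-beta",  # or your endpoint URL
    token=hf_token,
)
print(llm.complete("Hello!"))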