Hi
I have successfully followed this tutorial: https://medium.aiplanet.com/implement-rag-with-knowledge-graph-and-llama-index-6a3370e93cdd and used it to read a 2-page PDF. When I submit queries, the results are very good (well, as good as I need).
To speed things up enough to handle a larger document, I created an inference endpoint based on HuggingFaceH4/zephyr-7b-beta and access it via:
```python
# import path for llama-index >= 0.10; older versions use
# `from llama_index.llms import HuggingFaceInferenceAPI`
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(
    model_name="https://my_endpoint_ref.aws.endpoints.huggingface.cloud",
    token=hf_token,
)
```
If I now provide a 20-page PDF, the same query gives poor results: the response often says there is no relevant data in the document, even though the 2 pages I used originally are contained in the larger document.
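One thing I have been wondering about is whether the larger document simply blows the prompt budget, so that retrieved context gets truncated before it reaches the model. This is just a rough back-of-the-envelope check, not llama-index's actual internals, and all the numbers (context window, chunk size, top-k) are illustrative assumptions rather than zephyr-7b-beta's real configuration:

```python
# Hypothetical sanity check: if the retriever pulls more chunk text than
# the prompt budget allows, the context is truncated and the model may
# answer as if there were no relevant data. All numbers are assumptions.

def fits_in_context(num_chunks, chunk_size_tokens, context_window=3900,
                    num_output=256, prompt_overhead=200):
    """Return True if the retrieved chunks plus the reserved output
    tokens fit inside the model's context window."""
    prompt_tokens = num_chunks * chunk_size_tokens + prompt_overhead
    return prompt_tokens + num_output <= context_window

# 2-page PDF: a handful of retrieved chunks fit comfortably.
print(fits_in_context(num_chunks=3, chunk_size_tokens=512))   # True
# 20-page PDF with more/larger retrieved chunks: the budget is exceeded.
print(fits_in_context(num_chunks=10, chunk_size_tokens=512))  # False
```

If that is the mechanism, the local setup and the endpoint may just be configured with different defaults (context window, max new tokens), which could explain the difference despite the model being the same.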
I have tried using locally hosted LLMs but as you can imagine they are too slow on my machine.
Can anyone give me a clue as to why the same model, served from a paid endpoint, gives worse results?
Thanks.