Help With SchemaLLMPathExtractor and HuggingFaceInferenceAPI

I am working on a project that builds Property Graph Indexes using LlamaIndex and the Hugging Face Inference API.

When I use SimpleLLMPathExtractor with an LLM from the Hugging Face Inference API, the property graph index is generated with extracted entities and relationships as expected.
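
In all the snippets below, documents and graph_store are created up front. A minimal sketch of that setup (the Neo4j connection details and data path are placeholders for my actual configuration):

from llama_index.core import SimpleDirectoryReader
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

# Placeholder connection details for a local Neo4j instance
graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="password",
    url="bolt://localhost:7687",
)

documents = SimpleDirectoryReader("./data").load_data()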

For example, this code:

import os

from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SimpleLLMPathExtractor
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

token = os.getenv("HUGGING_FACE_KEY")
llm = HuggingFaceInferenceAPI(
    model_name="meta-llama/Meta-Llama-3-70B-Instruct", token=token
)

embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

kg_extractor = SimpleLLMPathExtractor(
    llm=llm,
    max_paths_per_chunk=10,
    num_workers=4,
)

# Create the index
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embedding_model,
    kg_extractors=[kg_extractor],
    property_graph_store=graph_store,
    show_progress=True,
    use_async=True,
)

This yields a property graph populated with extracted entities and relationships.

However, no entities are extracted when I try to apply a schema, like I’ve seen in these examples:

  1. https://github.com/tomasonjo/blogs/blob/master/llm/llama_index_neo4j_custom_retriever.ipynb
  2. https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/property_graph/property_graph_advanced.ipynb

For example, this code only adds chunks to the Neo4j database, with no entities extracted:

import os

from llama_index.core import PropertyGraphIndex, Settings
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

token = os.getenv("HUGGING_FACE_KEY")
llm = HuggingFaceInferenceAPI(
    model_name="meta-llama/Meta-Llama-3-70B-Instruct", token=token
)

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# entities, relations, and validation_schema follow the linked examples
# (see the sketch after this block)
kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,
)

Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 1024

# Create the index
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embed_model,
    kg_extractors=[kg_extractor],
    property_graph_store=graph_store,
    show_progress=True,
    use_async=True,
)
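
For reference, the schema variables follow the pattern from the notebooks linked above. A minimal sketch with placeholder labels (my real schema uses domain-specific types):

from typing import Literal

entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_AT"]

# Maps each entity type to the relations it is allowed to participate in
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF"],
    "ORGANIZATION": ["HAS", "PART_OF"],
}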

However, if I run the same kind of schema extraction with an Ollama model running locally, entities are extracted.

For example, this code extracts entities as expected:

from llama_index.core import PropertyGraphIndex, Settings
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3", json_mode=True, request_timeout=3600)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,
)

Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 1024

# Create the index (note use_async=False here, unlike the Inference API runs)
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embed_model,
    kg_extractors=[kg_extractor],
    property_graph_store=graph_store,
    show_progress=True,
    use_async=False,
)

My key question is: what could be different about the Inference API that causes it to extract no entities, when the same or a smaller model run locally through Ollama succeeds? I would expect the same model to yield similar results whether it runs on Ollama or on the Inference API. Additionally, when using the OpenAI model gpt-4o, entities are extracted as expected.
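
One sanity check I can think of (a sketch; hf_llm and ollama_llm stand in for the two llm objects built above, and the prompt text is just a placeholder) is to send the same extraction-style prompt to both backends and compare the raw completions:

prompt = (
    "Extract up to three (entity, relation, entity) triples from: "
    "'Alice works at Acme Corp in Berlin.'"
)
print(hf_llm.complete(prompt).text)      # HuggingFaceInferenceAPI backend
print(ollama_llm.complete(prompt).text)  # local Ollama backend

If the Inference API completion is malformed (for example, not valid JSON when a structured response is expected), that might explain why the schema-based extractor silently produces nothing.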

I understand this might be a better question for LlamaIndex, and I will be asking over there as well, but any help, tips, information, or pointers anyone is willing to share would be greatly appreciated.

Additional Things I Have Tried:

  • Using different models from the Hugging Face Inference API, such as Llama-3-8B-Instruct and non-instruct versions, as well as Mistral.
  • Using different LlamaIndex Settings, or forgoing them completely.
  • Setting strict=False in the kg_extractor, and use_async=False when building the index.