Use RAGAS with a Hugging Face LLM

Hello everybody,

I want to use the RAGAS library to evaluate my RAG pipeline. The evaluation model should be a Hugging Face model such as Llama-2, Mistral, or Gemma. How can I implement this with that library, or is there another solution?

The examples by the RAGAS team aren't helpful for me because they don't show how to use a specific Hugging Face model. A plain LangChain pipeline doesn't work either, because it isn't the required class.

Best regards
Christian

Hi,
take a look at this post. I think you will find it helpful 🙂


If I read this correctly, some metrics also need an embedding model. The easiest way is to put both models in LangChain wrappers, like so:

from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from langchain_community.llms import HuggingFacePipeline
from langchain_community.embeddings import HuggingFaceEmbeddings
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
from ragas import evaluate

# embedding model
embedding_model = HuggingFaceEmbeddings(model_name="my-model-id")

# evaluator
model_id = "my-evaluator-id"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

pipe = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    do_sample=True,  # temperature only takes effect when sampling
    temperature=0.1,
    repetition_penalty=1.1  # without this the output begins repeating
)

evaluator = HuggingFacePipeline(pipeline=pipe)

# ragas
result = evaluate(
    dataset=dataset,
    llm=evaluator,
    embeddings=embedding_model,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
)
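
For completeness, dataset here is a Hugging Face datasets.Dataset with the columns the RAGAS metrics expect. A minimal sketch of what it could look like (the column names below match recent RAGAS versions; older releases used ground_truths as a list of strings, so check the version you have installed):

from datasets import Dataset

# hypothetical toy sample; replace with the outputs of your own RAG pipeline
dataset = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the capital and most populous city of France."]],
    "ground_truth": ["Paris"],
})

After the run, result.to_pandas() gives you the per-sample scores as a DataFrame.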

I recently implemented something similar. You need sentence-transformers and the latest version of transformers installed; otherwise I ran into problems with sentence-transformers.

But with the latest version of transformers I had problems importing the pipeline (some TensorFlow error). You can also create one yourself, though:

from transformers import Pipeline

class CustomPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        # route generation kwargs (temperature, repetition_penalty, ...) to _forward
        forward_kwargs = {
            k: v for k, v in kwargs.items()
            if k in ("temperature", "repetition_penalty", "max_new_tokens", "do_sample")
        }
        return {}, forward_kwargs, {}

    def preprocess(self, text):
        return self.tokenizer(text, return_tensors="pt").to(self.device)

    def _forward(self, inputs, **generate_kwargs):
        return self.model.generate(**inputs, **generate_kwargs)

    def postprocess(self, outputs):
        text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        # mimic the standard text-generation output format expected by langchain
        return [{"generated_text": text}]

# init pipe
pipe = CustomPipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    do_sample=True,  # temperature only takes effect when sampling
    temperature=0.1,
    repetition_penalty=1.1
)

# langchain wrapper
evaluator = HuggingFacePipeline(pipeline=pipe)
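
Before passing the wrapper to evaluate(), it's worth a quick sanity check that the custom pipeline actually generates text (assuming a LangChain version recent enough that HuggingFacePipeline supports invoke; on older versions, call evaluator("...") directly):

# smoke test: the wrapper should return a plain string
print(evaluator.invoke("Briefly, what does RAG stand for?"))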

Hi @CKeibel,

I want to evaluate my RAG pipeline using an open-source LLM instead of GPT-4. I have attached code snippets for the RAGAS evaluation below. I want to make sure that I am passing the llm to the evaluate function correctly. I am getting an error saying "WARNING:ragas.llms.output_parser:Failed to parse output. Returning None." I have checked several logs, and I think the problem is in how the llm is passed to the evaluate function.


from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms import HuggingFaceHub

# embedding model and evaluator LLM
embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
hugging_llm = HuggingFaceHub(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 30,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
    },
)

# RAGAS wrappers
llm = LangchainLLMWrapper(hugging_llm)
embeddings = LangchainEmbeddingsWrapper(embedding_model)

from ragas import evaluate
from ragas.metrics import context_precision

result = evaluate(
    dataset=eval_dataset,
    llm=llm,
    embeddings=embeddings,
    metrics=[
        context_precision,
    ],
)


Thanks!