Use RAGAS with huggingface LLM

Christian2901 · March 4, 2024, 7:23am

Hello everybody,

I want to use the RAGAS library to evaluate my RAG pipeline. The evaluation model should be a Hugging Face model such as Llama-2, Mistral, Gemma, or similar. How can I implement this with RAGAS, or is there another solution?

The examples by the RAGAS team aren't helpful for me, because they don't show how to use a specific Hugging Face model. A specific LangChain pipeline doesn't work either, because it isn't the required class.

Best regards
Christian

dkoterwa · March 4, 2024, 12:34pm

Hi,
Take a look at this post; I think you will find it helpful.

CKeibel · March 7, 2024, 7:38am

If I read this correctly, some metrics also need an embedding model. The easiest way is to wrap both the LLM and the embedding model in LangChain wrappers, as follows:

from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from langchain_community.llms import HuggingFacePipeline
from langchain_community.embeddings import HuggingFaceEmbeddings
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
from ragas import evaluate

# embedding model (HuggingFaceEmbeddings takes the model id as the keyword argument model_name)
embedding_model = HuggingFaceEmbeddings(model_name="my-model-id")

# evaluator
model_id = "my-evaluator-id"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

pipe = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    max_new_tokens=512,  # without an explicit limit, generation may stop too early for RAGAS' answers
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.1,
    repetition_penalty=1.1  # without this output begins repeating
)

evaluator = HuggingFacePipeline(pipeline=pipe)

# ragas: dataset is your evaluation set (e.g. a datasets.Dataset); it is not defined in this snippet
result = evaluate(
    dataset=dataset,
    llm=evaluator,
    embeddings=embedding_model,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
)

I recently implemented something similar. You need sentence-transformers and the latest version of transformers installed; otherwise I ran into problems with sentence-transformers.

However, with the latest version of transformers I had problems importing pipeline (some TensorFlow error). You can also write the pipeline class yourself:

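from transformers import Pipeline

class CustomPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        # route generation settings to _forward; any other kwargs are ignored
        forward_kwargs = {
            k: kwargs[k]
            for k in ("max_new_tokens", "do_sample", "temperature", "repetition_penalty")
            if k in kwargs
        }
        return {}, forward_kwargs, {}

    def preprocess(self, text):
        return self.tokenizer(text, return_tensors="pt").to(self.device)

    def _forward(self, inputs, **generate_kwargs):
        outputs = self.model.generate(**inputs, **generate_kwargs)
        return outputs

    def postprocess(self, outputs):
        # the decoded text includes the prompt (the "full text").
        # LangChain's HuggingFacePipeline reads response["generated_text"],
        # so return a list of dicts instead of a bare string
        text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return [{"generated_text": text}]

# init pipe (model and tokenizer loaded as in the first snippet)
pipe = CustomPipeline(
    model=model,
    tokenizer=tokenizer,
    task='text-generation',
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1,
    repetition_penalty=1.1
)

# langchain wrapper
evaluator = HuggingFacePipeline(pipeline=pipe)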

kp1264 · April 30, 2024, 3:08pm

Hi @CKeibel,

I want to evaluate my RAG pipeline using an open-source LLM instead of GPT-4. Below are my code snippets for the RAGAS evaluation. Am I passing llm to the evaluate function correctly? I am getting the warning “WARNING:ragas.llms.output_parser:Failed to parse output. Returning None.” I have checked several logs, and I think the problem is in how I pass the llm to the evaluate function.

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms import HuggingFaceHub

embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
hugging_llm = HuggingFaceHub(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 30,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
    },
)

llm = LangchainLLMWrapper(hugging_llm)
embeddings = LangchainEmbeddingsWrapper(embedding_model)

from ragas import evaluate
from ragas.metrics import context_precision
result = evaluate(
    dataset=eval_dataset,
    llm=llm,
    embeddings=embeddings,
    metrics=[
        context_precision,
    ],
)

Thanks!

scarte · May 7, 2024, 12:41pm

Hi Keibel!
Thank you very much for your detailed response. Even so, I still get some errors.

When you define the following class:

class CustomPipeline(Pipeline):

is the Pipeline you use imported with from transformers.pipelines.base import Pipeline?

Your second code option gives me the following error:

  File "/usr/local/lib/python3.10/dist-packages/ragas/llms/base.py", line 147, in generate_text
    result = self.langchain_llm.generate_prompt(
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 633, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 803, in generate
    output = self._generate_helper(
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 670, in _generate_helper
    raise e
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 657, in _generate_helper
    self._generate(
  File "/usr/local/lib/python3.10/dist-packages/langchain_community/llms/huggingface_pipeline.py", line 279, in _generate
    text = response["generated_text"]
TypeError: string indices must be integers

Do you know what it could be? Maybe Pipeline isn't being imported correctly.
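The last frame of the traceback is text = response["generated_text"], and "string indices must be integers" means the wrapper received a plain string there: the original CustomPipeline returns the decoded string from postprocess, while LangChain looks up a "generated_text" key. The sketch below is a simplified stand-in for that lookup (not LangChain's actual code; the list unwrapping is an assumption about how per-prompt results are handled). It contrasts a string-returning postprocess with one that returns a list of dicts, which is the shape Hugging Face text-generation pipelines produce:

```python
def read_generated_text(response):
    # Simplified stand-in for the lookup in the traceback (assumed behaviour):
    # unwrap a one-element list, then index by "generated_text".
    if isinstance(response, list):
        response = response[0]
    return response["generated_text"]


def postprocess_as_string(decoded: str):
    # What the custom pipeline's postprocess returned originally.
    return decoded


def postprocess_as_records(decoded: str):
    # Shape returned by text-generation pipelines: a list of dicts.
    return [{"generated_text": decoded}]


decoded = "Question: ... Answer: yes"

try:
    read_generated_text(postprocess_as_string(decoded))
except TypeError as err:
    print("string output fails:", err)

print("dict output works:", read_generated_text(postprocess_as_records(decoded)))
```

Returning [{"generated_text": text}] from postprocess keeps the custom pipeline compatible with code written against the standard text-generation output.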

Thank you very much in advance!
Saioa

scarte · May 8, 2024, 10:31am

Hello @kp1264!

I’m looking for different ways to use Ragas with Llama-3 as the evaluator LLM, and I get this same warning that you mentioned:

“WARNING:ragas.llms.output_parser:Failed to parse output. Returning None.”

As a result, the metrics are, of course, NaN:

{'faithfulness': nan}
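One plausible cause of the "Failed to parse output" warning (an assumption, not something confirmed in this thread) is that RAGAS expects a JSON answer from the evaluator, and open-source models often wrap that JSON in extra prose, so the parser gives up and the score becomes NaN. The hypothetical helper below, which is not part of RAGAS, shows how you could check a raw model response for a parseable JSON object while debugging. It uses the standard library's json.JSONDecoder.raw_decode, starting at each "{" or "[":

```python
import json
from typing import Any, Optional


def extract_first_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array found in text, or None.

    Hypothetical debugging helper (not a RAGAS API).
    """
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                obj, _end = decoder.raw_decode(text, i)
                return obj
            except json.JSONDecodeError:
                continue  # not valid JSON from here; try the next bracket
    return None


raw = 'Sure! Here is the result: {"verdict": 1, "reason": "supported"} Hope this helps.'
print(extract_first_json(raw))                      # {'verdict': 1, 'reason': 'supported'}
print(extract_first_json("I cannot answer that."))  # None
```

If raw responses from your model contain no JSON at all, or come back empty, the problem is more likely the prompt, the model, or the generation settings than the way the LLM is passed to evaluate.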

Did you manage to solve it?

Thank you very much!

sheetalkamthe55 · May 12, 2024, 4:29pm

I am getting the same error:

“WARNING:ragas.llms.output_parser:Failed to parse output. Returning None.”

I enabled LangSmith tracing to inspect the requests and responses. For the given input prompt I get a blank response, so I believe it is a context-length issue. I tried different LLMs, but the error remains the same.
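If the blank responses do come from an overflowing context window, the check is simple arithmetic: the prompt's token count plus the max_new_tokens you request must fit within the model's context length. A minimal sketch, assuming you already have a token count for the prompt (for example from the model's tokenizer); the 4096 figure below is a placeholder, not the limit of any particular model, and how a given server handles an overflow (truncating or returning nothing) may vary:

```python
def fits_context(prompt_tokens: int, max_new_tokens: int, context_length: int) -> bool:
    """True if the prompt plus the requested generation fits in the window."""
    return prompt_tokens + max_new_tokens <= context_length


def max_generation_budget(prompt_tokens: int, context_length: int) -> int:
    """Tokens left for the answer once the prompt is in the window (never negative)."""
    return max(context_length - prompt_tokens, 0)


print(fits_context(prompt_tokens=3900, max_new_tokens=512, context_length=4096))  # False
print(max_generation_budget(prompt_tokens=3900, context_length=4096))             # 196
```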

Can anyone suggest open-source models that work as evaluators?

codegood · May 17, 2024, 7:37pm

What eval datasets can I use for this? I'm getting errors with the SQuAD dataset.

himjoshi · June 3, 2024, 6:26pm

@scarte @sheetalkamthe55
Were you able to resolve the error? I'm facing a similar error. I'm using Mistral 7B with the maximum context length of 8k, but I'm still stuck at this point.

scarte · June 4, 2024, 6:41am

Hi @himjoshi !

In the end I solved it by loading the model with Ollama, as explained in this tutorial:

You load the model you want with ChatOllama, which in your case will be mistral:

from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="mistral")

You can also load the embeddings with OllamaEmbeddings if you want.

Sorry if it's not very helpful, but that's how I was able to move forward.
If you find another solution, please share it here! Thank you!

Saioa

scarte · June 4, 2024, 6:45am

Hi @codegood !

You can use “explodinggradients/amnesty_qa” dataset, loading it as:

from datasets import load_dataset

data = load_dataset("explodinggradients/amnesty_qa", "english_v2")
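On the SQuAD errors mentioned above: SQuAD records have a single context string and an answers dict, whereas the RAGAS examples in this thread use question, answer, contexts (a list of strings) and ground_truth columns. A sketch of remapping one record, assuming that column layout (it may differ between RAGAS versions) and assuming the answer to evaluate comes from your own RAG pipeline; rag_answer is a hypothetical placeholder for that pipeline, and the sample record is made up for illustration:

```python
from typing import Callable, Dict, List


def squad_to_ragas(record: Dict, rag_answer: Callable[[str], str]) -> Dict:
    """Map one SQuAD-style record to the column layout used in this thread."""
    return {
        "question": record["question"],
        # contexts must be a list of strings, not a single string
        "contexts": [record["context"]],
        # SQuAD stores reference answers as a list under answers["text"]
        "ground_truth": record["answers"]["text"][0],
        # the answer being evaluated comes from your own RAG pipeline
        "answer": rag_answer(record["question"]),
    }


def to_columns(rows: List[Dict]) -> Dict[str, List]:
    """Turn a list of row dicts into the dict of lists that Dataset.from_dict expects."""
    return {key: [row[key] for row in rows] for key in rows[0]}


sample = {
    "question": "When was the first Super Bowl?",
    "context": "The First AFL–NFL World Championship Game was played on January 15, 1967.",
    "answers": {"text": ["January 15, 1967"], "answer_start": [56]},
}
rows = [squad_to_ragas(sample, rag_answer=lambda q: "It was held on January 15, 1967.")]
print(to_columns(rows))
```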

Saurabh8255 · July 10, 2024, 11:42am

Hey, I am facing the same error. Did you solve it? Please let me know.
I am using the following code:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain.llms import HuggingFaceHub

embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
hugging_llm = HuggingFaceHub(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 30,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
    },
)

import nest_asyncio
nest_asyncio.apply()
from datasets import Dataset
import os
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness

data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'contexts': [
        ['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'],
        ['The Green Bay Packers…Green Bay, Wisconsin.', 'The Packers compete…Football Conference'],
    ],
    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times'],
}

dataset = Dataset.from_dict(data_samples)

score = evaluate(dataset, metrics=[faithfulness, answer_correctness], llm=hugging_llm, embeddings=embedding_model)
score.to_pandas()

Output:
WARNING:ragas.llms.output_parser:Failed to parse output. Returning None.
WARNING:ragas.llms.output_parser:Failed to parse output. Returning None.
WARNING:ragas.llms.output_parser:Failed to parse output. Returning None.
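The warnings above come from parsing the LLM's output rather than from the data, but a quick shape check of a dict like data_samples rules out a separate source of errors before evaluate() runs. A minimal sketch; check_ragas_columns is a hypothetical helper, not part of RAGAS, and it checks only two things: that all columns have the same length, and that every contexts entry is a list of strings:

```python
from typing import Dict, List


def check_ragas_columns(data: Dict[str, List]) -> List[str]:
    """Return a list of problems found in a dict-of-lists evaluation set."""
    problems = []
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"columns have different lengths: {lengths}")
    for i, ctx in enumerate(data.get("contexts", [])):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            problems.append(f"contexts[{i}] should be a list of strings")
    return problems


bad_samples = {
    "question": ["When was the first super bowl?", "Who won the most super bowls?"],
    "answer": ["The first superbowl was held on Jan 15, 1967",
               "The most super bowls have been won by The New England Patriots"],
    "contexts": [["The first game was played on January 15, 1967."],
                 "a bare string instead of a list"],  # deliberately wrong
    "ground_truth": ["The first superbowl was held on January 15, 1967",
                     "The New England Patriots have won the Super Bowl a record six times"],
}
print(check_ragas_columns(bad_samples))  # ['contexts[1] should be a list of strings']
```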

iammano · July 16, 2024, 8:44am

Hey @CKeibel, by default, importing RAGAS in my Jupyter notebook throws an error about the OpenAI (OAI) keys. Did you come across this kind of error while implementing it? If possible, could you tell me the package versions you are using?

Thanks,
Mano

ctrlaltdelete · July 30, 2024, 7:53am

Hi @scarte !
Could you tell me which model you load with Ollama? I load llama3.1 with Ollama but still get the error.

sohammhatre · October 10, 2024, 11:51am

I’m getting the error

/usr/local/lib/python3.10/dist-packages/ragas/metrics/base.py in init(self, run_config)
    151                 f"Metric '{self.name}' has no valid LLM provided (self.llm is None). Please initantiate a the metric with an LLM to run."  # noqa
    152             )
--> 153         self.llm.set_run_config(run_config)
    154 
    155 

AttributeError: 'Mistral' object has no attribute 'set_run_config'

for this code:

from langchain_ollama.chat_models import ChatOllama
from langchain_ollama.embeddings import OllamaEmbeddings
from ragas import evaluate
from ragas.metrics import answer_relevancy
from datasets import Dataset
import json

# The model should be specified using the `model` parameter
req_llm = ChatOllama(model="mistral")

embeddings = OllamaEmbeddings(model="mistral")

# for m in metrics:
#     m.__setattr__("llm", req_llm)
#     m.__setattr__("embeddings", embeddings)

data = [
    {
        'question': 'Which is the most popular global sport?',
        'contexts': [
            "The popularity of sports can be measured in various ways, including TV viewership, social media presence, number of participants, and economic impact.",
            "Football is undoubtedly the world's most popular sport with major events like the FIFA World Cup.",
            "Sports personalities like Ronaldo and Messi draw a followership of more than 4 billion people."
        ],
        'response': 'Football is the most popular sport with around 4 billion followers worldwide',
        'answer': 'Football is the most popular sport with around 4 billion followers worldwide'
    }
]

# Convert the list to a Hugging Face Dataset object
dataset = Dataset.from_list(data)

# Step 3: Run the evaluation
results = evaluate(
    dataset=dataset,  # Use the Hugging Face Dataset object
    metrics=[answer_relevancy],
    llm=req_llm,
    embeddings=embeddings,
)

# Step 4: Print the results
print(json.dumps(results, indent=3))

John6666 · October 10, 2024, 12:40pm

It seems to be a bug.

github.com/explodinggradients/ragas
AttributeError: 'AzureChatOpenAI' object has no attribute 'set_run_config'
Opened by sprt-kmelchor at 06:39 AM UTC on 13 Feb 2024, closed at 01:27 PM UTC on 23 Feb 2024 (label: question)

Hi, I've been encountering the error shown below when I try to run `evaluate…()` inside one of my projects (run in a Docker environment). For some reason it works when I run it in a notebook (using the Azure example in the docs). I've made sure that the Azure model has the same configuration in both the Docker image and the notebook.

  File "/usr/local/lib/python3.11/site-packages/ragas/evaluation.py", line 179, in evaluate
    [m.init(run_config) for m in metrics]
  File "/usr/local/lib/python3.11/site-packages/ragas/evaluation.py", line 179, in <listcomp>
    [m.init(run_config) for m in metrics]
     ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ragas/metrics/base.py", line 116, in init
    self.llm.set_run_config(run_config)
    ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'AzureChatOpenAI' object has no attribute 'set_run_config'

Implementation-wise, the notebook is straightforward, but inside Docker I call `evaluate()` from an async function. I'm not sure whether this has an effect, but I invoke `evaluate()` the same way as in the notebook:

`ragas_result = evaluate(ragas_dataset, metrics=DEFAULT_METRICS, llm=DEFAULT_LLM_MODEL, embeddings=DEFAULT_EMBEDDINGS)`

Here's the config for the Azure model I'm using (I took the settings from the notebook, and they match the ones inside the Docker image):

`AzureChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x157f7b910>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x157f7ee10>, model_name='gpt-35-turbo', temperature=0.0, openai_api_key='...', openai_proxy='', request_timeout=60, azure_endpoint='https://...openai.azure.com/', deployment_name='...gpt-35-turbo', openai_api_version='2023-03-15-preview', openai_api_type='azure')`

Here are some library versions:

  langchain==0.1.6
  langchain-community==0.0.19
  langchain-core==0.1.22
  langchain-openai==0.0.5
  ragas==0.1.0
  python==3.11.5

Any help would be highly appreciated. I can provide more information if needed. Thank you.

RyanTree · October 20, 2024, 2:15am

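Hi @scarte,

# imports added for completeness (assumed; the original post did not show them)
from datasets import load_dataset
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas import evaluate
from ragas.metrics import context_precision
from ragas.run_config import RunConfig

dataset = load_dataset("explodinggradients/amnesty_qa", "english_v2", trust_remote_code=True)
dataset_subset = dataset["eval"].select(range(2))

llm = ChatOllama(model="mistral")
embedding_model = OllamaEmbeddings(model="mistral")

result = evaluate(
    dataset=dataset_subset,
    llm=llm,
    embeddings=embedding_model,
    metrics=[
        context_precision
    ],
    run_config=RunConfig(timeout=180.0, max_workers=16)
)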

I ran ollama run mistral and it launched successfully, but the code above still gives me a response error. I'd really appreciate it if you could take a look!

laoyang103 · March 17, 2025, 10:25am

Try this:

github.com/Microflow-IO/modsecurity-for-anylog: ragas_local_llm.ipynb (branch main)

