Argilla: What's the simplest way to use Suggestions when I log() new records from Python dicts?

Hey there! I’m a beginner using Argilla for labeling, deployed in an HF Space for my startup. I’m using the Python API, and am log()ing dicts rather than creating rg.Record objects.

I’m a bit confused about how to use Suggestions to have both AI-generated responses to my Questions, as well as human-labeled responses.

My schema looks like this:

settings = rg.Settings(
    guidelines="""Given the Attribute we're trying to evaluate, and with the given examples \
    and the Behavior Interview Question in mind, score the candidate's response on a scale of 0 to 10 and \
    identify quotes from their response which provide evidence to back up your score.""",
    fields=[
        rg.TextField(
            name="candidate",
            title="Candidate",
            description="The candidate we are trying to judge.",
            use_markdown=False,
        ),
        rg.TextField(
            name="interview_id",
            title="Interview ID",
            description="The internal ID for the interview.",
            use_markdown=False,
        ),
        rg.TextField(
            name="attribute",
            title="Attribute",
            description="The attribute we are trying to judge in the candidate's response to the Behavioral Interview Question.",
            use_markdown=True,
        ),
        rg.TextField(
            name="attribute_definition",
            title="Definition",
            description="Definition of this Attribute.",
            use_markdown=True,
        ),
        rg.TextField(
            name="examples",
            title="Examples",
            description="Example rating and quotes for this Attribute.",
            use_markdown=True,
        ),
        rg.TextField(
            name="biq",
            title="BIQ",
            description="The Behavioral Interview Question we have asked in order to judge the candidate's fit with the given Attribute.",
            use_markdown=True,
        ),
        rg.TextField(
            name="response",
            title="Candidate Response",
            description="The Candidate's response to the Behavioral Interview Question.",
            use_markdown=True,
        ),
    ],
    questions=[
        rg.RatingQuestion(
            name="rating",
            title="numeric rating",
            description="What is the candidate's score for this attribute, from 0-10, using the examples as guidance?",
            values=list(range(11)),  # 0..10
        ),
        rg.SpanQuestion(
            name="quotes",
            title="Quotes",
            description="Candidate quotes that provide evidence for the score.",
            field="response",
            allow_overlapping=True,
            labels=["evidence"],
        ),
        rg.TextQuestion(
            name="reasoning",
            title="Reasoning",
            description="LLM's explanation of its rating.",
        )
    ],
)

I’m using a Pydantic model for a bit of runtime type checking, but then convert the data into a simple dict before logging it:

record = ArgillaValuesModel(
    candidate=interview_info["candidateName"],
    interview_id=interview_config.interview_id(),
    attribute=attr_name,
    attribute_definition=attr.description,
    examples="<br>".join(f"Text: {example.text}\nScore: {example.score}" for example in attr.examples),
    biq=q_text,
    response=response,
    quotes=quote_ranges,
    rating=score,
    reasoning=reasoning,
)
pprint.pp(record.dict())
argilla_records.append(record.dict())
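As an aside, here is a tiny self-contained sketch of the examples flattening used above, with a stdlib dataclass as a hypothetical stand-in for the Pydantic example objects (the texts and scores are made up):

```python
from dataclasses import dataclass

@dataclass
class Example:
    # Hypothetical stand-in for the Pydantic model holding one example
    text: str
    score: int

examples = [Example("Led the migration project", 8), Example("Avoided conflict", 3)]

# Same join as above: "<br>" between examples (rendered as a line break by the
# markdown-enabled field), "\n" between the text and score of one example.
examples_field = "<br>".join(f"Text: {e.text}\nScore: {e.score}" for e in examples)
print(examples_field)
```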

I’m then calling log() thusly:

dataset.records.log(argilla_records)

Ok, so… in this case the things I’m inserting for the Question fields are from the LLM, so they should be Suggestions, right? What’s the simplest way to tweak this?

The example at Add, update, and delete records - Argilla Docs doesn’t quite match my situation; it’s geared toward adding suggestions for a single label question. In my case, I have three questions: rating, quotes, and reasoning.

I want to keep both the AI-generated answers and the human-generated labels in my Argilla dataset.

What’s the easiest way to add this?


Following the example here: rg.Suggestion - Argilla Docs

which looks like this:

dataset.records.log(
    [
        {
            "prompt": "Hello World, how are you?",
            "label": "negative",  # this will be used as a suggestion
            "score": 0.9,  # this will be used as the suggestion score
            "model": "model_name",  # this will be used as the suggestion agent
        },
    ],
    mapping={
        "score": "label.suggestion.score",
        "model": "label.suggestion.agent",
    },  # `label` is the question name in the dataset settings
)

I added a mapping dict to my call to log():

dataset.records.log(
    argilla_records,
    mapping={
        "model": "rating.suggestion.agent",
        "model": "quotes.suggestion.agent",
        "model": "reasoning.suggestion.agent",        
    }
)

Note that the three Questions my model makes predictions for are rating, which is numeric; quotes, which is a list of spans; and reasoning, which is an explanation of the reasoning behind the numeric rating.

If I add this mapping dict, the reasoning field disappears from the web UI, and the other two Questions appear unchanged…

There must be something obvious I’m missing!
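For anyone hitting the same wall: the mapping above is a Python dict literal with the key "model" repeated three times, so Python silently keeps only the last entry; two of the three mappings never reach log() at all (and the dicts built earlier don’t contain a "model" key in the first place). A minimal demonstration of the dict behavior:

```python
# A dict literal silently keeps only the LAST value for a repeated key,
# so this mapping collapses to a single entry before log() ever sees it.
mapping = {
    "model": "rating.suggestion.agent",
    "model": "quotes.suggestion.agent",
    "model": "reasoning.suggestion.agent",
}
print(mapping)  # {'model': 'reasoning.suggestion.agent'}
```

If one source key really needs to feed several questions, check whether the mapping in your Argilla version accepts a list of destination strings per key; otherwise, distinct record keys (one per question) sidestep the collision.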


Ok, I switched to using rg.Record instead of the dict API, and now it’s working:

record = rg.Record(
    fields={
        "candidate": interview_info["candidateName"],
        "interview_id": interview_config.interview_id(),
        "attribute": attr_name,
        "attribute_definition": attr.description,
        "examples": "<br>".join(f"Text: {example.text}\nScore: {example.score}" for example in attr.examples),
        "biq": q_text,
        "response": response,
    },
    suggestions=[
        rg.Suggestion(value=quote_ranges, question_name="quotes", type="model"),
        rg.Suggestion(value=score, question_name="rating", type="model"),
        rg.Suggestion(value=reasoning, question_name="reasoning", type="model"),
    ],
)

Yay.
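One note on keeping both kinds of answers: suggestions hold the model’s pre-filled answers, and when an annotator submits in the UI their answers are stored separately as responses on the same record, so nothing is overwritten. If you ever need to load human answers programmatically as well, Argilla 2.x documents rg.Response for that; a hedged sketch (the credentials, username, and values below are all placeholders, and this needs a live Argilla server):

```python
import argilla as rg

client = rg.Argilla(api_url="https://my-space.hf.space", api_key="...")  # placeholders
user = client.users("my-annotator")  # placeholder username; responses are keyed to a user

model_score = 9  # placeholder model prediction
human_score = 7  # placeholder human label

record = rg.Record(
    fields={"response": "..."},  # plus the other fields, as in the snippet above
    # model output goes in as suggestions...
    suggestions=[rg.Suggestion(value=model_score, question_name="rating", type="model")],
    # ...while human answers live in responses, tied to a specific user
    responses=[rg.Response(question_name="rating", value=human_score, user_id=user.id)],
)
```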

IMO, if Suggestions are supposed to work with the dict form of log(), the docs / examples need to cover that case better.
