Argilla: What's the simplest way to use Suggestions when I log() new records from Python dicts?

Hey there! I’m a beginner using Argilla for labeling, deployed in an HF Space for my startup. I’m using the Python API, and am log()ing dicts rather than creating rg.Record objects.

I’m a bit confused about how to use Suggestions to have both AI-generated responses to my Questions, as well as human-labeled responses.

My schema looks like this:

settings = rg.Settings(
    guidelines="""Given the Attribute we're trying to evaluate, and with the given examples \
    and the Behavior Interview Question in mind, score the candidate's response on a scale of 0 to 10 and \
    identify quotes from their response which provide evidence to back up your score.""",
    fields=[
        rg.TextField(
            name="candidate",
            title="Candidate",
            description="The candidate we are trying to judge.",
            use_markdown=False,
        ),
        rg.TextField(
            name="interview_id",
            title="Interview ID",
            description="The internal ID for the interview.",
            use_markdown=False,
        ),
        rg.TextField(
            name="attribute",
            title="Attribute",
            description="The attribute we are trying to judge in the candidate's response to the Behavioral Interview Question.",
            use_markdown=True,
        ),
        rg.TextField(
            name="attribute_definition",
            title="Definition",
            description="Definition of this Attribute.",
            use_markdown=True,
        ),
        rg.TextField(
            name="examples",
            title="Examples",
            description="Example rating and quotes for this Attribute.",
            use_markdown=True,
        ),
        rg.TextField(
            name="biq",
            title="BIQ",
            description="The Behavioral Interview Question we have asked in order to judge the candidate's fit with the given Attribute.",
            use_markdown=True,
        ),
        rg.TextField(
            name="response",
            title="Candidate Response",
            description="The Candidate's response to the Behavioral Interview Question.",
            use_markdown=True,
        ),
    ],
    questions=[
        rg.RatingQuestion(
            name="rating",
            title="numeric rating",
            description="What is the candidate's score for this attribute, from 0-10, using the examples as guidance?",
            values=list(range(11)),  # 0..10
        ),
        rg.SpanQuestion(
            name="quotes",
            title="Quotes",
            description="Candidate quotes that provide evidence for the score.",
            field="response",
            allow_overlapping=True,
            labels=["evidence"],
        ),
        rg.TextQuestion(
            name="reasoning",
            title="Reasoning",
            description="LLM's explanation of its rating.",
        )
    ],
)

I’m using a Pydantic model for a bit of runtime type checking, but then convert the data into a simple dict before logging it:

record = ArgillaValuesModel(
    candidate=interview_info["candidateName"],
    interview_id=interview_config.interview_id(),
    attribute=attr_name,
    attribute_definition=attr.description,
    examples="<br>".join(f"Text: {example.text}\nScore: {example.score}" for example in attr.examples),
    biq=q_text,
    response=response,
    quotes=quote_ranges,
    rating=score,
    reasoning=reasoning,
)
pprint.pp(record.dict())
argilla_records.append(record.dict())
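As an aside, here is a tiny self-contained sketch of the examples flattening used above, with a stdlib dataclass as a hypothetical stand-in for the Pydantic example objects (the texts and scores are made up):

```python
from dataclasses import dataclass

@dataclass
class Example:
    # Hypothetical stand-in for the Pydantic model holding one example
    text: str
    score: int

examples = [Example("Led the migration project", 8), Example("Avoided conflict", 3)]

# Same join as above: "<br>" between examples (rendered as a line break by the
# markdown-enabled field), "\n" between the text and score of one example.
examples_field = "<br>".join(f"Text: {e.text}\nScore: {e.score}" for e in examples)
print(examples_field)
```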

I’m then calling log() thusly:

dataset.records.log(argilla_records)

Ok, so… in this case the things I’m inserting for the Question fields are from the LLM, so they should be Suggestions, right? What’s the simplest way to tweak this?

The example at Add, update, and delete records - Argilla Docs doesn’t quite match my situation; it’s geared toward adding suggestions for a single label question. In my case, I have three questions: rating, quotes, and reasoning.

I want to keep both the AI-generated answers and the human-generated labels in my Argilla dataset.

What’s the easiest way to add this?


Following the example here: rg.Suggestion - Argilla Docs

which looks like this:

dataset.records.log(
    [
        {
            "prompt": "Hello World, how are you?",
            "label": "negative",  # this will be used as a suggestion
            "score": 0.9,  # this will be used as the suggestion score
            "model": "model_name",  # this will be used as the suggestion agent
        },
    ],
    mapping={
        "score": "label.suggestion.score",
        "model": "label.suggestion.agent",
    },  # `label` is the question name in the dataset settings
)

I added a mapping dict to my call to log():

dataset.records.log(
    argilla_records,
    mapping={
        "model": "rating.suggestion.agent",
        "model": "quotes.suggestion.agent",
        "model": "reasoning.suggestion.agent",        
    }
)

Note that the three Questions my model makes predictions for are rating, which is numeric; quotes, which is a list of spans; and reasoning, which is an explanation of the reasoning behind the numeric rating.

If I add this mapping dict, the reasoning field disappears from the web UI, and the other two Questions appear unchanged…

There must be something obvious I’m missing!
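For anyone hitting the same wall: the mapping above is a Python dict literal with the key "model" repeated three times, so Python silently keeps only the last entry; two of the three mappings never reach log() at all (and the dicts built earlier don’t contain a "model" key in the first place). A minimal demonstration of the dict behavior:

```python
# A dict literal silently keeps only the LAST value for a repeated key,
# so this mapping collapses to a single entry before log() ever sees it.
mapping = {
    "model": "rating.suggestion.agent",
    "model": "quotes.suggestion.agent",
    "model": "reasoning.suggestion.agent",
}
print(mapping)  # {'model': 'reasoning.suggestion.agent'}
```

If one source key really needs to feed several questions, check whether the mapping in your Argilla version accepts a list of destination strings per key; otherwise, distinct record keys (one per question) sidestep the collision.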


Ok, I switched to using rg.Record instead of the dict API, and now it’s working:

record = rg.Record(
    fields={
        "candidate": interview_info["candidateName"],
        "interview_id": interview_config.interview_id(),
        "attribute": attr_name,
        "attribute_definition": attr.description,
        "examples": "<br>".join(f"Text: {example.text}\nScore: {example.score}" for example in attr.examples),
        "biq": q_text,
        "response": response,
    },
    suggestions=[
        rg.Suggestion(value=quote_ranges, question_name="quotes", type="model"),
        rg.Suggestion(value=score, question_name="rating", type="model"),
        rg.Suggestion(value=reasoning, question_name="reasoning", type="model"),
    ],
)

Yay.
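One note on keeping both kinds of answers: suggestions hold the model’s pre-filled answers, and when an annotator submits in the UI their answers are stored separately as responses on the same record, so nothing is overwritten. If you ever need to load human answers programmatically as well, Argilla 2.x documents rg.Response for that; a hedged sketch (the credentials, username, and values below are all placeholders, and this needs a live Argilla server):

```python
import argilla as rg

client = rg.Argilla(api_url="https://my-space.hf.space", api_key="...")  # placeholders
user = client.users("my-annotator")  # placeholder username; responses are keyed to a user

model_score = 9  # placeholder model prediction
human_score = 7  # placeholder human label

record = rg.Record(
    fields={"response": "..."},  # plus the other fields, as in the snippet above
    # model output goes in as suggestions...
    suggestions=[rg.Suggestion(value=model_score, question_name="rating", type="model")],
    # ...while human answers live in responses, tied to a specific user
    responses=[rg.Response(question_name="rating", value=human_score, user_id=user.id)],
)
```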

IMO, if Suggestions are supposed to work with the dict form of log(), the docs / examples need to cover that case better.
