Have You Ever Tested an AI Model? What Was Your Experience?

I’m currently working on an AI project (image generation). Besides doing manual testing (functional and integration) and automation (API testing), I also need to focus on testing the AI itself. Since the outputs are images, the main evaluation method right now is human assessment. I’ve defined some categories for review after aligning with the AI Engineer’s perspective.

After finishing the test, I provide feedback to the AI team so they can decide whether the current model needs fine-tuning. However, I don’t want to stay in a purely manual QA / black-box testing role.

I’d like to ask: does anyone here have experience testing AI models? Specifically:

  • What aspects of the AI should I focus on to improve the evaluation process and model quality?

  • Are there ways to automate checks (e.g., for model stability across runs) rather than relying only on human evaluation?

Any guidance would be really helpful.

For LLM fine-tuning, evaluation methods are fairly well established, but for text-to-image (T2I) models, evaluation is inherently more difficult…
For now, I’ll just leave the resources from existing projects that might be usable.
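
On the question of automating stability checks across runs, here is a minimal sketch of one possible starting point, not an established method. It assumes each output has already been decoded into a grayscale pixel grid (a list of rows of 0-255 values, for example via an image library), computes a simple average hash per image, and flags a prompt as unstable when any two runs differ by more than a threshold. The names `average_hash`, `stability_report`, and the `max_distance` value are hypothetical and would need tuning against human judgments. Note that for a generative model with random seeds, some variation is expected, so this kind of check is most meaningful with a fixed seed.

```python
from itertools import combinations
from statistics import mean


def average_hash(pixels, hash_size=8):
    """Return a tuple of hash_size * hash_size bits for a grayscale grid.

    Assumption: height and width are multiples of hash_size, so the image
    splits evenly into blocks.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // hash_size, width // hash_size
    block_means = []
    for by in range(hash_size):
        for bx in range(hash_size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            block_means.append(mean(block))
    threshold = mean(block_means)
    # A bit is 1 when its block is brighter than the image-wide average.
    return tuple(1 if m > threshold else 0 for m in block_means)


def hamming_distance(a, b):
    """Count the positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def stability_report(images, max_distance=10):
    """Compare every pair of runs; max_distance is a hypothetical threshold."""
    hashes = [average_hash(img) for img in images]
    distances = [hamming_distance(a, b) for a, b in combinations(hashes, 2)]
    worst = max(distances) if distances else 0
    return {"max_distance": worst, "stable": worst <= max_distance}
```

A check like this only catches coarse structural drift between runs. It says nothing about prompt faithfulness or aesthetic quality, so it would complement human review rather than replace it.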

It’s really helpful, thank you for contributing :hugs:

Hi John,
After reading your resources, I found them really helpful, thanks for sharing!

By the way, I also have a question. From your point of view, what approach would you suggest for testing this image flow?
I upload an image (the template) from my device, ask the model to detect a specific area of it, and then describe in a prompt what I want to replace or add in that area.

Initially, I wanted to evaluate only the modified image against the prompt. However, I would also like to include the uploaded template in the evaluation so I can analyze what changed after the modification, possibly using TIFA.

Do you have any recommended method or approach to evaluate the uploaded template, the modified image, and the prompt?

Hi. Hmm… Like this?
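
For the template / modified image / prompt case, one way to bring the template into the evaluation is to split the check in two: did the detected area actually change, and did everything outside it stay the same? The sketch below covers only that pixel-level part, under several assumptions: the template, the edited image, and a binary mask of the detected area are available as same-sized grayscale grids (lists of rows), and the thresholds `min_inside` and `max_outside` are hypothetical values to calibrate against human review. Whether the new content matches the prompt is a separate question that would still need a text-image faithfulness metric or a human rater.

```python
def region_change(template, edited, mask):
    """Mean absolute pixel difference inside and outside the masked area.

    Assumption: all three grids have the same dimensions, and mask cells
    are truthy where the model was asked to modify the image.
    """
    inside, outside = [], []
    for t_row, e_row, m_row in zip(template, edited, mask):
        for t, e, m in zip(t_row, e_row, m_row):
            (inside if m else outside).append(abs(t - e))

    def average(values):
        return sum(values) / len(values) if values else 0.0

    return {"inside": average(inside), "outside": average(outside)}


def check_edit(template, edited, mask, min_inside=10.0, max_outside=2.0):
    """Flag edits that left the target area untouched or altered the rest.

    Both thresholds are illustrative placeholders, not recommended values.
    """
    change = region_change(template, edited, mask)
    return {
        **change,
        "region_modified": change["inside"] >= min_inside,
        "background_preserved": change["outside"] <= max_outside,
    }
```

Raw pixel differences are sensitive to compression and small shifts, so a perceptual similarity measure may be a better fit for the outside-the-mask comparison in practice.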

Oh really, could I discuss this case with you further on another platform? I can’t find a way to have a direct conversation with you on Hugging Face.

Oh. But I’m not really familiar with it. :sweat_smile:
BTW, there’s a Hugging Face Discord.

Oh, I see. Thank you for your reply. I’m trying to work out a solution for that case based on the resources you suggested.
