I have been building a lot of small proofs of concept and ideas for some time now. Some of them are getting attention in my company, and I am now fully focused on improving them.
Most are RAG-based or agentic systems that automate tasks.
As I keep updating them and trying different things, I have reached the point where I would like figures: metrics that tell me whether a change makes a tool better or worse than before at its specific goal (data retrieval, content creation).
So I would like to build specific benchmarks on top of the tools. Can someone point me to a framework, technique, course, or any other resource for building such benchmarks?
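
To make the retrieval case concrete, here is a minimal sketch of the kind of figure I have in mind: recall@k and mean reciprocal rank computed over a small hand-labeled set of queries. Everything in it is hypothetical (the `evaluate` helper, the `retriever` callable, the query and document ids); it is not taken from any of my tools or from a specific framework.

```python
from typing import Callable, Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant document ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    retriever: Callable[[str], List[str]],   # hypothetical: query -> ranked doc ids
    labeled_queries: Dict[str, Set[str]],    # hypothetical: query -> relevant doc ids
    k: int = 3,
) -> Dict[str, float]:
    """Average recall@k and MRR over a hand-labeled query set."""
    if not labeled_queries:
        raise ValueError("labeled_queries must not be empty")
    recalls, ranks = [], []
    for query, relevant in labeled_queries.items():
        retrieved = retriever(query)
        recalls.append(recall_at_k(retrieved, relevant, k))
        ranks.append(reciprocal_rank(retrieved, relevant))
    n = len(labeled_queries)
    return {"recall@k": sum(recalls) / n, "mrr": sum(ranks) / n}


# Made-up data standing in for a real retriever and a real labeled set.
fake_results = {"q1": ["x", "a", "y"], "q2": ["b", "z", "w"]}
labels = {"q1": {"a"}, "q2": {"b", "c"}}
scores = evaluate(lambda q: fake_results[q], labels, k=3)
print(scores)
```

Running the same labeled set before and after a change would give me two sets of numbers to compare. What I am missing is how to do this properly, and how to do the equivalent for content creation.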