Hey Folks, I wanted to share a recently released Oumi feature for LLM-as-a-Judge. We’d love to get your feedback on how we can improve the API and documentation, which additional features you’d like to see, and so on. We’re a community-driven project, after all!
Here are the docs: LLM Judge — Oumi, as well as a blog post, “OpenAI just dropped two massive open-weight models — *but how do we separate the reality from the hype?*”, showing how I used it to evaluate gpt-oss-120b and gpt-oss-20b.