How can I evaluate which model performs best for my dataset?

If I try different AI models, how do I measure which one gives the best results for my project?

Benchmarks are iffy. Just try the models you're interested in and see how well they perform on the task you actually care about.

Yeah, you would just have to test them out and see for yourself. It's worth trying different modes within a particular model as well (I get very different results depending on whether I'm in ChatGPT's Thinking mode or not, for example).
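
If you want something more repeatable than eyeballing outputs, one option is to run every model (or mode) on the same small set of examples from your own data and count how often each one gets the expected answer. Here is a minimal sketch of that idea, not a definitive method. Everything in it is an assumption for illustration: `model_a` and `model_b` are hypothetical stand-ins for whatever function calls your real models, the four examples are made up, and exact-match scoring only makes sense for tasks with a single short correct answer.

```python
from typing import Callable, Dict, List, Tuple


def exact_match(prediction: str, expected: str) -> bool:
    # Assumption: the task has one short correct answer, so a
    # case-insensitive string comparison is a reasonable score.
    return prediction.strip().lower() == expected.strip().lower()


def evaluate(
    models: Dict[str, Callable[[str], str]],
    examples: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Return the fraction of examples each model answers correctly."""
    if not examples:
        raise ValueError("need at least one example to evaluate on")
    scores = {}
    for name, predict in models.items():
        correct = sum(
            exact_match(predict(prompt), expected)
            for prompt, expected in examples
        )
        scores[name] = correct / len(examples)
    return scores


# Hypothetical stand-ins: replace each with a function that sends the
# prompt to a real model (or the same model in a different mode).
def model_a(prompt: str) -> str:
    return "positive" if "great" in prompt.lower() else "negative"


def model_b(prompt: str) -> str:
    return "positive"


# Made-up examples: (input, expected answer). Use samples from your
# own dataset here, ideally ones the models have not been tuned on.
examples = [
    ("Great product, works well.", "positive"),
    ("Broke after one day.", "negative"),
    ("Great value for the price.", "positive"),
    ("Would not buy again.", "negative"),
]

scores = evaluate({"model_a": model_a, "model_b": model_b}, examples)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.0%}")
```

For open-ended tasks like summaries or code, you would swap `exact_match` for whatever check fits your project, and with only a handful of examples, small differences between models should not be read as meaningful.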