Evaluating my own model

You could run each benchmark individually yourself, but if the goal is to compare your model with other models, it is probably easier and more reliable to submit it to the leaderboard and have it evaluated there.
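If you do want to run benchmarks yourself, the loop is roughly the following. This is only a minimal sketch: `toy_model`, the two tiny benchmarks, and the exact-match metric are all made up for illustration and stand in for your real model call and the real benchmark datasets.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical format: a benchmark is a list of (prompt, expected answer) pairs.
Benchmark = List[Tuple[str, str]]


def run_benchmark(predict: Callable[[str], str], examples: Benchmark) -> float:
    """Return the exact-match accuracy of `predict` on one benchmark."""
    if not examples:
        raise ValueError("benchmark has no examples")
    correct = sum(predict(prompt).strip() == answer for prompt, answer in examples)
    return correct / len(examples)


def run_all(
    predict: Callable[[str], str], benchmarks: Dict[str, Benchmark]
) -> Dict[str, float]:
    """Run every benchmark individually and collect one score per benchmark."""
    return {name: run_benchmark(predict, ex) for name, ex in benchmarks.items()}


def toy_model(prompt: str) -> str:
    # Placeholder for a real model call (assumption: replace with your own inference code).
    answers = {"2+2=": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


# Toy data, invented for this sketch only.
benchmarks = {
    "arithmetic": [("2+2=", "4"), ("3+3=", "6")],
    "geography": [("Capital of France?", "Paris")],
}

scores = run_all(toy_model, benchmarks)
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Scores produced this way are only comparable with other models' numbers if the prompts, few-shot settings, and metrics match the ones used for those models, which is the main reason going through the leaderboard tends to be the safer route.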