I see. So you want to try out LLMs experimentally.
In that case, I think you could either look at a leaderboard that compares performance on each task, or simply try out the most popular LLMs. The second link below is a list of models sorted by popularity. It also includes VLMs and speech models, but other than that it is mostly LLMs.
There are a lot of large models, so I think it would be better to first find a model family that suits you, and then look for the smaller models within that family.
For simple text classification tasks, you might be able to get away with the smaller 1B or 3B models.
Among the newer and better-known families are Qwen 2.5 and Llama 3.2.
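
If it helps, here is a rough sketch of how you could try text classification with one of the small models by prompting it. It is only one possible approach, not the recommended way. The helper names (`build_prompt`, `parse_label`, `classify`, `make_hf_generator`) and the prompt wording are my own assumptions, and `Qwen/Qwen2.5-1.5B-Instruct` is just an example model ID that you can swap for any other small instruct model.

```python
from typing import Callable, Sequence


def build_prompt(text: str, labels: Sequence[str]) -> str:
    """Build a plain classification prompt (the wording is an assumption)."""
    options = ", ".join(labels)
    return (
        f"Classify the text into one of these labels: {options}.\n"
        f"Text: {text}\n"
        "Answer with the label only.\n"
        "Label:"
    )


def parse_label(reply: str, labels: Sequence[str], default: str = "unknown") -> str:
    """Return the label that appears earliest in the reply, or `default`."""
    lowered = reply.lower()
    hits = [
        (lowered.find(label.lower()), label)
        for label in labels
        if label.lower() in lowered
    ]
    return min(hits)[1] if hits else default


def classify(text: str, labels: Sequence[str], generate: Callable[[str], str]) -> str:
    """Classify `text` with any function that maps a prompt to a reply."""
    return parse_label(generate(build_prompt(text, labels)), labels)


def make_hf_generator(model_id: str = "Qwen/Qwen2.5-1.5B-Instruct") -> Callable[[str], str]:
    """Wrap a transformers text-generation pipeline (downloads the model)."""
    from transformers import pipeline  # imported lazily, only needed here

    pipe = pipeline("text-generation", model=model_id)

    def generate(prompt: str) -> str:
        out = pipe(prompt, max_new_tokens=5, do_sample=False, return_full_text=False)
        return out[0]["generated_text"]

    return generate
```

To try it, something like `classify("I love this phone", ["positive", "negative"], make_hf_generator())` should work once `transformers` and a backend such as PyTorch are installed. Because the model call is passed in as a function, you can test the same inputs against several models and compare.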