I’m developing an AI model to determine whether a question is ‘asktoask’ (true) or not (false). My dataset is imbalanced, with more non-‘asktoask’ questions than ‘asktoask’ questions. I would appreciate suggestions on training parameters such as the learning rate (lr), the number of epochs, and the model architecture. What strategies or tips do you recommend for training this model effectively while handling the class imbalance in the dataset? Your insights are welcome.
Does your dataset reflect the real world? I.e. are there also more ‘no’ than ‘yes’ cases in reality? If so, don’t change the dataset.
If not, why not reduce your ‘true’ set so the ratio reflects reality?
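Checking the ratio and reducing the ‘true’ set can be sketched in plain Python. This is a minimal sketch under stated assumptions: examples are `(text, label)` pairs with `True` meaning ‘asktoask’, the helper names `class_ratio` and `downsample_true` are hypothetical, and `target_ratio` is the real-world share of ‘asktoask’ questions, which you would have to estimate yourself.

```python
import random
from typing import List, Tuple

Example = Tuple[str, bool]  # (question text, is_asktoask); hypothetical format


def class_ratio(examples: List[Example]) -> float:
    """Fraction of examples labelled True ('asktoask')."""
    if not examples:
        raise ValueError("empty dataset")
    return sum(1 for _, label in examples if label) / len(examples)


def downsample_true(examples: List[Example], target_ratio: float, seed: int = 0) -> List[Example]:
    """Randomly drop True examples until they make up about `target_ratio` of the data.

    All False examples are kept. This can only lower the True share, so it
    raises ValueError if the target is higher than what the data allows.
    """
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be in (0, 1)")
    positives = [ex for ex in examples if ex[1]]
    negatives = [ex for ex in examples if not ex[1]]
    # Solve k / (k + len(negatives)) = target_ratio for k, rounded to a whole count.
    k = round(target_ratio * len(negatives) / (1.0 - target_ratio))
    if k > len(positives):
        raise ValueError("target ratio too high to reach by removing True examples")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    result = negatives + rng.sample(positives, k)
    rng.shuffle(result)
    return result
```

For example, with 30 ‘true’ and 70 ‘false’ questions, `class_ratio` gives 0.3, and `downsample_true(data, target_ratio=0.1)` keeps all 70 ‘false’ questions plus 8 randomly chosen ‘true’ ones. If the dataset already matches your estimate of reality, you would skip this step entirely.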
If there are more ‘no’ than ‘yes’ in my dataset, and in the real world as well…
So this idea isn’t dumb if, in reality as well as in the dataset, there are more ‘false’ than ‘true’ cases.
So you shouldn’t have to do anything if the dataset is similar to the real world. Train the model and see how it performs before deciding what to do.
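When checking how the model performs on skewed data, plain accuracy can look high even if the model never predicts ‘true’, so it may help to look at each class separately. A minimal sketch, assuming `y_true` and `y_pred` are lists of booleans from a held-out set; the helper name `per_class_report` is hypothetical (libraries such as scikit-learn offer similar reports).

```python
from typing import Dict, Sequence


def per_class_report(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[bool, Dict[str, float]]:
    """Precision, recall, F1 and support for each class (True = 'asktoask')."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    report = {}
    for cls in (True, False):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)  # correctly predicted as cls
        fp = sum(1 for t, p in pairs if t != cls and p == cls)  # wrongly predicted as cls
        fn = sum(1 for t, p in pairs if t == cls and p != cls)  # cls examples that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return report
```

As an illustration, a model that answers ‘false’ for all of 5 ‘true’ and 95 ‘false’ questions is 95% accurate, yet its recall on the ‘true’ class is 0.0. If the per-class numbers look acceptable, there is no need to rebalance anything.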