How can I use class_weights when training?

Alright, you have a very imbalanced dataset with 500 examples and 9 classes.

I understand that you want to use it to train a classifier, but I’m not sure what you want to do with it afterwards. Do you have a real-world use case with another set of unlabeled documents that you want to classify?

The point is: if you have fewer than roughly 1,000 unlabeled documents to classify, you are better off labeling them manually than trying to train a classifier. That will be faster, as it should only take a few days to complete.

If you have more than 1,000 unlabeled documents, I’d say you should take the time to manually label more of them and thereby increase the number of examples in your training set.

Why? To train a standard classification model you should have at least 100 examples per class (more is better), and the most frequent class should not have more than roughly 10x as many examples as the least frequent one.
Another option is to aggregate the classes that have only a few examples into a single class. This will improve accuracy and the other metrics, but of course you then lose the ability to distinguish between those classes.
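
If you do go ahead with the 500 examples you have, you can at least compensate for the imbalance with class weights, which is what you originally asked about. Here is a minimal sketch using scikit-learn to compute the weights and a weighted PyTorch loss; the labels array is only a placeholder for your own data:

```python
import numpy as np
import torch
from sklearn.utils.class_weight import compute_class_weight

# Integer class ids for your ~500 training examples (placeholder values here)
labels = np.array([0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8])

# "balanced" sets weight = n_samples / (n_classes * count(class)),
# so under-represented classes get a larger weight
class_ids = np.unique(labels)
weights = compute_class_weight(class_weight="balanced", classes=class_ids, y=labels)

# Pass the weights to the loss so mistakes on rare classes are penalized
# more heavily during training (e.g. in your training loop or compute_loss)
loss_fn = torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float))
```

This does not replace getting more data, it only rebalances the gradient signal across classes.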

Now, coming to few-shot learning: I have no practical experience with it myself, so I can’t point you to code or a tutorial. But few-shot learning is a different approach with a different training goal. You can find many good articles that explain how it works, and you can also use the search function of this forum to find discussions like this one.

Anyway, the bottom line is that for few-shot learning you should use a different kind of model (NLI).
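
For orientation only, here is a sketch of how an NLI-based classifier is typically used without task-specific training, via the zero-shot pipeline in transformers; the model name, the document, and the candidate labels are just illustrative:

```python
from transformers import pipeline

# An NLI model scores each candidate label as a "hypothesis" against the
# document, so no labeled training data is needed for this baseline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The quarterly report shows revenue grew by 12 percent.",  # placeholder document
    candidate_labels=["finance", "sports", "politics"],        # your 9 class names would go here
)
print(result["labels"][0], result["scores"][0])  # top predicted label and its score
```

Proper few-shot fine-tuning goes further than this, but the zero-shot pipeline is an easy way to see how the NLI formulation behaves on your classes.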

However, as I said before, your actual use case is not really clear. If this is for educational purposes, I’d recommend trying out different datasets: one that is suitable for training a regular classifier and another that is suitable for few-shot learning. If this is a real-world use case, you can try few-shot learning, but otherwise you should acquire more data.