Few-shot learning vs fine-tuning

I am trying to define a comparison metric that compares few-shot learning techniques against standard fine-tuning on an NLP downstream task, for example text classification.

I am using SetFit for the few-shot approach, with BERT as the sentence transformer, and the same BERT model for sequence classification in the fine-tuning baseline.

My current thinking is this: few-shot methods need very few examples per class to reach performance similar to a fine-tuned model. If I apply the same constraint to standard fine-tuning, will that model give any sensible accuracy at all? (Currently, standard fine-tuning gives me accuracy no better than random guessing on a fixed evaluation set.)
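To make "no better than random guessing" concrete, the accuracy can be compared against a majority-class baseline on the same evaluation set. The snippet below is only a sketch: the helper names (`accuracy`, `majority_baseline`, `predicts_single_class`) and the toy labels are hypothetical, and a model that always predicts one class is just one possible explanation for such a score, not a confirmed diagnosis.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent gold label."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


def predicts_single_class(y_pred):
    """True if the model outputs the same label for every example."""
    return len(set(y_pred)) == 1


# Toy labels standing in for the fixed evaluation set (hypothetical data).
y_true = [0, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1]

print("accuracy:", accuracy(y_true, y_pred))
print("majority baseline:", majority_baseline(y_true))
print("single-class predictions:", predicts_single_class(y_pred))
```

If the accuracy equals the majority baseline and every prediction is the same label, the fine-tuned model may not have learned anything from the small training set, which is worth ruling out before comparing it with SetFit.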

I have tried 2, 4, 8, 16, and 32 samples per class. The SetFit results make sense, but the standard fine-tuning results do not.
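For illustration, here is a minimal sketch of how the k-shot training subsets could be drawn so that both methods are trained on exactly the same examples. It is not the exact code from my setup: the function name `sample_k_per_class`, the toy data, and the idea of repeating each size over a few seeds are assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_k_per_class(texts, labels, k, seed=0):
    """Return (texts, labels) containing exactly k randomly chosen examples per class."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    rng = random.Random(seed)  # local RNG so the draw is reproducible per seed
    out_texts, out_labels = [], []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) < k:
            raise ValueError(f"class {label!r} has only {len(pool)} examples, need {k}")
        for text in rng.sample(pool, k):
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy data standing in for the real training set (hypothetical).
texts = [f"example {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

for k in (2, 4, 8, 16, 32):
    for seed in range(3):
        sub_texts, sub_labels = sample_k_per_class(texts, labels, k, seed)
        # Both SetFit and the fine-tuned classifier would be trained on this
        # same subset and scored on the same fixed evaluation set.
        print(k, seed, len(sub_texts))
```

Drawing several subsets per size is one way to see how much of the gap between the two methods comes from which examples happened to be sampled, since results with very few examples per class can vary a lot between draws.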

I would appreciate pointers to flaws in this approach and new directions to explore. Any papers along these lines would be very helpful.