Do we really need a very large dataset to train GPTs?

Do we really need a very large dataset to train GPTs? If the dataset is small, will GPT fail to work well, or will it still outperform conventional learning models in this situation? And is it possible to quantitatively determine the minimum number of samples a dataset needs for this task? For example, in the case of malware samples, can we say that a dataset suitable for GPTs should not be smaller than some specific number of samples?
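
To make the last question concrete, here is one hypothetical way such a minimum could be estimated empirically: fine-tune on nested subsets of the data, fit a power-law learning curve to the resulting validation errors, and extrapolate to a target error. Everything in this sketch (the `learning_curve` function form, the subset sizes, and the error values) is an illustrative assumption, not measured data or an established standard.

```python
# Hypothetical sketch: estimate a minimum dataset size by fitting a
# power-law learning curve err(n) = a * n**(-b) + c to validation errors
# measured on nested training subsets, then solving for the n that reaches
# a target error. All numbers below are made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(n, a, b, c):
    # Common power-law assumption: error decays with training-set size n
    # toward an irreducible asymptote c.
    return a * n ** (-b) + c

# Hypothetical measurements: validation error after fine-tuning on
# subsets of 1k, 2k, 5k, 10k, and 20k malware samples.
subset_sizes = np.array([1_000, 2_000, 5_000, 10_000, 20_000], dtype=float)
val_errors = np.array([0.32, 0.26, 0.19, 0.15, 0.12])

params, _ = curve_fit(learning_curve, subset_sizes, val_errors,
                      p0=[1.0, 0.5, 0.05], maxfev=10_000)
a, b, c = params

target_error = 0.10  # desired validation error (assumed)
if target_error > c:
    # Invert the fitted curve: n_min = ((target - c) / a) ** (-1 / b)
    n_min = ((target_error - c) / a) ** (-1.0 / b)
    print(f"Estimated minimum samples for {target_error:.0%} error: {n_min:,.0f}")
else:
    print("Target error is below the fitted asymptote; more data alone won't reach it.")
```

Whether such an extrapolation is trustworthy for GPT-style models on malware data is itself part of the question: the power-law form is only an assumption, so I would be interested in whether anyone has validated this kind of curve for this task.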