Is there a small (<5GB) dataset for general-purpose LLMs?

Hi,
Is there a tiny (<5GB) dataset suitable for training small, general-purpose LLMs?
Thanks!