Hi,
I’m new to the NLP domain and the Hugging Face ecosystem.
I wanted some suggestions on where to read about the metadata of datasets used for NLP.
I have worked mostly with vision data so far, where the simple meta-features shared by image datasets in general are:
- image resolution
- No. of training samples
- No. of classification labels
- No. of channels
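
To make concrete what I mean by meta-features, here is a minimal sketch of how I would collect the four items above. It assumes a toy in-memory dataset of (image, label) pairs, where each image is a nested list shaped [height][width][channels]. The names `image_meta_features` and `blank_image` are hypothetical helpers made up for this post, not part of any library.

```python
from collections import Counter


def image_meta_features(samples):
    """Summarize simple meta-features of an image classification dataset.

    `samples` is assumed to be a list of (image, label) pairs, where each
    image is a nested list shaped [height][width][channels].
    """
    resolutions = Counter()
    channels = Counter()
    labels = set()
    for image, label in samples:
        height = len(image)
        width = len(image[0])
        resolutions[(height, width)] += 1
        channels[len(image[0][0])] += 1
        labels.add(label)
    return {
        "num_samples": len(samples),
        "num_labels": len(labels),
        "resolutions": dict(resolutions),  # (height, width) -> count
        "channels": dict(channels),        # channel count -> count
    }


def blank_image(height, width, num_channels):
    """Hypothetical helper: build an all-zero image as nested lists."""
    return [[[0] * num_channels for _ in range(width)] for _ in range(height)]


# Toy data: two 2x3 RGB images and one 4x4 grayscale image.
toy_samples = [
    (blank_image(2, 3, 3), "cat"),
    (blank_image(2, 3, 3), "dog"),
    (blank_image(4, 4, 1), "cat"),
]
meta = image_meta_features(toy_samples)
print(meta)
```

I am looking for the analogous kind of summary for text datasets.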
Would the text data used in NLP tasks have similar features in common, aside from the number of training samples and the number of classification labels? Any thoughts are welcome.
Thanks!