Metadata of NLP datasets

Dipti · November 5, 2022, 7:51pm

Hi,
I’m new to the NLP domain and the HuggingFace ecosystem.
I'd like some suggestions on where to read about the metadata of datasets used for NLP.
So far I have worked mostly with vision data, where some simple meta features shared by image datasets in general are:

Image resolution
Number of training samples
Number of classification labels
Number of channels

Aside from the number of training samples and the number of classification labels, do the text datasets used in NLP tasks share any similar features? Any thoughts are welcome.
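To make the question concrete, here is a minimal sketch of the kind of per-dataset summary I have in mind, in plain Python over a hypothetical list of (text, label) pairs. The function name `summarize_text_dataset` is just for illustration. The text-specific fields (length in whitespace-separated tokens, vocabulary size, label distribution) are only my guesses at possible analogues of resolution and channel count, not an established convention.

```python
from collections import Counter


def summarize_text_dataset(examples):
    """Compute simple metadata for a list of (text, label) pairs.

    Assumption: whitespace splitting stands in for a real tokenizer,
    so the token counts and vocabulary size are only rough estimates.
    """
    texts = [text for text, _ in examples]
    labels = [label for _, label in examples]

    # Length of each example in whitespace-separated tokens
    # (loosely analogous to image resolution).
    token_counts = [len(text.split()) for text in texts]

    # Distinct lowercased tokens across the whole dataset.
    vocabulary = set()
    for text in texts:
        vocabulary.update(text.lower().split())

    return {
        "num_samples": len(examples),
        "num_labels": len(set(labels)),
        "avg_tokens": sum(token_counts) / len(token_counts) if token_counts else 0.0,
        "max_tokens": max(token_counts, default=0),
        "vocab_size": len(vocabulary),
        "label_distribution": dict(Counter(labels)),
    }


# Hypothetical toy data, for illustration only.
toy = [
    ("the movie was great", "pos"),
    ("terrible plot and acting", "neg"),
    ("great acting", "pos"),
]
print(summarize_text_dataset(toy))
```

Real datasets would presumably use the tokenizer of the model being trained, so these numbers would change with the tokenizer.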

Thanks!
