Help Finding Dataset with Increasing Vocab Size and/or Reading Difficulty

RylanSchaeffer · September 8, 2021, 3:59pm

Can anyone point me towards a corpus with a natural increase in vocabulary size and/or reading difficulty? I’m thinking of something akin to how children’s and young adult books come with estimated grade levels (e.g. “5th grade reading level”).

adorkin · September 15, 2021, 9:46am

Have you considered the data from the CommonLit Readability Prize Kaggle competition? I guess it doesn’t exactly fit your description, but it’s kind of similar.
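
In case it helps, here is a minimal sketch of how a corpus with per-passage readability scores could be turned into an easy-to-hard ordering with a growing vocabulary. It assumes each passage comes with a numeric score where higher means easier (I believe that is how the competition's `target` column works, but please verify against the data description). The function name `order_by_difficulty` and the sample passages are made up for illustration.

```python
import re


def order_by_difficulty(passages, higher_is_easier=True):
    """Sort (text, score) pairs from easiest to hardest.

    Returns (text, score, cumulative_vocab_size) tuples, where the last
    field counts the distinct lowercase words seen so far in that order.
    """
    # Assumption: a higher score means an easier passage, so sort descending.
    ordered = sorted(passages, key=lambda p: p[1], reverse=higher_is_easier)
    vocab = set()
    result = []
    for text, score in ordered:
        # Very rough tokenizer: runs of letters and apostrophes.
        vocab.update(re.findall(r"[a-z']+", text.lower()))
        result.append((text, score, len(vocab)))
    return result


# Hypothetical sample data, not taken from any real dataset.
sample = [
    ("The committee deliberated extensively.", -1.5),
    ("The cat sat.", 1.0),
    ("The dog ran to the cat.", 0.2),
]

curriculum = order_by_difficulty(sample)
for text, score, vocab_size in curriculum:
    print(f"{score:+.1f}  vocab={vocab_size}  {text}")
```

With a real CSV you would read the text and score columns into the same list of pairs first. If the scores run the other way (higher means harder), pass `higher_is_easier=False`.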

RylanSchaeffer · September 15, 2021, 3:53pm

Thanks! I’ll take a look.
