Can anyone point me towards a corpus with a natural progression in vocabulary size and/or reading difficulty? I’m thinking of something akin to how children’s and young adult books come with estimated grade levels (e.g. “5th grade reading level”).
Have you considered the data from the CommonLit Readability Prize Kaggle competition? I guess it doesn’t exactly fit your description (it scores individual excerpts for readability rather than assigning grade levels), but it’s kind of similar.
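In case it helps, here is a rough sketch of how per-excerpt readability scores could be turned into an easy-to-hard progression. It assumes a CSV with an `excerpt` column and a numeric `target` column where a higher score means easier text; check both assumptions against the competition's data description. The function name `order_by_difficulty` and the toy rows are made up for illustration.

```python
import csv
import io

def order_by_difficulty(rows, text_key="excerpt", score_key="target",
                        n_levels=3, higher_is_easier=True):
    """Sort passages from easiest to hardest and split them into buckets."""
    # Assumption: the score is numeric; flip higher_is_easier if your data
    # uses the opposite convention.
    ranked = sorted(rows, key=lambda r: float(r[score_key]),
                    reverse=higher_is_easier)
    if not ranked:
        return []
    size = -(-len(ranked) // n_levels)  # ceiling division
    return [[r[text_key] for r in ranked[i:i + size]]
            for i in range(0, len(ranked), size)]

# Toy data standing in for the real file (not actual competition rows).
toy_csv = """
excerpt,target
The cat sat.,0.5
Photosynthesis converts light into chemical energy.,-1.2
We went to the park.,-0.3
Epistemic humility tempers inferential overreach.,-2.0
""".strip()

rows = list(csv.DictReader(io.StringIO(toy_csv)))
levels = order_by_difficulty(rows, n_levels=2)
for i, bucket in enumerate(levels, start=1):
    print(f"Level {i}: {bucket}")
```

For the real file you would swap the toy string for `open(...)` on the downloaded CSV. Equal-sized buckets are just one choice; fixed score thresholds would work too.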
Thanks! I’ll take a look