I’m interested in the task of rewriting/summarizing a text (possibly up to a few tens of thousands of words) to a target reading level and length. Multilingual support would be preferable. Is anyone aware of existing models or datasets that could help with this task?
So far, I haven’t found any models, but I have found one dataset that might help: the CLEAR (CommonLit Ease of Readability) Corpus. I believe it contains only English texts.
Standard text summarization seems like a very similar task, but I’m not aware of any summarization models that also let you set a target reading level for the output text.
To get the desired output text, I assume I’d provide three inputs: 1) the text, 2) the target reading level, and 3) the output length.
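Here is a rough sketch of how I picture combining those three inputs into a single string for a sequence-to-sequence model. Everything in it is my own assumption rather than an existing API: the `RewriteRequest` class, the `<level=...>` and `<length=...>` prefix format, and the idea of expressing reading level as a grade number are all hypothetical. Since my texts can be very long, I also included a naive word-based splitter, assuming the text would have to be processed in pieces.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RewriteRequest:
    """The three inputs from the question (all field names are hypothetical)."""
    text: str
    reading_level: int  # assumed to be a school grade level, e.g. 5
    target_words: int   # assumed to be the desired output length in words


def build_model_input(req: RewriteRequest) -> str:
    """Prefix the text with control tags so one string carries all three inputs."""
    return f"<level={req.reading_level}> <length={req.target_words}> {req.text}"


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split on whitespace into consecutive pieces of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    req = RewriteRequest(text="The cat sat.", reading_level=3, target_words=50)
    print(build_model_input(req))
    print(split_into_chunks("a b c d e", 2))
```

A model would only respect these tags if it were fine-tuned on examples formatted the same way, so this only shows the input side of the problem.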
Note: I’m currently reading through the Hugging Face NLP course, and I’m not afraid of doing the work that’s needed. I would greatly appreciate any tips that could help along the way! I’m new to Hugging Face and the world of transformers, but I have some experience training my own, smaller models.
Thanks!