Darshan Hiranandani: How to Create Datasets from PDF Files?

Hi everyone,

I’m Darshan Hiranandani, and I’m looking for ways to extract text from PDF files and turn it into a well-structured question-and-answer dataset. Has anyone done this successfully, or built datasets from PDF text in some other way?

Any advice, tools, or methods you’ve used for this process would be greatly appreciated!

Thanks in advance!

Regards,
Darshan Hiranandani


We’re discussing this very topic on the HF Discord, but it’s a bit long to copy and paste. :sweat_smile:

PDF2Datset

And here.
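
In case a concrete starting point helps in the meantime, here is a minimal sketch of the usual first steps: pull the text out of the PDF, split it into chunks, and write one record per chunk to a JSON Lines file. It assumes the pypdf library for extraction. The function names, the 200-word chunk size, and the record fields are my own choices for illustration, not something taken from the linked discussions. The question and answer fields are left empty on purpose, since filling them (by hand or with a language model) is a separate step.

```python
import json


def extract_pdf_text(pdf_path):
    """Return the text of every page in a PDF, joined with blank lines."""
    # Assumption: pypdf is installed. It is imported lazily so the other
    # helpers can be used without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() may return an empty string for scanned, image-only pages.
    return "\n\n".join((page.extract_text() or "") for page in reader.pages)


def chunk_text(text, max_words=200):
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def build_records(chunks, source):
    """Wrap each chunk in a record with empty question/answer fields."""
    # Hypothetical schema: adjust the field names to whatever your training
    # or evaluation code expects.
    return [
        {
            "source": source,
            "chunk_id": i,
            "context": chunk,
            "question": "",
            "answer": "",
        }
        for i, chunk in enumerate(chunks)
    ]


def write_jsonl(records, out_path):
    """Write one JSON object per line (JSON Lines)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Usage would look something like `write_jsonl(build_records(chunk_text(extract_pdf_text("paper.pdf")), "paper.pdf"), "dataset.jsonl")`, where `paper.pdf` is a placeholder file name. Splitting on a fixed word count is the simplest option; splitting on paragraphs or headings usually gives cleaner contexts if the PDF text comes out well structured.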