I want to use GPT or Claude 3 to process PDF documents with more than 200 pages, such as business annual reports. The challenge is how to split the PDF into chunks by table of contents, so that the model’s responses are more accurate.
Is there any solution for this? For example, packages or fine-tuned models.
I’ve tried to get the PDF outlines using PyPDF. However, only a few documents have outlines.
One tool for splitting PDF files is the OSTtoPSTAPP PDF Split Program. It divides a PDF file into multiple files without changing its structure or contents. It is easy to use and works as a standalone application, so Adobe Reader does not need to be installed. It suits both personal and professional use, and it runs on Windows 11, 10, 8.1, 8, 7, and earlier versions.
You could also try the Pcinfotools PDF Split and Merge tool, a professional tool I found online. It can merge multiple PDF files into a single PDF document, split large multi-page PDF files into separate files, and lets you add individual files or whole folders for processing. It also preserves the properties of the original PDF files and keeps all attachments from the source files in the output PDF file. It can be used with any version of Microsoft Windows. Your document files are secure when using this software.
You can try this tool, pdftocsplitterapp-production.up.railway.app, which can split a PDF by TOC and page range.
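If you would rather do the "split by TOC and page range" step yourself for documents that have no embedded outline, here is a minimal sketch of one possible approach. It assumes you have already extracted the text of the printed contents page (for example with a PDF library's text extraction) and that the entries look like `Title ..... 12`. The names `parse_toc`, `toc_to_ranges`, and `page_offset` are hypothetical helpers made up for this illustration, not part of any library, and the regex will need adjusting for other TOC layouts.

```python
import re
from typing import List, Tuple

# Assumed layout of a printed TOC line: a title, a run of dot leaders, a page number.
# Example: "Chairman's Statement ........ 12"
TOC_LINE = re.compile(r"^\s*(?P<title>.+?)\s*(?:\.\s?){2,}\s*(?P<page>\d+)\s*$")


def parse_toc(toc_text: str) -> List[Tuple[str, int]]:
    """Return (title, printed_page) pairs for every line that looks like a TOC entry."""
    entries = []
    for line in toc_text.splitlines():
        match = TOC_LINE.match(line)
        if match:
            entries.append((match.group("title"), int(match.group("page"))))
    return entries


def toc_to_ranges(
    entries: List[Tuple[str, int]],
    total_pages: int,
    page_offset: int = 0,
) -> List[Tuple[str, int, int]]:
    """Turn TOC entries into (title, first_page, last_page) chunks.

    Pages are 1-based and inclusive. page_offset is the difference between
    the printed page number and the position in the PDF file (cover pages
    and front matter often shift the numbering).
    """
    ordered = sorted(entries, key=lambda entry: entry[1])
    ranges = []
    for i, (title, start) in enumerate(ordered):
        first = start + page_offset
        if i + 1 < len(ordered):
            # A section ends on the page before the next one starts;
            # sections that share a start page keep at least that page.
            last = max(first, ordered[i + 1][1] + page_offset - 1)
        else:
            last = total_pages
        ranges.append((title, first, last))
    return ranges


if __name__ == "__main__":
    sample = (
        "Contents\n"
        "Chairman's Statement ........ 3\n"
        "Financial Review . . . . . 10\n"
        "Notes to the Accounts.....45\n"
    )
    for title, first, last in toc_to_ranges(parse_toc(sample), total_pages=60, page_offset=2):
        print(f"{title}: pages {first}-{last}")
```

Each resulting range can then be passed to whatever PDF library you use to extract those pages and send them to the model as one chunk. Sections that are still too long for the context window would need a second split, for example by sub-headings or a fixed page count.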