Read data in PDF or image format as part of the prompt

Viking714 · May 29, 2023, 2:17am

I have a use case I want to develop, but I don’t know which workflow to follow with Hugging Face. First, I want to load text (maybe several thousand words) that is in PDF format, and then ask a model questions about that material (a model like GPT, oasst-sft-4-pythia-12b-epoch-3.5, or some other text-generation model). I know OpenAI has a function that can extract the main meaning of a PDF. I don’t want to fine-tune the pretrained model; I just want to pass the questions and the PDF text together. I have two concerns: (1) how should I load the PDF, and (2) if the text is fairly long, is it OK to just put it in as part of the prompt? What if it goes over the token limit? Which model on Hugging Face should I use for this situation? Thank you very much.
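
One possible way to approach both concerns, as a rough sketch rather than a recommended workflow: extract the text with a PDF library, split it into windows small enough for the model, and put only the most relevant window into the prompt. The sketch below assumes the pypdf package is installed for the loading step. The helper names (load_pdf_text, chunk_words, pick_chunk, build_prompt), the word-count limits, and the prompt wording are all made up for illustration. Word counts are only a crude stand-in for tokens, so with a real model the length should be checked with that model's own tokenizer.

```python
from typing import List


def load_pdf_text(path: str) -> str:
    """Extract plain text from a PDF page by page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # lazy import: the helpers below run without pypdf

    reader = PdfReader(path)
    # Scanned or image-only pages may yield no text; those would need OCR instead.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunk_words(text: str, max_words: int = 300, overlap: int = 30) -> List[str]:
    """Split text into overlapping word windows (words approximate tokens)."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def pick_chunk(chunks: List[str], question: str) -> str:
    """Pick the chunk sharing the most words with the question (naive retrieval)."""
    q_words = set(question.lower().split())
    return max(chunks, key=lambda c: len(q_words & set(c.lower().split())))


def build_prompt(context: str, question: str) -> str:
    """Combine one context chunk and the question into a single prompt string."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Usage would be along the lines of build_prompt(pick_chunk(chunk_words(load_pdf_text("paper.pdf")), question), question), with the resulting string passed to whichever text-generation model is chosen. The word-overlap scoring is only a placeholder; embedding-based retrieval is a common replacement when the document is long.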
