I want to build a simple example project using Hugging Face where I ask a question, provide context (e.g., a document), and get a generated answer. How do I best go about it? Are there any pre-trained models that I can use out of the box?
I found lots of examples about extractive question answering, where the answer is a substring from the given context, but that does not always lead to satisfying answers.
It seems like the “question-answering” task and related models only focus on extractive Q&A with BERT-like (encoder-only) models.
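For reference, this is the kind of extractive setup I mean (a minimal sketch; deepset/roberta-base-squad2 is just one example of a SQuAD-style checkpoint):

```python
from transformers import pipeline

# Extractive QA: the answer is a span copied verbatim from the context.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

result = qa(
    question="What does the warranty cover?",
    context=(
        "The warranty covers manufacturing defects for two years. "
        "It does not cover accidental damage."
    ),
)
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': 'manufacturing defects for two years'}
```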
For generative models, I looked at “text2text-generation”, but those models seem to answer questions without taking any given context into account. As a result, the answers often do not make sense.
Other than that, I only found much more complex approaches involving document retrieval, which is more than I want to take on in my beginner's project. I already have a selected document and want to use that as context, so there is no need for document retrieval.
I found consciousAI/question-answering-generative-t5-v1-base-s-q-c on the Hugging Face Hub, which seems closest to what I want, but it only supports very short sequence lengths that do not fit an entire document (not even a short one).
How can I build something similar but with longer context, possibly fine-tuned to a certain domain, while keeping it as simple as possible and without needing excessive compute?
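For concreteness, the shape of the call I'm after looks roughly like this (an untested sketch; google/flan-t5-base is just a placeholder checkpoint and the prompt wording is my own). The catch is that T5-style models like this typically top out at around 512 input tokens, which is exactly the length problem I'm running into:

```python
from transformers import pipeline

# Generative QA: the question and the document both go into the prompt,
# and the model writes a free-form answer instead of extracting a span.
generator = pipeline("text2text-generation", model="google/flan-t5-base")

document = "..."  # my selected document
question = "What does the warranty cover?"

prompt = (
    "Answer the question based on the context.\n\n"
    f"Context: {document}\n\n"
    f"Question: {question}"
)
answer = generator(prompt, max_new_tokens=100)[0]["generated_text"]
print(answer)
```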
At the risk of suggesting something you've already tried: could one of the smaller dialogue/instruction-tuned models, like Llama-3-8B or Phi-3-mini-instruct adapted to longer contexts, work for answering the questions?
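Something along these lines might be worth a try (an untested sketch; it assumes the microsoft/Phi-3-mini-4k-instruct checkpoint, that your document fits into its 4k context window, and a recent transformers version that accepts chat-style messages in the pipeline):

```python
from transformers import pipeline

# Instruction-tuned chat model: put the whole document into the prompt and ask the question.
chat = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    trust_remote_code=True,  # some Phi-3 checkpoints need their custom modelling code
)

document = "..."  # your selected document
question = "What does the warranty cover?"

messages = [
    {
        "role": "user",
        "content": (
            "Answer the question using only the document below.\n\n"
            f"Document:\n{document}\n\n"
            f"Question: {question}"
        ),
    }
]
out = chat(messages, max_new_tokens=200)
# With chat input, the pipeline returns the conversation including the new assistant turn.
print(out[0]["generated_text"][-1]["content"])
```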
Hi, I'm working on something similar. I want to create a dataset in .csv or .json format so I can fine-tune a pre-trained model on my own data. Here is what I did: I installed Ollama, downloaded Solar with Ollama, and then used Ollama in server mode to interact with the model locally from my application. My script splits the text into blocks that fit within the model's token limit, sends each block together with a prompt asking the model to generate questions and answers from it in one of those two formats, stores the output in a .txt file, and then moves on to the next block.

The model sometimes returns the questions and answers well structured, other times not, and sometimes adds comments it shouldn't make. Overall it works reasonably well, but I can't get a consistent data structure and haven't found the right prompt yet. A model that gave me good responses in LM Studio was Orca, but its performance in that application's server mode was not good.
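In case it's useful, this is roughly how I call the local Ollama server for each block (a simplified sketch of my script; the model name, prompt, and output file are just my setup, and even with Ollama's JSON mode the exact structure still depends on the model following the prompt):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def qa_pairs_for_block(block: str) -> str:
    """Ask the local model to turn one text block into Q&A pairs."""
    prompt = (
        "Generate question-answer pairs from the following text. "
        "Return only JSON with a list of objects that have the keys "
        "'question' and 'answer'.\n\n" + block
    )
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": "solar",
            "prompt": prompt,
            "stream": False,
            "format": "json",  # ask Ollama to constrain the output to valid JSON
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Append each block's pairs to a file, then continue with the next block.
blocks = ["...first block...", "...second block..."]
with open("qa_pairs.jsonl", "a", encoding="utf-8") as f:
    for block in blocks:
        f.write(qa_pairs_for_block(block).strip() + "\n")
```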
@jlu I've been playing around with Phi-3-mini and it seems to output good answers. Unfortunately, running it on my laptop (a MacBook Pro with an M1) is really slow; it can take around 30 minutes to get a single answer, which makes it quite unusable.
Are there similar but smaller instruction-tuned models that are faster and more lightweight?
I don't necessarily need open-domain dialogue on any topic; I just want generated (rather than extractive) answers to questions about a given document. Could I build something smaller and faster, similar to text summarization?
Simply splitting my full text into these short sequences and getting answers for each of them works quite well.
Is there any way of weighting/prioritizing/scoring these answers? I.e., can the model output some sort of certainty/quality score for the generated answer?
If so, I could use that to filter out the best answers.
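To make it concrete, this is roughly what I'm imagining: score each chunk's generated answer by its average token log-probability and keep the best ones (a rough, untested sketch; google/flan-t5-base is only a placeholder checkpoint, the naive chunking is mine, and treating the log-probability as a quality score is my own assumption):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def answer_with_score(question: str, chunk: str):
    prompt = (
        "Answer the question based on the context.\n\n"
        f"Context: {chunk}\n\nQuestion: {question}"
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        return_dict_in_generate=True,
        output_scores=True,
    )
    answer = tokenizer.decode(out.sequences[0], skip_special_tokens=True)
    # Per-token log-probabilities of the generated answer.
    transition_scores = model.compute_transition_scores(
        out.sequences, out.scores, normalize_logits=True
    )
    score = transition_scores[0].mean().item()  # average log-prob; higher = more "confident"
    return answer, score

document = "..."  # full document
question = "What does the warranty cover?"

# Naive chunking by token count; overlapping or sentence-aware splits would likely work better.
tokens = tokenizer(document)["input_ids"]
chunk_size = 384
chunks = [
    tokenizer.decode(tokens[i:i + chunk_size], skip_special_tokens=True)
    for i in range(0, len(tokens), chunk_size)
]

scored = [answer_with_score(question, c) for c in chunks]
for answer, score in sorted(scored, key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {answer}")
```

Does something like this average log-probability make sense as a filter, or is there a more principled score the model can give me?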