Build a question answering system in your own language

lewtun · November 18, 2021, 9:52am

Regarding (1), yes, you’re right that these datasets aren’t in the SQuAD format, so what you’d want to do is either:

- Use an existing pretrained model in Hungarian or Romanian (or a multilingual model) to generate embeddings for all the answers, and then compute the similarity between a query and all the answers. (See this nice description using sentence-transformers). This would let someone enter their question and get back the top-N most likely answer documents.
- Train your own sentence transformer on the Hungarian / Romanian subsets (see example here). This is more complex, so maybe it’s best to get a Space running with something like the first option before trying this.
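
To make the first option a bit more concrete, here is a rough sketch of the "embed everything, then rank by similarity" idea. It's only an illustration under some assumptions: `top_n_answers` and `toy_embed` are names I made up, and `toy_embed` is a stand-in word-count embedder so the snippet runs without downloading a model. It is not a real sentence embedding.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_answers(
    query: str,
    answers: List[str],
    embed: Callable[[List[str]], Sequence[Vector]],
    n: int = 3,
) -> List[Tuple[float, str]]:
    """Embed the query and all answers, then return the n most similar answers."""
    answer_vectors = embed(answers)
    query_vector = embed([query])[0]
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for vector, text in zip(answer_vectors, answers)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in embedder: word counts over a tiny fixed vocabulary."""
    vocabulary = ["capital", "hungary", "romania", "river"]
    vectors = []
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        vectors.append([float(tokens.count(word)) for word in vocabulary])
    return vectors


answers = [
    "Budapest is the capital of Hungary",
    "Bucharest is the capital of Romania",
    "The Danube is a river",
]
results = top_n_answers("What is the capital of Hungary", answers, toy_embed, n=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

In a real Space you would swap `toy_embed` for a proper model, for example by passing the `encode` method of a `SentenceTransformer` loaded from a Hungarian, Romanian, or multilingual checkpoint. Which checkpoint works well for your language is something you'd need to check yourself.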

Regarding (2), you’re totally right and this is an oversight on my part! I’ll re-word the project description to be more focused on training a QA system in one’s own language. Creating the dataset would indeed require a lot of human evaluation to update the character indices of the answers. Thank you for pointing this out to me!
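
For anyone wondering what "character indices" means here: in the SQuAD format each answer is stored with the character offset where it starts in the context. Below is a minimal sketch of building one such record. `to_squad_record` is a hypothetical helper, and the exact-match lookup is an assumption that often breaks on translated text, where the answer wording can differ from the context. That mismatch is exactly where the human checking comes in.

```python
from typing import Dict


def to_squad_record(question: str, context: str, answer_text: str) -> Dict:
    """Build a SQuAD-style record, locating the answer by exact string match."""
    start = context.find(answer_text)
    if start == -1:
        # Exact match failed, so the offset has to be fixed by hand.
        raise ValueError(f"Answer {answer_text!r} not found in context")
    return {
        "question": question,
        "context": context,
        "answers": {"text": [answer_text], "answer_start": [start]},
    }


record = to_squad_record(
    question="What is the capital of Hungary?",
    context="The capital of Hungary is Budapest.",
    answer_text="Budapest",
)
print(record["answers"])
```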
