Does finetuning a Q/A model with real people's names hurt generalization?

ArCo · August 3, 2022, 12:29pm

I have about a hundred contexts that look like this:

Sir John DOE lives on 55 Main Street NY, born on august 5th 1995 in Buffalo, he is an american citizen.

These are the questions I want to fine-tune the model on:

Where does John DOE live? (answer: 55 Main Street NY)
Where was John DOE born? (answer: Buffalo)
When was John DOE born? (answer: august 5th 1995)
What is John DOE’s nationality? (answer: american)
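
For illustration, here is a minimal sketch of how the context and the four question/answer pairs above could be turned into extractive-QA training records. The field names follow the SQuAD layout (`context`, `question`, `answers` with `text` and `answer_start`). Whether a given training script expects exactly this layout is an assumption, and `build_records` is a hypothetical helper, not part of any library.

```python
# Sketch only: SQuAD-style records built from the example context above.
# Assumption: the trainer wants character offsets of each answer in the context.

CONTEXT = (
    "Sir John DOE lives on 55 Main Street NY, born on august 5th 1995 "
    "in Buffalo, he is an american citizen."
)

QA_PAIRS = [
    ("Where does John DOE live?", "55 Main Street NY"),
    ("Where was John DOE born?", "Buffalo"),
    ("When was John DOE born?", "august 5th 1995"),
    ("What is John DOE’s nationality?", "american"),
]


def build_records(context, qa_pairs):
    """Return one record per question, with the answer's character offset."""
    records = []
    for question, answer in qa_pairs:
        start = context.find(answer)  # first occurrence of the answer span
        if start == -1:
            raise ValueError(f"answer {answer!r} not found in context")
        records.append(
            {
                "context": context,
                "question": question,
                "answers": {"text": [answer], "answer_start": [start]},
            }
        )
    return records


records = build_records(CONTEXT, QA_PAIRS)
for record in records:
    print(record["question"], "->", record["answers"])
```

Because the answers are located with `str.find`, every answer has to appear verbatim in the context, including its capitalization.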

Would having the person’s name in the training questions hurt the model’s accuracy, or is it OK? In production, the context will always follow this pattern (name, birth place, birth date, nationality, address), but the name will be different every time. The question will contain the person’s name because I plan to ask two questions in sequence:

Question_1: "What is the name of the person?"
Question_2: "Where does " + question_1["answer"] + " live?"
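
The two-step flow above can be sketched as follows. This assumes the QA model is wrapped in a callable that takes `(question, context)` and returns the answer text. That interface, the `ask_address` helper, and the `toy_qa` stand-in are all hypothetical and exist only to show the chaining.

```python
from typing import Callable, Dict

# Assumed interface: any function (question, context) -> answer string.
QAFn = Callable[[str, str], str]


def ask_address(qa: QAFn, context: str) -> Dict[str, str]:
    """Ask for the name first, then reuse it inside the second question."""
    name = qa("What is the name of the person?", context).strip()
    address_question = f"Where does {name} live?"
    address = qa(address_question, context)
    return {"name": name, "question": address_question, "address": address}


def toy_qa(question: str, context: str) -> str:
    """Stand-in for a real model; returns fixed answers to show the flow."""
    if question.startswith("What is the name"):
        return "John DOE"
    return "55 Main Street NY"


context = (
    "Sir John DOE lives on 55 Main Street NY, born on august 5th 1995 "
    "in Buffalo, he is an american citizen."
)
result = ask_address(toy_qa, context)
print(result["question"])
print(result["address"])
```

With a real model, any error in the first answer (for example returning "Sir John DOE") carries over into the second question.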

So should I train the model with the specific name in the question, or replace it with something more generic like “Where does the person live?”
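
To make the two options concrete, here is a small sketch that derives the generic wording from the name-specific questions by plain string replacement. `to_generic` and the placeholder text are hypothetical, and the sketch says nothing about which variant trains better.

```python
def to_generic(question: str, name: str, placeholder: str = "the person") -> str:
    """Replace the person's name in a question with a generic placeholder."""
    return question.replace(name, placeholder)


specific_questions = [
    "Where does John DOE live?",
    "Where was John DOE born?",
    "When was John DOE born?",
    "What is John DOE’s nationality?",
]

generic_questions = [to_generic(q, "John DOE") for q in specific_questions]
for specific, generic in zip(specific_questions, generic_questions):
    print(specific, "|", generic)
```

Both lists could be used to build two training sets from the same contexts, so the two wordings can be compared on held-out names.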
Thanks
