Effect of varying context length in QA system

I am building a QA-based system for entity linking (I implemented the EntQA paper). To improve the model, I am trying to merge two datasets: LCQuAD and T-REx. Training only on T-REx (paragraphs with entities annotated in them) worked quite well, but the T-REx annotations are incomplete: there are spans of words that should have been marked as entities but are not. To tackle that, I merged in LCQuAD, which consists of questions only, with accurate entity annotations in the text.
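
One thing worth noting is that the two datasets have very different input lengths after merging: T-REx contributes full paragraphs while LCQuAD contributes single questions. A rough check of token counts with the model's tokenizer (the sample strings below are illustrative, not actual dataset entries):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepset/minilm-uncased-squad2")

# Illustrative examples: T-REx inputs are full paragraphs,
# LCQuAD inputs are single questions.
trex_paragraph = (
    "Barack Obama served as the 44th president of the United States "
    "from 2009 to 2017. He was born in Honolulu, Hawaii, and served "
    "in the Illinois State Senate before his presidency."
)
lcquad_question = "Who is the president of the United States?"

for name, text in [("T-REx", trex_paragraph), ("LCQuAD", lcquad_question)]:
    n_tokens = len(tokenizer(text)["input_ids"])
    print(f"{name}: {n_tokens} tokens")
```

So the merged training set is heavily skewed toward short inputs compared to training on T-REx alone.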

I am fine-tuning the deepset/minilm-uncased-squad2 model. When training on the merged dataset, performance decreases for longer texts and the model stops detecting any entities in the text at all, whereas for shorter texts the performance is also not very good.
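
For reference, my preprocessing follows the standard Hugging Face SQuAD-style setup; here is a minimal sketch (the query string, hyperparameters, and sample context are illustrative, not my exact values):

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "deepset/minilm-uncased-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# Hypothetical EntQA-style query and a long T-REx-style passage.
query = "Which spans mention entities?"
context = (
    "Barack Obama served as the 44th president of the United States "
    "from 2009 to 2017. He was born in Honolulu, Hawaii."
)

# Standard SQuAD-style windowing: long contexts are split into
# overlapping chunks of max_length tokens with a stride, while
# short LCQuAD-style questions fit entirely in one window.
encoded = tokenizer(
    query,
    context,
    max_length=384,
    stride=128,
    truncation="only_second",
    return_overflowing_tokens=True,
    padding="max_length",
    return_tensors="pt",
)
outputs = model(
    input_ids=encoded["input_ids"],
    attention_mask=encoded["attention_mask"],
)
# outputs.start_logits / outputs.end_logits score candidate spans per window.
```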

I am unable to understand why this is happening.