I’m currently working on a multiple-choice question answering (MCQA) model for my machine learning class, and I’ve hit a bit of a roadblock. The issue is that my model is consistently underfitting the training data, and I can’t seem to get the training accuracy over 80%.
Here’s how my model is structured: each MC question comes with three answer options, so I create three separate strings in the format “context + question + optionX”. Each string is encoded separately with BERT-Large. I then pass each encoding through a feed-forward head: a dense layer with 512 units and a tanh activation, followed by a single-output logit layer with no activation. This yields three logits, one per option, and the model selects the option with the highest logit as the “correct” answer.
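To make the setup concrete, here is a minimal NumPy sketch of the scoring head described above. It is not the actual training code. Several things in it are assumptions for illustration: the helper names (`build_inputs`, `init_head`, `score_options`, `cross_entropy`) are hypothetical, the BERT-Large encodings are replaced by random 1024-dimensional vectors so the snippet runs without downloading a model, the strings are joined with spaces, and the loss is assumed to be softmax cross-entropy over the three logits, which the description above does not specify.

```python
import numpy as np

HIDDEN = 1024  # BERT-Large hidden size
DENSE = 512    # width of the dense layer described above


def build_inputs(context, question, options):
    """Build one "context + question + optionX" string per answer option."""
    return [f"{context} {question} {opt}" for opt in options]


def init_head(rng, hidden=HIDDEN, dense=DENSE):
    """Randomly initialise the head: dense(512, tanh) followed by a logit layer."""
    return {
        "W1": rng.normal(0.0, 0.02, size=(hidden, dense)),
        "b1": np.zeros(dense),
        "w2": rng.normal(0.0, 0.02, size=dense),
        "b2": 0.0,
    }


def score_options(encodings, head):
    """Map encodings of shape (num_options, hidden) to one logit per option."""
    h = np.tanh(encodings @ head["W1"] + head["b1"])
    return h @ head["w2"] + head["b2"]


def cross_entropy(logits, label):
    """Softmax cross-entropy across the options (assumed loss, see lead-in)."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


rng = np.random.default_rng(0)
inputs = build_inputs("Some context.", "Some question?", ["A", "B", "C"])

# Stand-in for the per-string BERT-Large encodings (assumption: one vector each).
encodings = rng.normal(size=(len(inputs), HIDDEN))

head = init_head(rng)
logits = score_options(encodings, head)
prediction = int(np.argmax(logits))    # option with the highest logit
loss = cross_entropy(logits, label=0)  # label 0 is an arbitrary example target
```

The same head weights are applied to all three encodings, so the three logits are directly comparable. Under the assumed loss, the softmax is taken across the three options of one question rather than independently per string.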
I’m wondering whether this is caused by a flaw in my architecture or by a bug in my code. Has anyone faced a similar issue, or does anyone have suggestions for improving my model’s performance? Any advice would be greatly appreciated!