Transformer-based language models such as BERT and GPT-x have pushed the state of the art on many tasks, including language modeling. However, closer examination of these models has revealed biases toward specific genders, events, etc. For instance, in the sentence “David is driving a car”, attention is spread almost evenly across the tokens. This does not hold for “Mary is washing the dishes”, where the model focuses heavily on the name “Mary”. We believe that refining the loss function and modifying the training schedule might resolve this artifact. We expect a nearly bias-free model to assign roughly equal probabilities to equally plausible candidates for a masked token.
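To make this concrete, here is a minimal probe sketch, assuming the HuggingFace `transformers` library; the templates and candidate names are our own illustrative choices. It compares the probabilities that bert-base-uncased assigns to two plausible names at a masked position:

```python
# A minimal probing sketch (assumes the HuggingFace `transformers` library).
# Templates and candidate names below are illustrative choices.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def candidate_probs(template: str, candidates: list[str]) -> dict[str, float]:
    """Return P(candidate | context) for each single-token candidate at [MASK]."""
    inputs = tokenizer(template, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    probs = logits.softmax(dim=-1)
    return {c: probs[tokenizer.convert_tokens_to_ids(c)].item() for c in candidates}

# A nearly bias-free model should assign roughly equal probability to both names.
print(candidate_probs("[MASK] is driving a car.", ["david", "mary"]))
print(candidate_probs("[MASK] is washing the dishes.", ["david", "mary"]))
```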
Related work: Do Neural Language Models Overcome Reporting Bias? (https://aclanthology.org/2020.coling-main.605.pdf)
The model will be trained on English text.
We intend to use the bert-base-uncased model.
We aim to use the datasets commonly used for pre-training BERT: BookCorpus and English Wikipedia (a minimal loading sketch follows the links below).
Possible links to publicly available datasets include:
- BookCorpus: https://github.com/soskek/bookcorpus
- English Wikipedia: link omitted due to new-user posting limits.
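As a rough sketch of how we would load the model and data, assuming the HuggingFace `datasets` and `transformers` libraries (the hub dataset names and the Wikipedia snapshot identifier below are our assumptions, not fixed choices):

```python
# A loading sketch (assumes HuggingFace `datasets` and `transformers`; the
# dataset names and the Wikipedia snapshot identifier are our assumptions).
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForMaskedLM

bookcorpus = load_dataset("bookcorpus", split="train")
wikipedia = load_dataset("wikipedia", "20220301.en", split="train")

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
```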
We have prior experience fine-tuning a BERT model for sentiment analysis and language modeling, and we hope that small modifications to our existing code will make training for the new objective possible.
There are two central challenges in this project. First, we need to devise a suitable loss function that effectively penalizes bias during training. Second, if redefining the loss alone does not help, we may need to train the model from scratch with newly added tokens in addition to the new loss function.
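One possible direction for the loss, sketched below purely as an assumption on our part rather than a settled design, is to keep the standard MLM cross-entropy and add a regularizer that penalizes unequal probabilities over a set of equally plausible candidate tokens at the masked position:

```python
# A sketch of one candidate bias-aware loss (our own assumption, not a settled
# design): standard MLM cross-entropy plus a term pushing the model toward
# equal probabilities for a set of equally plausible candidate tokens.
import torch
import torch.nn.functional as F

def bias_aware_loss(logits, labels, candidate_ids, mask_positions, lam=0.1):
    """
    logits:          (batch, seq_len, vocab) masked-LM logits
    labels:          (batch, seq_len) with -100 on unmasked positions
    candidate_ids:   (num_candidates,) ids of equally plausible fillers
    mask_positions:  (batch,) index of the [MASK] token in each sequence
    """
    # Standard masked-LM cross-entropy.
    mlm = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1),
                          ignore_index=-100)

    # Renormalized distribution over the candidate set at the masked position.
    batch_idx = torch.arange(logits.size(0))
    cand_logits = logits[batch_idx, mask_positions][:, candidate_ids]
    cand_logp = F.log_softmax(cand_logits, dim=-1)

    # KL(uniform || candidate distribution): zero exactly when all candidates
    # receive equal probability.
    uniform = torch.full_like(cand_logp, 1.0 / cand_logp.size(-1))
    fairness = F.kl_div(cand_logp, uniform, reduction="batchmean")

    return mlm + lam * fairness
```

The KL term vanishes exactly when all candidates receive equal probability, which matches the expectation stated above; the weight `lam` would need tuning.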
We will test the model on the task of masked language modeling and hope to see bias-free predictions for masked tokens; a sketch of a simple bias score follows.
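To make “bias-free” measurable, one option (again our own illustrative choice, not an established metric) is to aggregate the absolute log-probability gap between paired candidates over a set of templates, reusing the `candidate_probs` probe sketched earlier:

```python
# A sketch of a simple bias score (our own illustrative metric): the mean
# absolute log-probability gap between paired candidates across templates.
# Reuses the `candidate_probs` function from the probing sketch above.
import math

def bias_score(candidate_probs, templates, pair):
    gaps = []
    for t in templates:
        p = candidate_probs(t, list(pair))
        gaps.append(abs(math.log(p[pair[0]]) - math.log(p[pair[1]])))
    return sum(gaps) / len(gaps)

templates = ["[MASK] is driving a car.", "[MASK] is washing the dishes."]
print(bias_score(candidate_probs, templates, ("david", "mary")))  # 0 == unbiased
```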