Fine-tuning BERT-based language model to overcome gender-bias

Transformer-based language models such as BERT and GPT-x have pushed the state of the art on many tasks, including language modeling. However, thorough examination [1] of these models has shown that they are biased toward specific genders, events, etc. For instance, in the sentence “David is driving a car”, attention is spread almost evenly across the tokens. This does not hold for “Mary is washing the dishes”, in which the model focuses considerably on the name “Mary”. We think that refining the loss function and modifying the training schedule might resolve this artifact. We expect a nearly bias-free model to output the same probability for each plausible candidate of a masked token.

[1] Do Neural Language Models Overcome Reporting Bias? (

2. Language

The model will be trained in English.

3. Model

We intend to use the bert-base-uncased model.

4. Datasets

We aim to use the datasets commonly used for pre-training BERT: BookCorpus and English Wikipedia.

Possible links to publicly available datasets include:

5. Training scripts

We have previous experience fine-tuning a BERT model for sentiment analysis and language modeling, and we hope that small modifications to our earlier code will make training for the new objective possible.

6. Challenges

There are two central challenges in this project. First, we need to devise a loss function that effectively minimizes the bias during training. Second, if that redefinition does not help, we will need to train the model from scratch with newly added tokens, in addition to the new loss function.
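
One possible shape for such a loss, given as a minimal sketch and not as the project's actual objective: keep the usual masked-LM cross-entropy and add a penalty that grows when the model assigns unequal probability to paired candidates at the masked position. The function names, the notion of candidate pairs, and the weight `lam` are all assumptions made for illustration. Plain Python is used so the arithmetic is easy to follow; a real training run would express the same terms with tensors.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def debiased_mlm_loss(logits, target_id, candidate_pairs, lam=1.0):
    """Cross-entropy for one masked position plus a parity penalty.

    logits          -- scores over the vocabulary at the masked position
    target_id       -- index of the true token
    candidate_pairs -- hypothetical (id_a, id_b) pairs that should be
                       equally likely, e.g. the ids of "he" and "she"
    lam             -- hypothetical weight of the parity penalty
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target_id])
    # Squared difference of log-probabilities: zero only when both
    # members of a pair receive the same probability.
    penalty = sum(
        (math.log(probs[a]) - math.log(probs[b])) ** 2
        for a, b in candidate_pairs
    )
    return cross_entropy + lam * penalty
```

With `lam=0` this reduces to ordinary cross-entropy, so the weight controls how strongly parity is traded off against fitting the data.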

7. Desired project outcome

We want to test the model on the task of masked language modeling and hope to see bias-free predictions for masked tokens.
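
To make "bias-free predictions" measurable, one option is to compare the probabilities the model assigns to a fixed set of plausible candidates at the masked position. The sketch below assumes those probabilities have already been read off the model for some template sentence. The function name `parity_gap`, the candidate set, and the numbers in the usage line are illustrative assumptions, not measured results.

```python
def parity_gap(candidate_probs):
    """Largest deviation from a uniform split among the candidates.

    candidate_probs -- dict mapping candidate token to the probability
                       the model gave it at the masked position.
    Probabilities are renormalised over the candidates, so a result of
    0.0 means every candidate is equally likely.
    """
    total = sum(candidate_probs.values())
    if total <= 0:
        raise ValueError("candidate probabilities must sum to a positive value")
    uniform = 1.0 / len(candidate_probs)
    return max(abs(p / total - uniform) for p in candidate_probs.values())


# Made-up probabilities, for illustration only.
gap = parity_gap({"he": 0.10, "she": 0.30})
```

A single number per template makes it straightforward to average over many sentences and to compare the model before and after fine-tuning.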


You have addressed an interesting subject, and I am very enthusiastic about participating.
As you mentioned earlier, it would be good to add a new token for genders to tackle the reported bias.

In my view, there is an unfilled gap in this area that deserves careful scrutiny. I am highly motivated by this cutting-edge topic and look forward to seeing its results and further applications.

I am very curious whether this could be solved on the model side rather than on the data side. There is a decades-old saying: “garbage in, garbage out”. If your data is biased, the model will contain and even amplify that bias. You are suggesting that this problem can be solved programmatically without a hit to the actual performance of the model. To me that sounds counter-intuitive (the model performs “so well” because it can reproduce the bias in the data), but I would love to be proven wrong.

Good luck!

Glad to hear that, mate!

Hey Bram, thank you for your interest! I agree that these models are good because they adapt well to the data. There are two options on the model side: first, fundamental changes to the neural architecture of the network; second, modification of the training schedule. The first option is not reasonable, as current models already perform well. For the training schedule, the combination of recently developed self-supervised learning techniques and a new loss function looks promising.