Hi,
I’m using a pre-trained model (distilbert-base-cased-distilled-squad) for Question Answering, and I’m looking for a way to improve the model using user feedback as rewards and penalties that indicate how well the model answered a question in a given context.

I found the Transformer Reinforcement Learning (TRL) library, which is built on top of the Hugging Face Transformers library and can be used to train transformer language models with Proximal Policy Optimization (PPO). However, it is only implemented for decoder architectures such as GPT-2. I’m wondering if there is a workaround to apply a similar approach to improving the pre-trained DistilBERT model with a reinforcement-based method (using reward and penalty scores for question–answer pairs), or any other possible solution for this?
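For reference, this is roughly how I’m using the model today. The feedback_score and training_example here are just placeholders for the user ratings I’d like to learn from, not something I already have wired up:

```python
from transformers import pipeline

# Current setup: extractive QA with the pre-trained checkpoint
qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
)

context = "The Eiffel Tower is located in Paris and was completed in 1889."
question = "When was the Eiffel Tower completed?"

prediction = qa(question=question, context=context)
print(prediction["answer"], prediction["score"])

# What I'd like to train on: a user rating of the predicted answer.
# (Placeholder value; in my application this would come from real user feedback.)
feedback_score = 1.0  # e.g. +1 for a good answer, -1 for a bad one
training_example = {
    "question": question,
    "context": context,
    "predicted_answer": prediction["answer"],
    "reward": feedback_score,
}
```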
Thank you.