COVID-19 Related Question Answering (Closed-Book Question Answering)
In 2020, COVID-19, the disease caused by the coronavirus SARS-CoV-2, took over the world. It touched the lives of many people and caused great hardship for humanity. There are still many open questions regarding COVID-19, and it is often difficult to get the right answers. The aim of this project is to fine-tune models for closed-book question answering. In closed-book QA, we feed the model a question without any context or access to external knowledge and train it to predict the answer. Since the model doesn’t receive any context, the primary way it can learn to answer these questions is from the “knowledge” it obtained during pre-training [1][2].
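As a concrete illustration of the closed-book setup, each training example reduces to a bare (question, answer) string pair with no supporting passage attached. A minimal sketch (the `to_closed_book_example` helper and its `question:` prompt prefix are illustrative, not the project's actual preprocessing):

```python
def to_closed_book_example(question, answer):
    """Format a QA pair for text-to-text training.

    There is deliberately no `context` field: in closed-book QA the model
    must answer purely from knowledge acquired during pre-training.
    """
    return {
        "input": f"question: {question.strip()}",
        "target": answer.strip(),
    }

example = to_closed_book_example("Which virus causes COVID-19?", "SARS-CoV-2")
print(example["input"])   # question: Which virus causes COVID-19?
print(example["target"])  # SARS-CoV-2
```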
The main goals of this project are:
Train a model for question answering regarding COVID-19
Release the top-performing models for further research and enhancement
Release all of the preprocessing and postprocessing scripts and findings for future research
2. Language
The model will be trained on English-language data; the codebase will be written in Python.
3. Model
Possible candidates are a pretrained T5-large model, or a BERT variant (e.g. BioBERT or BioClinicalBERT) with a sequence-generation head.
4. Datasets
The following datasets will be used for fine-tuning the model. Note that the last dataset is optional, and the model is evaluated only on Covid-QA.
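The post does not spell out the evaluation metric, but for Covid-QA the usual choice is SQuAD-style Exact Match and token-level F1 against the reference answer. A self-contained sketch of that computation (the normalization rules mirror the SQuAD evaluation script; treat this as an assumption, not the project's confirmed metric):

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    """Token-overlap F1 between normalized prediction and reference."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The SARS-CoV-2 virus", "SARS-CoV-2 virus"))   # 1.0
print(round(f1_score("SARS-CoV-2", "the SARS-CoV-2 virus"), 3))  # 0.667
```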
Hi @patrickvonplaten, I am a data science lead at Bayer AG. I will spread the word and see whether anyone would be interested in joining this activity. How many supporters does this need to go forward?
I’m curious if training a language model first on CORD-19 before doing fine-tuning would help.
Also, since the COVID QA datasets are relatively small, would it be worthwhile to train on a generic QA dataset (e.g. SQuAD) before training on COVID QA? Or is there a way to mix the datasets to make a more robust model (e.g. for every 5 samples of COVID QA, throw in 1 sample of SQuAD)?
I know relatively nothing about fine-tuning QA models – maybe this approach is already well established as being intractable.
@nbroad great points, Nicolas! I am not sure if we will have the bandwidth to do pretraining (or intermediate training), but CORD-19 sounds quite nice! For fine-tuning, we will definitely need a mixing approach, as explained in this Colab. I think, to save time, we could very much follow that approach and mix in Covid QA, CDC QA, SQuAD and TriviaQA. We could use SeqIO for this mixing. Would you like to be a part of this project?
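The mixing idea can be sketched without SeqIO: sample from each dataset with probability proportional to a per-dataset rate, so the small in-domain set (Covid QA) still dominates each batch over much larger generic sets (SQuAD). A minimal sketch — the 5:1 rate below just echoes the suggestion in this thread, not a tuned value:

```python
import random

def mix_datasets(datasets, rates, n_examples, seed=0):
    """Draw `n_examples` by first sampling a dataset name with probability
    proportional to its rate, then a uniform example from that dataset."""
    rng = random.Random(seed)
    names = list(datasets)
    weights = [rates[name] for name in names]
    mixed = []
    for _ in range(n_examples):
        name = rng.choices(names, weights=weights)[0]
        mixed.append((name, rng.choice(datasets[name])))
    return mixed

# Toy stand-ins for the real datasets.
datasets = {
    "covid_qa": [f"covid_example_{i}" for i in range(20)],
    "squad": [f"squad_example_{i}" for i in range(1000)],
}
# Roughly 5 Covid-QA samples for every SQuAD sample.
mixed = mix_datasets(datasets, {"covid_qa": 5, "squad": 1}, n_examples=12)
print(len(mixed))  # 12
```

Note that sampling by rate (rather than concatenating the datasets) keeps SQuAD's sheer size from drowning out the in-domain examples.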
This sounds very interesting! I’ve been working on a similar task, question generation using T5 and BERT, and this would probably be an ideal extension of it. I have no prior experience with FLAX/JAX, so it would also be great to learn about them. I’m interested in joining and contributing if there’s still any space. Looking forward to it!
I’m a Master’s student and live in Canada. PT/PDT time zone.
I’m a Master’s student from India.
A very meaningful project. I have never worked on the task of question generation, so there will be a lot for me to learn. I am interested in working on this project.
Hey all
I am a beginner with Transformers and FLAX and want to get into them via a project. Basically, I would like to work on getting a minimal training pipeline built and fine-tune a model using HuggingFace’s awesome resources.
Most of my background is CV-related, in PyTorch, but I do like this idea and would be happy to get involved. Please let me know if you think I can help.
Hi @patrickvonplaten, I have worked on the CORD-19 dataset as part of a Kaggle challenge and am somewhat acquainted with COVID-19 knowledge sources, so could you add me to the group/community? Thanks in advance.
The title says “adverse event detection” but based on your approach it looks like you want to try “closed book question answering”.
If “adverse event detection” is all you need, then you might want to train a simple encoder-only model, such as PubMedBERT released by Microsoft, on an entity-recognition task.
But if you want to identify answers to other questions, including adverse events, using a QA approach, you might have to first pretrain a seq2seq model on the biomedical domain (one with a broad vocabulary) and later fine-tune it for the QA task.
I think closed-book QA is already a challenging task, and if we use a model that is not pretrained on the biomedical domain, we might not get the desired results.
Still, there is no harm in trying; I am just sharing details based on my experience. Ignore if not useful.
Cheers !!
Thanks for your comments! ADE was actually my first intention, but I later decided to go towards closed-book QA; unfortunately, I was not able to update the title. We are aware of the steps.
We are hoping to first pretrain T5 on some medical-domain corpora and then fine-tune on the available mixed QA datasets. Feel free to join the Discord channel to learn more.
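The pretraining step mentioned here (T5 on medical-domain corpora) would use T5’s span-corruption objective: contiguous spans are replaced by sentinel tokens in the input, and the target reconstructs the dropped spans. A word-level toy sketch — real T5 pretraining operates on SentencePiece subword ids, and span positions are sampled randomly rather than given explicitly:

```python
def span_corrupt(tokens, spans):
    """Apply T5-style span corruption.

    `spans` is a sorted list of non-overlapping (start, end) index pairs
    marking the spans to drop. Returns (input_tokens, target_tokens):
    each dropped span is replaced by a sentinel in the input and
    reproduced after the same sentinel in the target.
    """
    inputs, targets = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[prev:start])
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        prev = end
    inputs.extend(tokens[prev:])
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target
    return inputs, targets

tokens = "the virus spreads through respiratory droplets".split()
inp, tgt = span_corrupt(tokens, [(1, 2), (4, 5)])
print(" ".join(inp))  # the <extra_id_0> spreads through <extra_id_1> droplets
print(" ".join(tgt))  # <extra_id_0> virus <extra_id_1> respiratory <extra_id_2>
```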