BERT for medical information extraction

Hello, HuggingFace community. I’ve got unstructured lab reports that contain the value of each test result. For example, this report contains the test results for magnesium (MAGNESIO, 2,0), potassium (POTASSIO, 4,9) and sodium (SODIO, 137).

MAGNESIO\nMaterial de Coleta: SORO\nMétodo: Clorofosfonazo IlI\nReferência\nResultado:\n2,0\nmg/dL\n1 1,7 a 2,5 mg/dL\nPOTASSIO\nMaterial de Coleta: SORO\nMétodo: Eletrodo Seletivo\nReferência\nResultado\n4,9\nmEq/L\n1 3,5 a 5,1 mEq/L\nSODIO\nMaterial de Coleta: SORO\nMétodo: Eletrodo Seletivo\nReferência\nResultado\n137\nmEq/L\n/ 135 a 145 mEq/L\n
(Test name and result annotated for ease of reading)

I would like to use a BERT-like model to extract this information into a structure like this:

{
   "magnesium": "2,0",
   "potassium": "4,9",
   "sodium": "137"
}

Since my inputs are in Portuguese, I figured BERTimbau would be a good base model. Is a BERT-like model an appropriate way to solve my problem? How would I go about annotating my training data and setting up my model for training?
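
To make the question concrete, here is a minimal sketch of the annotation format I have in mind, assuming the task is framed as token classification (NER) with BIO tags. The label names TEST and VALUE, the helper functions, and the NAME_MAP lookup are all hypothetical, and the report is a shortened version of the sample above. The tags here are per whitespace-separated word; I assume they would still need to be aligned to the model's subword tokens before training.

```python
import re

# Shortened version of the sample report above (real newlines instead of "\n").
report = (
    "MAGNESIO\nMaterial de Coleta: SORO\nResultado:\n2,0\nmg/dL\n"
    "POTASSIO\nMaterial de Coleta: SORO\nResultado\n4,9\nmEq/L\n"
    "SODIO\nMaterial de Coleta: SORO\nResultado\n137\nmEq/L\n"
)

# Hypothetical lookup from the Portuguese test name to the output key.
NAME_MAP = {"MAGNESIO": "magnesium", "POTASSIO": "potassium", "SODIO": "sodium"}


def find_span(text, needle, label):
    """Return (start, end, label) for the first occurrence of needle.

    Stands in for a human annotator marking a character span.
    """
    start = text.index(needle)
    return (start, start + len(needle), label)


def bio_tags(text, spans):
    """Convert character-span annotations into word-level BIO tags."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


def to_dict(tokens, tags):
    """Pair each TEST word with the next VALUE word that follows it."""
    result, current = {}, None
    for token, tag in zip(tokens, tags):
        if tag == "B-TEST":
            current = NAME_MAP.get(token, token.lower())
        elif tag == "B-VALUE" and current is not None:
            result[current] = token
            current = None
    return result


# Annotations for the sample: three test names and their three values.
spans = [find_span(report, name, "TEST") for name in ("MAGNESIO", "POTASSIO", "SODIO")]
spans += [find_span(report, value, "VALUE") for value in ("2,0", "4,9", "137")]

tokens, tags = bio_tags(report, spans)
result = to_dict(tokens, tags)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
print(result)
```

In this framing the model would predict the tags, and a small post-processing step like to_dict would turn them into the structure I showed earlier. I am not sure whether this is better than the alternatives, which is part of what I am asking.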
