Hi everyone,
I am looking for ideas or advice from anyone who has worked on a similar problem.
I have been working on an NLP task to cluster medical documents for some time. I was eager to use transformers to get the best results, but despite all my efforts, TF-IDF has worked best so far.
I am working with the SIDER side effect dataset, which provides annotated FDA medication labels. An example is here:
http://sideeffects.embl.de/media/pdf/fda/17106s032lbl/annotated.html#C0026961_0
I have tried TF-IDF and SciBERT through sentence-transformers, selecting the most relevant passages, but neither has produced strong results yet. Does anyone have any ideas or previous experience with this kind of task?
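To make the TF-IDF side concrete, here is a minimal, standard-library-only sketch of a TF-IDF baseline with cosine similarity. It is an illustration under assumptions, not my actual pipeline: the three toy documents are made up (they are not SIDER text), the helper names `tokenize`, `tfidf_vectors`, and `cosine` are hypothetical, and the smoothed IDF formula is just one common variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed IDF (one common variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    # Vectors are already unit length, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


# Toy documents, invented for illustration only.
docs = [
    "nausea and vomiting were reported with headache",
    "patients reported nausea, vomiting and dizziness",
    "rash and itching of the skin were observed",
]

vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # two documents sharing side-effect terms
sim_02 = cosine(vecs[0], vecs[2])  # documents with mostly different terms
print(f"doc0 vs doc1: {sim_01:.3f}")
print(f"doc0 vs doc2: {sim_02:.3f}")
```

The resulting similarities (or the vectors themselves) would then feed whatever clustering step is used. The point is only that TF-IDF rewards exact term overlap, which may be part of why it does well on label text with a controlled side-effect vocabulary.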
Many thanks,
Chris