I am looking for ideas or advice from anyone who has dealt with a similar situation.
I have been working for some time on an NLP task to cluster medical documents. I am eager to use transformers to get the best results, but despite all my efforts TF-IDF has worked best so far.
I am working with the SIDER side effect dataset, which provides annotated FDA medication labels. An example is here:
I have tried TF-IDF and SciBERT through sentence-transformers, selecting the most relevant passages, but neither has given great results yet. Does anyone have any ideas or previous experience?
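
For reference, a TF-IDF similarity baseline of the kind mentioned above can be sketched roughly as follows. This is only an illustration, not the exact pipeline used here. The whitespace tokenisation, the helper names `tfidf_vectors` and `cosine`, and the toy snippets are all assumptions, and the IDF is the smoothed form ln((1 + n) / (1 + df)) + 1.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse, L2-normalised TF-IDF dict per document."""
    # Assumption: naive lowercase + whitespace tokenisation.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Raw term count times smoothed IDF.
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


# Made-up toy snippets, not taken from SIDER.
docs = [
    "nausea and vomiting were reported",
    "patients reported nausea and headache",
    "store the tablets at room temperature",
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # two side-effect sentences
sim_02 = cosine(vecs[0], vecs[2])  # side-effect sentence vs storage instruction
print(sim_01 > sim_02)
```

The resulting similarities (or the vectors themselves) can then be passed to whichever clustering algorithm is being compared, which makes it easier to check the transformer embeddings against the same baseline on the same passages.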