Sentence Similarity or Sentence Classification Task?

vitali · March 10, 2021, 3:31pm

I need to codify medical conditions with diagnostic codes. For example, “head injury” may be coded as “S02.0, S02.1 Fracture of skull”. I would like to use a model to find likely diagnosis code candidates for entered text. What is the best approach to this task? I can either find the closest semantic similarity between the input sentence and the list of diagnosis descriptions, or do multi-label classification where each diagnostic code is a class. Any ideas or suggestions? Thanks.

neuralpat · March 10, 2021, 4:24pm

Either of your approaches could work.
Do you have a corpus of documents that contains both medical conditions and codified medical conditions?

vitali · March 10, 2021, 4:43pm

Yes, we have 2 data sources: (1) a corpus with all notes and (2) a list of diagnosis codes with descriptions. We can train embeddings on the corpus and then embed the descriptions of the diagnostic codes. My concern is the number of labels (at least 100); I'm not sure how well a classifier can handle that many.

neuralpat · March 10, 2021, 4:55pm

So the codes are technically in a different corpus? Then I’d probably try embedding-based retrieval before the classifier.

vitali · March 10, 2021, 4:58pm

Yes, the plan was to embed the code descriptions for either sentence similarity or classification, but which one should we try?

vitali · March 10, 2021, 5:24pm

Or maybe we can use a zero-shot classification pipeline? We can pass it the sentence and the possible labels.
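
For reference, a rough sketch of what the zero-shot idea could look like with the 🤗 Transformers `zero-shot-classification` pipeline. The function names `top_candidates` and `zero_shot_codes` are made up for illustration, the pipeline's default model is used only as a placeholder, and passing code descriptions as candidate labels is an assumption about how one might set this up, not something tested in this thread.

```python
def top_candidates(labels, scores, k=3):
    """Pair each label with its score and keep the k highest-scoring pairs."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def zero_shot_codes(text, descriptions, k=3):
    """Score `text` against each code description with a zero-shot pipeline.

    Hypothetical helper: `descriptions` is a list of diagnosis-code
    descriptions used directly as candidate labels.
    """
    # Imported lazily so top_candidates() works without transformers installed.
    from transformers import pipeline

    # Assumption: the pipeline's default NLI model; a domain-specific model
    # may behave very differently on clinical text.
    classifier = pipeline("zero-shot-classification")

    # multi_label=True scores every label independently, so several codes
    # can receive a high score for the same note.
    result = classifier(text, candidate_labels=descriptions, multi_label=True)
    return top_candidates(result["labels"], result["scores"], k)
```

One thing to keep in mind: the zero-shot pipeline runs the model once per candidate label, so with 100+ codes each query costs 100+ forward passes.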

neuralpat · March 11, 2021, 6:53am

Honestly, you should try both and see which one does better. ML is a very iterative process, so it’s always best to try different things.
Personally, I’d first try similarity.
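
A minimal sketch of the similarity approach, in case it helps: embed every code description once, embed the input text, and rank codes by cosine similarity. To keep it runnable with only the standard library, `embed` below is a bag-of-words stand-in, not a real sentence-embedding model; you would swap it for your own encoder. The `CODES` dictionary and the function names are illustrative placeholders, not a real code list or API.

```python
import math
import re
from collections import Counter

# Illustrative placeholder entries, not an authoritative code list.
CODES = {
    "S02.0": "Fracture of vault of skull",
    "S02.1": "Fracture of base of skull",
    "S06.0": "Concussion",
    "J45": "Asthma",
}


def embed(text):
    """Stand-in embedding: lowercase word counts (swap in a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_codes(query, code_descriptions, top_k=3):
    """Return the top_k (code, score) pairs most similar to the query."""
    q = embed(query)
    scored = [(code, cosine(q, embed(desc)))
              for code, desc in code_descriptions.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    print(rank_codes("fracture of the base of the skull", CODES))
    # Word overlap alone scores "head injury" as 0 against every description,
    # which is exactly the gap a learned embedding is meant to close.
    print(rank_codes("head injury", CODES))
```

With a real encoder the ranking logic stays the same; only `embed` and the vector type change.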
