FDA Label Document Embedding

Hi @FL33TW00D, I ran into a similar problem last year with TF-IDF and found the following approach gave better results:

  1. Encode the documents with either your favourite Transformer or the Universal Sentence Encoder (the latter works really well!)
  2. Run UMAP on the embeddings to perform dimensionality reduction
  3. Cluster with HDBSCAN
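
In case it helps, here is a rough sketch of those three steps in Python. It is only one way to wire them together, and everything specific in it is my assumption rather than something you have to use: the `sentence-transformers`, `umap-learn` and `hdbscan` packages, the `all-MiniLM-L6-v2` model (swap in Universal Sentence Encoder or any other encoder), the parameter values, and the hypothetical helper names `cluster_documents` and `group_by_cluster`.

```python
from collections import defaultdict


def cluster_documents(texts, model_name="all-MiniLM-L6-v2",
                      n_components=5, min_cluster_size=5):
    """Embed texts, reduce with UMAP, cluster with HDBSCAN; return one label per text."""
    # Third-party imports are local so group_by_cluster works without them installed.
    from sentence_transformers import SentenceTransformer
    import umap
    import hdbscan

    # 1. Encode each document into a dense vector (assumed encoder, see above).
    embeddings = SentenceTransformer(model_name).encode(texts)

    # 2. Reduce dimensionality; cosine distance is a common choice for text embeddings.
    reduced = umap.UMAP(n_components=n_components, metric="cosine",
                        random_state=42).fit_transform(embeddings)

    # 3. Density-based clustering; HDBSCAN labels noise points as -1.
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(reduced)
    return labels.tolist()


def group_by_cluster(texts, labels):
    """Map each cluster label to its documents, dropping noise (label -1)."""
    clusters = defaultdict(list)
    for text, label in zip(texts, labels):
        if label != -1:
            clusters[label].append(text)
    return dict(clusters)
```

You would call `labels = cluster_documents(docs)` on your list of label texts and then `group_by_cluster(docs, labels)` to inspect each cluster. The values for `n_components` and `min_cluster_size` are just starting points and will likely need tuning for your data, and UMAP generally wants more than a handful of documents to work with.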

HTH!
