Hi @FL33TW00D, I ran into a similar problem last year with TF-IDF and found that the following approach gave better results:
- Encode the documents with either your favourite Transformer or the Universal Sentence Encoder (the latter works really well!)
- Run UMAP on the embeddings to perform dimensionality reduction
- Cluster with HDBSCAN
HTH!