Hello HF community,
I’m a master’s student working on a clinical NLP project involving suicide risk classification from psychiatric patient records. I’d really appreciate any guidance on how to improve performance in this task.
Overview of the task:
• 114 records, each including:
• Free-text doctor and nurse notes
• Hospital name
• Binary label: whether the patient later died by suicide (yes/no)
• Only 29 yes examples → highly imbalanced
• Notes are unstructured, long (up to 32k characters), and rich in psychiatric language
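For concreteness, the 85/29 class split translates into inverse-frequency loss weights roughly like this (a rough sketch; the helper name is mine, but the resulting weights could be passed to e.g. the `weight` argument of `torch.nn.CrossEntropyLoss`):

```python
def inverse_frequency_weights(counts):
    """counts: dict label -> count. Returns weights inversely
    proportional to class frequency (rarer class -> larger weight)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * c) for label, c in counts.items()}

# 114 records total: 29 "yes", 85 "no"
weights = inverse_frequency_weights({"no": 85, "yes": 29})
# "yes" gets roughly 3x the weight of "no"
```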
What I’ve tried:
• Concatenating the doctor + nurse texts
• Sliding window chunking + aggregation (majority voting)
• Few-shot learning using GPT-4
• Fine-tuning ClinicalBERT on the dataset
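The chunking + voting step looks essentially like this (a minimal sketch of my pipeline; `classify_chunk` is a stand-in for the actual fine-tuned model call, and the window/stride values are illustrative, not tuned):

```python
def chunk_text(text, window=512, stride=256):
    """Split `text` into overlapping character windows."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break
        start += stride
    return chunks

def majority_vote(labels):
    """Most common label over chunks; ties break toward 'yes'
    since recall on the positive class is what matters here."""
    yes = sum(1 for label in labels if label == "yes")
    return "yes" if yes * 2 >= len(labels) else "no"

def classify_record(text, classify_chunk, window=512, stride=256):
    """Chunk a long record, classify each chunk, aggregate by vote."""
    chunks = chunk_text(text, window, stride)
    return majority_vote([classify_chunk(c) for c in chunks])
```

One thing I suspect is that majority voting dilutes the signal: a single high-risk passage in a 32k-character note gets outvoted by many neutral chunks, so a max/any aggregation might be worth trying instead.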
Despite these efforts, recall on the yes class is consistently low. The models seem to struggle to pick up subtle suicidal-risk patterns in long, complex, domain-specific text, especially under token-length limits.
I’d love input on:
• Handling long clinical texts with LLMs
• Boosting recall on the minority class (yes)
• Experiences working with BERT-style models or few-shot prompts in sensitive medical contexts
Happy to share sample data, code, or results if it helps. Thanks a lot!