Hello HF community,
I’m a master’s student working on a clinical NLP project involving suicide risk classification from psychiatric patient records. I’d really appreciate any guidance on how to improve performance in this task.
Overview of the task:
• 114 records, each including:
• Free-text doctor and nurse notes
• Hospital name
• Binary label: whether the patient later died by suicide (yes/no)
• Only 29 yes examples → highly imbalanced
• Notes are unstructured, long (up to 32k characters), and rich in psychiatric language
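For concreteness, the 85/29 class split translates into inverse-frequency loss weights roughly like this (a rough sketch; the helper name is mine, but the resulting weights could be passed to e.g. the `weight` argument of `torch.nn.CrossEntropyLoss`):

```python
def inverse_frequency_weights(counts):
    """counts: dict label -> count. Returns weights inversely
    proportional to class frequency (rarer class -> larger weight)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * c) for label, c in counts.items()}

# 114 records total: 29 "yes", 85 "no"
weights = inverse_frequency_weights({"no": 85, "yes": 29})
# "yes" gets roughly 3x the weight of "no"
```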
What I’ve tried:
• Concatenating the doctor + nurse texts
• Sliding window chunking + aggregation (majority voting)
• Few-shot learning using GPT-4
• Fine-tuning ClinicalBERT on the dataset
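The chunking + voting step looks essentially like this (a minimal sketch of my pipeline; `classify_chunk` is a stand-in for the actual fine-tuned model call, and the window/stride values are illustrative, not tuned):

```python
def chunk_text(text, window=512, stride=256):
    """Split `text` into overlapping character windows."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break
        start += stride
    return chunks

def majority_vote(labels):
    """Most common label over chunks; ties break toward 'yes'
    since recall on the positive class is what matters here."""
    yes = sum(1 for label in labels if label == "yes")
    return "yes" if yes * 2 >= len(labels) else "no"

def classify_record(text, classify_chunk, window=512, stride=256):
    """Chunk a long record, classify each chunk, aggregate by vote."""
    chunks = chunk_text(text, window, stride)
    return majority_vote([classify_chunk(c) for c in chunks])
```

One thing I suspect is that majority voting dilutes the signal: a single high-risk passage in a 32k-character note gets outvoted by many neutral chunks, so a max/any aggregation might be worth trying instead.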
Despite these efforts, recall on the yes class is consistently low. The models seem to struggle to pick up subtle suicidal-risk patterns in long, complex, domain-specific text, especially under token-length limits.
I’d love input on:
• Handling long clinical texts with LLMs
• Boosting recall on the minority class (yes)
• Experiences working with BERT-style models or few-shot prompts in sensitive medical contexts
Happy to share sample data, code, or results if it helps. Thanks a lot!