We have a text dataset of mental health questionnaires and we need to fine-tune an LLM for mental health research.
I have been working on an open source tool called Harmony, which helps researchers combine datasets in psychology and social sciences.
We have noticed for a while that the similarity scores that Harmony returns could be improved. For example, items to do with “sleep” are often grouped together (because of the data that off-the-shelf language models such as the SentenceTransformers models are trained on), while a psychologist would consider them to be different.
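To make the problem concrete, here is a minimal sketch of how a similarity score between two questionnaire items can be computed, assuming the score is a cosine similarity between embedding vectors. The three-dimensional vectors below are hypothetical values invented for illustration; they are not output from Harmony or from any real model, and the function name is my own.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (invented numbers, not real model output).
# The two sleep items point in almost the same direction, which mimics
# how a general-purpose model tends to group them by topic.
toy_embeddings = {
    "I have trouble falling asleep": [0.9, 0.1, 0.2],
    "I sleep too much": [0.8, 0.2, 0.3],
    "I feel nervous": [0.1, 0.9, 0.2],
}

sleep_pair = cosine_similarity(
    toy_embeddings["I have trouble falling asleep"],
    toy_embeddings["I sleep too much"],
)
mixed_pair = cosine_similarity(
    toy_embeddings["I have trouble falling asleep"],
    toy_embeddings["I feel nervous"],
)

print(f"sleep vs sleep:   {sleep_pair:.3f}")
print(f"sleep vs nervous: {mixed_pair:.3f}")
```

In this toy setup the two sleep items score far higher than the sleep and nervousness pair, even though a psychologist would treat insomnia and oversleeping as different things. Fine-tuning aims to move such pairs apart.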
We are running a competition on the online platform DOXA AI where you can win up to 500 GBP in vouchers (1st place prize).
We provide training data, and your code will be evaluated on submission on the platform.
How to get started?
Create an account on DOXA AI, go to the “Harmony online coding competition on DOXA AI” page, and run the example notebook. This will download the training data.
If you would like some tips on how to train an LLM, I recommend the Hugging Face tutorial on fine-tuning.
Thomas Wood, Fast Data Science (https://fastdatascience.com/)