Hi guys. I'm a psychologist and I'm learning to use Hugging Face.
I'm planning a big scrape of an Italian site with lots of high-quality psychology articles. I will scrape the topics, categories, and the bibliography list related to every article.
Just for the sake of learning, what kind of Hugging Face project could be built with this kind of data?
Thanks for the replies, and sorry for the bad English.
Michele
Just off the top of my head: you could try training a model on psychology theory, using categorical word concepts and the words that describe them. Maybe train it on patient data that has been labeled with a certain diagnosis. Or maybe a bot that scans social media looking for signs of suicide risk? Good luck.
I come from a similar background and would recommend a few areas that are developing "artificial general intelligence," or AGI for short. I would also recommend checking out the Lex Fridman videos where he interviews some of the greatest neuroscientists working on AI, a field that started as a basis for understanding the brain. I would also recommend the PubMed datasets, which exist here on HF in various formats.
Here are the areas I would recommend exploring for AI + psychology:
AI Pipelines
In psychology and the study of the human brain, we learn that there are two types of memory, semantic and episodic, with episodic memory involving a relationship with the amygdala. In artificial general intelligence (AGI) we build AI pipelines, which use multiple models for different tasks, much like the human brain. For this I would recommend learning Gradio Blocks and how to use Gradio to run multiple models in parallel and in sequence; see the sketch after the example below.
Example: 📗Health and ❤️Mindful Story Gen👩⚕️ - a Hugging Face Space by awacke1
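As a hedged illustration of the pipeline idea, here is a minimal Gradio Blocks sketch that fans one input out to two Hugging Face pipelines. The two checkpoints (a sentiment classifier and GPT-2) are just common public models picked for illustration, not the ones the Space above uses:

```python
# A minimal sketch of an AI "pipeline" in Gradio Blocks: two models wired
# to one input, roughly like stages in a cognitive pipeline.
import gradio as gr
from transformers import pipeline

# Stage 1: classify the emotional tone of the input text.
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
# Stage 2: generate a short continuation of the input text.
generator = pipeline("text-generation", model="gpt2")

def classify(text):
    result = classifier(text)[0]
    return f"{result['label']} ({result['score']:.2f})"

def generate(text):
    return generator(text, max_new_tokens=40)[0]["generated_text"]

with gr.Blocks() as demo:
    inp = gr.Textbox(label="Your text")
    sentiment = gr.Textbox(label="Sentiment (model 1)")
    story = gr.Textbox(label="Continuation (model 2)")
    btn = gr.Button("Run both models")
    # One click triggers both models; each .click() could also be
    # wired to its own button to run the stages independently.
    btn.click(classify, inputs=inp, outputs=sentiment)
    btn.click(generate, inputs=inp, outputs=story)

demo.launch()
```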
Memory Types and Multiple Agent System Memory
From the psychology of memory we have learned that the human brain has both semantic memory (like remembering your address) and episodic memory (remembering feelings and what we care about). Recently the latter has become possible in AI as well, even with a single persistent dataset that knits together what many people care about. To give an AI episodic memory, we need to freely curate and associate the shared knowledge we acquire; a sketch follows the example below.
Example with Datasets used as Memory: 💬ChatBack🧠💾 - a Hugging Face Space by awacke1
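To make that concrete, here is a minimal sketch of using a Hub dataset as persistent memory. This is an assumption about how you might wire it up yourself, not how the ChatBack Space is implemented; the repo name `your-username/chat-memory` is a placeholder, and pushing requires a Hugging Face token with write access:

```python
# A sketch of a Hub dataset as shared "episodic memory": each chat turn is
# appended and pushed back, so memories persist across sessions and users.
from datasets import Dataset, load_dataset

MEMORY_REPO = "your-username/chat-memory"  # hypothetical dataset repo

def remember(user, message):
    """Append one chat turn to the persistent memory dataset."""
    try:
        memory = load_dataset(MEMORY_REPO, split="train")
        rows = memory.to_list()
    except Exception:
        rows = []  # first run: no memory exists yet
    rows.append({"user": user, "message": message})
    Dataset.from_list(rows).push_to_hub(MEMORY_REPO)

def recall(user):
    """Retrieve everything this user has said before (episodic recall)."""
    memory = load_dataset(MEMORY_REPO, split="train")
    return [r["message"] for r in memory if r["user"] == user]

remember("michele", "I care about making therapy more accessible.")
print(recall("michele"))
```

Re-downloading the whole dataset on every turn is wasteful; it is just the simplest way to show the idea of one shared, persistent store.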
Cognitive Behavioral Therapy
In Leahy’s book for practitioners, he covers Cognitive Behavioral Therapy techniques, tools we can use to understand and change how we think and feel about situations. The fascinating thing about episodic memory is that each time we recall a memory, we change and update the synapses around it, altering it in an adaptive way. If we can teach ourselves to do this in positive ways, we can help many people. One of the best examples I have seen is “Positive Reframing”: Positive Reframing - a Hugging Face Space by Ella2323
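If you wanted to prototype something similar yourself, positive reframing can be framed as a text2text task. Note the checkpoint name below is purely hypothetical, a stand-in for whatever the Space actually uses or whatever reframing model you fine-tune or find on the Hub:

```python
# A sketch of positive reframing as sequence-to-sequence generation.
from transformers import pipeline

# Hypothetical checkpoint: substitute a real text2text model fine-tuned
# to rewrite negative statements with a positive frame.
reframer = pipeline("text2text-generation",
                    model="your-username/positive-reframing")

thought = "I failed my exam, so I am clearly not smart enough for this."
print(reframer(thought, max_new_tokens=60)[0]["generated_text"])
```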
While browsing the Datasets section of HF can help, it’s important to create some tools to help curate the datasets you find and get them into formats you can work with. One tool I have played with recently is a datasets viewer where you can easily see and check datasets: 🥫Datasets🎨 - a Hugging Face Space by awacke1
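Even without a viewer Space, the `datasets` library itself gives you a quick peek before you commit to a dataset. A small sketch, using the public `pubmed_qa` dataset as an example:

```python
# Inspect a Hub dataset: row count, column types, and a sample record.
from datasets import load_dataset

ds = load_dataset("pubmed_qa", "pqa_labeled", split="train")
print(ds)           # number of rows and column names
print(ds.features)  # column types
print(ds[0])        # first record, to check the format
```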
On HF, however, there are not yet many other datasets important to your field, aside from what the large language models already cover, which might be enough. To find ones pertinent to psychology and the brain, I use these two links:
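Besides browsing, you can also search the Hub programmatically. A small sketch using `huggingface_hub`; the search keywords are just examples:

```python
# Search the Hub for datasets and models by keyword.
from huggingface_hub import HfApi

api = HfApi()
for ds in api.list_datasets(search="psychology", limit=10):
    print("dataset:", ds.id)
for m in api.list_models(search="mental health", limit=10):
    print("model:", m.id)
```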
First, thank you very much for your answer. For me, right now, the hardest part of the learning experience is finding something like what you have just proposed as a starting point for a small project.
But being the noob I am, I’ll ask you to elaborate further on this passage:
Could you elaborate more on the model/task used? This would imply selecting certain articles based on their content, right? It’s not something I can do on the entire dataset, right?
You can, and if you came up with one for psychology, it would make a pretty cool language model for classification tasks. Here is an example that does something similar with its inputs by doing Named Entity Recognition (NER). It categorizes word concepts based on a really great model and datasets cultivated by the @d4data team organization: ⚕️MedNER 🩺Biomed Named 🙋Entity Recognition - a Hugging Face Space by awacke1, with model d4data/biomedical-ner-all
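For reference, here is a minimal sketch of calling that same model through the `transformers` pipeline API; the example sentence is made up:

```python
# Biomedical NER with the d4data model mentioned above.
from transformers import pipeline

ner = pipeline("token-classification",
               model="d4data/biomedical-ner-all",
               aggregation_strategy="simple")  # merge word pieces into entities

text = "The patient reported persistent anxiety and insomnia after starting treatment."
for entity in ner(text):
    print(entity["entity_group"], "->", entity["word"], f"({entity['score']:.2f})")
```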
And here I go again being a noob: would it be of any use, for starters, to just train a BERT multilingual model on the uncategorized, unannotated theory articles? If the answer is yes, what is the best model to start with, given that the articles are in Italian? We are talking about nearly 10k articles.
It depends on the types of tasks you want (e.g. classification, seq2seq, automatic QnA…); start by finding a model and dataset similar to what you have for the task you want. For a great intro to creating your embeddings and using auto-tokenization, check out this video and course: Semantic search with FAISS - Hugging Face Course
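As a condensed sketch of that course recipe, assuming a multilingual sentence-embedding checkpoint, the `faiss` package installed, and a toy corpus of Italian sentences standing in for your articles:

```python
# Semantic search with FAISS: embed a corpus, index it, query it.
from datasets import Dataset
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Toy stand-in for the scraped articles.
articles = Dataset.from_dict({
    "text": [
        "La terapia cognitivo-comportamentale per i disturbi d'ansia.",
        "Lo sviluppo del linguaggio nei primi anni di vita.",
        "Memoria episodica e memoria semantica nel cervello umano.",
    ]
})

# Embed every article, then build a FAISS index over the embeddings.
articles = articles.map(lambda x: {"embedding": model.encode(x["text"])})
articles.add_faiss_index(column="embedding")

# Retrieve the articles closest in meaning to a free-text query.
query = model.encode("Come funziona la memoria?")
scores, results = articles.get_nearest_examples("embedding", query, k=2)
print(results["text"])
```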