Hello, absolute beginner here… For research purposes, I am facing the following problem.
I have a medium-sized dataset (2000 rows) of one-line answers to an open-ended question, “what will you take home and remember of…?”, asked about a series of happy and sad events. The answers are in Italian.
I would like to build a tool that can perform at least some of the following tasks:
- extract common patterns: widespread answers and emotions
- give examples of answers for specific patterns
- compute statistics on these answers and emotions
- point out and highlight particular and atypical concerns
Imagine a question about the COVID-19 pandemic; an example reply from the system might be:
“A widespread feeling is loneliness, being separated from loved ones. There is a rediscovery of time spent with family. 30% of answers are full of sorrow for the death of someone… One message reported the awareness of how pointless life is when you are alone…” and so on.
I have tried different approaches (ChatGPT with some kind of text splitting, GPT-3 through its API, some free Python-based approaches…) and different prompts.
I am running into two problems: the limit on the number of tokens, and short memory. I send the questionnaire answers in blocks, but the system forgets the earlier ones, and I have not managed to make it keep the context and enrich it as each new block arrives.
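To make the block-wise approach concrete, here is a minimal sketch of what I mean, under some loud assumptions: a character budget is used as a rough stand-in for the real token limit, and `make_blocks`, `fold_blocks` and the `summarize` callable are hypothetical names. `summarize` stands in for whatever model call is used; it receives the summary so far plus the next block and should return an updated summary.

```python
from typing import Callable, Iterable, List


def make_blocks(answers: Iterable[str], max_chars: int = 2000) -> List[List[str]]:
    """Group one-line answers into blocks of at most max_chars characters.

    max_chars is only a crude proxy for a token limit (an assumption).
    A single answer longer than max_chars still gets a block of its own.
    """
    blocks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for answer in answers:
        cost = len(answer) + 1  # +1 for the newline that joins answers
        if current and size + cost > max_chars:
            blocks.append(current)  # close the block before it overflows
            current, size = [], 0
        current.append(answer)
        size += cost
    if current:
        blocks.append(current)
    return blocks


def fold_blocks(
    blocks: Iterable[List[str]],
    summarize: Callable[[str, List[str]], str],
) -> str:
    """Carry a running summary across blocks instead of relying on chat memory."""
    summary = ""
    for block in blocks:
        # Hypothetical model call: previous summary + new block -> new summary.
        summary = summarize(summary, block)
    return summary
```

The idea is that the running summary is passed back in explicitly with every block, so nothing depends on the system remembering earlier messages.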
Moreover, the answers generated by these systems are not really under my control: sometimes they give me useful comments, sometimes they translate the Italian into English (!), and so on.
Of course a large part of the problem is my inexperience, but maybe I am also not going about this the right way.
So my question is: what would be a good approach, ideally using free tools?