LLM for cyberbullying intervention - profanity issues

I am doing my Master’s project on building a chatbot that monitors social networking sites for cyberbullying. The chatbot uses a BLSTM as its detection engine, which works reasonably well. Messages detected as cyberbullying are then sent to hugchat with a request for a response, which the chatbot posts as a reply to the cyberbullying message. The prompt looks like this:

my_query = ("The following comment has been detected as " +
    "cyberbullying against an individual. " +
    "Comment = {" + my_comment.body + "}. " +
    "Provide a response to this comment as though you are a " +
    "bystander who wants to show support for the victim, " +
    "with the primary goal of mitigating the impact of this " +
    "cyberbullying comment on their mental health. " +
    "Your response should be both empathetic and respectful. " +
    "Your response should be no longer than ten sentences and written in " +
    "a casual tone appropriate for social media. " +
    "Your response should be written in the persona of " +
    "a 30-year-old person who lives in the USA, has a liberal " +
    "arts education, and is technically adept.")

This works very well if the cyberbullying comment does not contain profanity. Unfortunately, most cyberbullying messages do contain profanity. When that happens, the response back from hugchat is along the lines of “I cannot provide a response that would engage in name-calling or personal attacks as it would go against my programming rules”. This response is obviously unhelpful, and isn’t what the prompt is asking for. I am wondering if anyone has encountered similar issues, and if there is anything I can add to the prompt to get it to provide an appropriate response. Thank you!

That refusal stands out. I assume the model thinks an offensive tone should be included in the response.
Try adding: “Instead of using profanity, name-calling, or personal attacks, reply in a casual tone suitable for social media.”

Thanks for the suggestion! I added “Your response should avoid profanity, name-calling, or personal attacks.” just before the part about being empathetic and respectful, and that seems to have worked. Thanks again!
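
For anyone landing here later, this is roughly what the revised prompt looks like with the extra sentence in place. It is only a sketch: `build_prompt` is a helper name I made up for illustration, and it just builds the string, so the call that sends it to hugchat is left out.

```python
def build_prompt(comment_body: str) -> str:
    """Assemble the bystander-response prompt for one flagged comment.

    build_prompt is a hypothetical helper name; only the prompt text
    itself comes from this thread.
    """
    return (
        "The following comment has been detected as "
        "cyberbullying against an individual. "
        "Comment = {" + comment_body + "}. "
        "Provide a response to this comment as though you are a "
        "bystander who wants to show support for the victim, "
        "with the primary goal of mitigating the impact of this "
        "cyberbullying comment on their mental health. "
        # Added sentence: placed just before the empathy instruction.
        "Your response should avoid profanity, name-calling, "
        "or personal attacks. "
        "Your response should be both empathetic and respectful. "
        "Your response should be no longer than ten sentences and written in "
        "a casual tone appropriate for social media. "
        "Your response should be written in the persona of "
        "a 30-year-old person who lives in the USA, has a liberal "
        "arts education, and is technically adept."
    )


if __name__ == "__main__":
    # Placeholder comment text, not real data.
    print(build_prompt("example flagged comment"))
```

In the chatbot this would be called as `build_prompt(my_comment.body)`, and the result passed to hugchat exactly as `my_query` was before.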
