Emulating text classification PoC in ChatGPT

Hey all,

I have been having the time of my life playing around with this! I just discovered it recently, and think it may come in handy for a project of mine that had plateaued.

I am looking for a way to determine whether, at any given point in a Twitch stream, the streamer is on-topic or just exchanging pleasantries with their chat.

I was able to use Whisper to get a timestamped transcription of the stream audio; however, I’m failing to get an accurate answer to “is this science talk or chit-chat?”.
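For reference, the transcription step looks roughly like this (just a sketch assuming the openai-whisper package; the file name and model size are placeholders):

```python
# Sketch of the Whisper transcription step; "stream_audio.mp3" and the model
# size are placeholders for whatever is actually used.
import whisper

model = whisper.load_model("base")
result = model.transcribe("stream_audio.mp3")

# Each segment carries start/end timestamps plus the transcribed text,
# which is what gets chunked up for classification.
for segment in result["segments"]:
    print(f'{segment["start"]:.1f}s-{segment["end"]:.1f}s: {segment["text"]}')
```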

For example, for the following text:

even if they weren’t born with the equipment to glow. Let’s revisit the anglerfish moments before it tried to eat you. That glowing bait on top of its head? It comes from a pocket of skin called the eska. The eska holds bioluminescent bacteria. The anglerfish can’t glow there by itself, so it holds a sack of glowing bacteria instead. Remember the firefly? It can actually make itself glow. Inside its lantern are two chemicals. is how much the researchers have utilized them to see if there’s any efficiency differences. So they looked at the sequences that they identified, both the sequences for luciferase and luciferin are contained in the glowworm. So that’s why presumably they have the glowing and bioluminescent ability. Why that is in terms of efficiency, we don’t know. We haven’t like a comparative look. They haven’t isolated this particular molecule because there are some differences in terms of sequence. If it’s more or less efficient, then let’s say a firefly is one. Why? Hold on, Cricut. I used to know this. Okay, so yes, that’s what Kith is right. I did find, I have confirmed this was my suspicion. So in Latin, as Grimley points out, to loose, in Latin origin means light. Sorry, the word luxe, it’s the word luxe. Just like Lucifer’s nightclub in the hit Netflix TV show, Lucifer starring Tom Ellis. I got you, Cricket, I got you on the modern day fancy pants stuff, I got you. Chad, do you know about Loving This Cricket? She’s on her 1,000 follower push, she’s almost there, she’s nine. One, two, three, four, five, six, seven, eight, nine followers away from getting 1,000. Please go follow our lovely, amazing friend Cricket. Back to the Luciferase and Luciferin mix together in the presence of oxygen and fuel for the cell called ATP, the chemical reaction gives off energy in the form of light. Once scientists figured out how the Firefly creates its Luciferase and Luciferin, they used genetic engineering to make this light-producing reaction occur inside other living things that can’t glow. For example, they inserted the genes, or instructions, for a in theory apply. New Zealand and when the other place heavily studied was also Australia. But it’s actually suspected when that they were colonized originally from New Zealand to Australia by human intervention, for better or worse. The beauty of bioluminescence, unlike the light from the sun or an incandescent bulb, is that it’s not hot. It takes place in a range of temperatures that don’t burn a living thing. And unlike a glow stick which fades out as the chemicals inside get used up, bioluminescent reactions use replenishable resources. That’s one reason engineers are trying to develop bioluminescent trees. Just think, if planted on the side of highways, they could light the way using harm to the local animals because of the light pollution. But the counter argument was if they’re in places with already existing light pollution, it’s not going to make things worse, maybe. And but it would help with in terms of the environment and oxygen production and CO2 reduction and things along those lines. Not sure I want to glow. Drink more, cook less. you glow every day. I see the way you look at your wife. Talk about survival advantage. That could help our planet live longer. Do you find yourself thinking of other ways to put bioluminescence to good use? That glow stick you swing at a rave may help you find a mate. But how else can bioluminescence improve your survival? If you start thinking in this way, you have seen the light.

I would expect that to be categorized mostly as science or education, since most of it is technical content rather than just chatting. However, I’ve tried a couple of zero-shot classifiers, some popular Twitter topic classifiers, and summarizing-then-classifying, and I haven’t gotten a result that didn’t show a strong bias towards diaries_&_daily_life, relationships, or greetings, which I would expect NOT to be the case.
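For reference, one of those attempts looked roughly like this (a sketch; I believe the tweet-topic model is where labels like diaries_&_daily_life come from, but the exact models and settings I tried varied):

```python
# Sketch of one classification attempt; the model choice here is an assumption
# based on the label names mentioned above.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="cardiffnlp/tweet-topic-21-multi",
    top_k=None,  # return scores for every label
)

chunk = ("The eska holds bioluminescent bacteria. The anglerfish can't glow "
         "there by itself, so it holds a sack of glowing bacteria instead.")

# In my tests, daily-life style labels kept outscoring science_&_technology.
print(classifier(chunk))
```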

I posted this in ChatGPT (3.5) with the following question:

Is this bit of text moreso a twitch streamer exchanging pleasantries with their viewership and chat, or a twitch streamer discussing a scientific topic?

To which it responded with:

Based on the provided text snippet, it appears to be more of a twitch streamer discussing a scientific topic related to bioluminescence and the mechanisms behind it. The streamer is mentioning various examples, such as the anglerfish, fireflies, and the chemical reactions involved in bioluminescence. They also touch on topics like genetic engineering and the potential applications of bioluminescence in areas such as environmental conservation. While there might be some interaction with the viewership and chat, the focus seems to be on delivering scientific information rather than exchanging pleasantries.

Which is MUCH more along the lines of what I was expecting an AI to be able to pick up on.

Now, obviously I have hours and hours’ worth of snippets like this to process, so I’d love to do it with Hugging Face, but I’m not sure what type of model I would need to look into for a solid base. I don’t mind training one up, but I’d like to start with some kind of basic foundational success before I dive into any one approach over another…

If zero-shot classification with dedicated models on HF isn’t high quality enough for your use case, you could try using Guidance to design a prompt for an open LLM like falcon-7b-instruct. Guidance makes building prompts for LLMs easier in a few ways, one of which is using them as zero-shot classifiers. Check out this example of how to use a select statement for zero-shot classification.
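Something along these lines (a rough sketch only; the exact template syntax and classes depend on the Guidance version you install, so check the select example in the Guidance docs):

```python
# Rough sketch, assuming a 2023-era Guidance release; the select template
# syntax may differ in your version.
import guidance

# use an open instruct model as the backing LLM
guidance.llm = guidance.llms.Transformers("tiiuae/falcon-7b-instruct")

program = guidance("""Is the following transcript chunk science talk or chit-chat?
Transcript: {{text}}
Answer: {{select "answer" options=labels}}""")

result = program(
    text="The eska holds bioluminescent bacteria.",
    labels=["science talk", "chit-chat"],
)
print(result["answer"])  # the model is constrained to pick one of the labels
```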


I’ve been trying some zero-shot models and I’ve seen that their effectiveness depends on the domain (science, math, etc.). For your case, I found they are not effective. For my own work I’m trying google/flan-t5-large. It’s a Text Generation model, but I’ve found it to be very effective in various zero-shot cases.
I tried your example and it correctly gave me ‘science talk’ as the answer. Note that it has to be used as Text Generation. This is my prompt:

Apply Labels to Text.

Labels:

  • education
  • science talk
  • chit-chat
  • other

Text:
[YOUR_TEXT_HERE]
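If it helps, this is roughly how I run it (a sketch using the standard transformers text2text-generation pipeline; the snippet of your transcript is just for illustration):

```python
# Sketch of running the prompt above with flan-t5-large as Text Generation.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-large")

labels = ["education", "science talk", "chit-chat", "other"]
text = ("Luciferase and luciferin mix together in the presence of oxygen and "
        "fuel for the cell called ATP; the chemical reaction gives off energy "
        "in the form of light.")

prompt = ("Apply Labels to Text.\n\nLabels:\n"
          + "\n".join(f"- {label}" for label in labels)
          + f"\n\nText:\n{text}")

print(generator(prompt, max_new_tokens=10)[0]["generated_text"])
```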

Hope it helps.
Ciao


Thank you so much! This was a great idea!

Specifically because this is very much in line with how my ChatGPT example worked, I went ahead and marked this as the solution.

I have tried this out with some of my other examples, and it seems to work quite well! This is gonna be a real time-saver!

As an afterthought, I wonder whether there would realistically be any advantage in training a dedicated classification model on these results, or whether this approach is doing a good enough job that I could just keep using it in perpetuity :thinking:


Ciao andrewcz.
Thanks for marking my answer as a solution.
Regarding your observation: from a programming point of view, a classification model is much simpler to use, because the labels are objects. You can create different sets of labels and store them as structured data. With Text Generation models that isn’t possible, because the labels are part of the text prompt.
However, I too use google/flan-t5-large, because it works very well even with the Italian language. I use the same English prompt you saw in my previous post, and then I query in both English and Italian, with great results.
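To make that concrete, here is a small sketch (the model and the second label set are just examples, not something from your project) of swapping structured label sets with a classification pipeline; with flan-t5 the same change means rebuilding the prompt string each time:

```python
# Sketch: with a classification pipeline, label sets are plain Python data
# that can be stored and swapped per use case. The "mood" set is made up.
from transformers import pipeline

label_sets = {
    "topic": ["education", "science talk", "chit-chat", "other"],
    "mood": ["excited", "neutral", "bored"],
}

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
text = "Luciferase and luciferin mix together in the presence of oxygen."

for name, labels in label_sets.items():
    result = classifier(text, candidate_labels=labels)
    print(name, "->", result["labels"][0])
```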

Ciao