Streamer AI (Like Neuro-Sama)

Basically, I’ve been trying to wrap my head around how to make an AI similar to Neuro-Sama, a VTuber (the VTuber part isn’t important) that seems able to pick out and reply to some Twitch messages with a kind of ‘personality’.
I was wondering if anyone else has been thinking about the same thing or is interested in the topic.

So far I’ve only thought of using a language model such as BERT or GPT-2, since they are free, but I couldn’t really find a way to fine-tune them on my data, since most training methods I have seen are for text classification rather than text generation.
I’ve got a little bit of test data in a .csv with a context column and a response column, plus a sentiment column (label) with negative, neutral, and positive values.

  • Of course, I don’t fully understand what I’m doing, and I don’t know if I even need the sentiment column for what I’m trying to do.
    Anyway, my current idea is that the context column holds the Twitch chat messages, and the response column holds how I would like the AI to respond to them.
    If anyone knows a way I can go about this, or wants more information, any help would be much appreciated. I understand that the task will be a complex one, so if it seems out of reach, please don’t hesitate to tell me the honest truth.
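For what it’s worth, GPT-2 is a generative (causal) language model, so one common way to use a context/response .csv like this is to flatten each row into a single text string and train on those strings. Below is a minimal sketch of just that formatting step, using only the standard library. The column names `context` and `response` come from the description above; the `<|chat|>` and `<|reply|>` markers and the function names are made up for illustration, and the sentiment column is simply ignored here.

```python
import csv
import io

# Hypothetical separator markers; any consistent markers would do.
CHAT_TAG = "<|chat|>"
REPLY_TAG = "<|reply|>"
END_TAG = "<|endoftext|>"  # GPT-2's end-of-text token


def format_row(context: str, response: str) -> str:
    """Join one context/response pair into a single training string."""
    return f"{CHAT_TAG} {context.strip()} {REPLY_TAG} {response.strip()} {END_TAG}"


def build_training_texts(csv_file) -> list[str]:
    """Read rows with 'context' and 'response' columns; skip incomplete rows."""
    reader = csv.DictReader(csv_file)
    texts = []
    for row in reader:
        context = (row.get("context") or "").strip()
        response = (row.get("response") or "").strip()
        if context and response:
            texts.append(format_row(context, response))
    return texts


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real .csv file.
    sample = io.StringIO(
        "context,response,label\n"
        "hello neuro,hi chat!,positive\n"
        "what game is this?,no idea honestly,neutral\n"
    )
    for line in build_training_texts(sample):
        print(line)
```

At generation time the same idea applies in reverse: feed the model everything up to and including the reply marker and let it write the rest.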

From my recent research for exactly such a project that I am working on myself, it seems more likely that there are multiple neural networks working together. My guess is that she has a personality model that she draws her character from, and that she looks for keywords and symbols. If the message is a question, she will reply to it; if it is a random sentence, she will pick out keywords and use them to create a sentence or a longer text.

I also guess that when Vedal is talking, his audio input is connected to a neural net that specifically recognizes him as the owner. If someone else is speaking, it probably just runs speech-to-text and feeds the result into the same input as Twitch chat. It also seems to me that the program prioritizes speech over Twitch chat.
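To make the "speech before chat" idea concrete, here is a rough sketch of how several input sources could be merged into one queue with priorities. This is only my guess at one possible design, not how Neuro-Sama actually works; the source names and priority levels are assumptions.

```python
import heapq
import itertools

# Lower number = handled first. These levels are assumptions for illustration.
PRIORITY = {"owner_voice": 0, "guest_voice": 1, "chat": 2}


class InputQueue:
    """Merge messages from several sources and pop the most urgent one first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps arrival order

    def push(self, source: str, text: str) -> None:
        """Add a message; 'source' must be a key of PRIORITY."""
        entry = (PRIORITY[source], next(self._counter), source, text)
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Return (source, text) for the next message, or None if empty."""
        if not self._heap:
            return None
        _, _, source, text = heapq.heappop(self._heap)
        return source, text
```

With this, a voice line pushed after ten chat messages would still be answered first, while messages from the same source keep their arrival order.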

So I guess there are a few models working together that have been trained on the same or at least similar data.

So yes, the task is complex, but also expensive. To run such a program as fast and efficiently as Vedal does, it would need to be hosted on a dedicated machine for neural nets with enough RAM and processing power. (Vedal has stated that he is in fact losing money on the project.) He has also been developing it since 2014, so he probably even had to create the datasets himself.

It is not impossible, but it needs a lot of dedication. It is also unclear what model he is using; given how long it took him to create it, he probably built his own.

The filter is either a neural net that checks for negative context or just a word filter. He has stated that the majority of its hits are false positives, which would be favorable for an AI running a Twitch stream.
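The simpler of the two options, a plain word filter, could look roughly like the sketch below. The blocklist entries and the fallback line are placeholders I made up; Vedal’s actual filter is not public as far as I know.

```python
import re

# Placeholder blocklist; a real one would be longer and maintained separately.
BLOCKED_WORDS = {"badword", "slur"}

# One case-insensitive pattern that matches any blocked word as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in sorted(BLOCKED_WORDS)) + r")\b",
    re.IGNORECASE,
)


def is_blocked(text: str) -> bool:
    """Return True if the text contains any blocked word."""
    return _PATTERN.search(text) is not None


def safe_reply(text: str, fallback: str = "Filtered.") -> str:
    """Pass the generated reply through, or swap it for a fallback line."""
    return fallback if is_blocked(text) else text
```

Running the generated reply through `safe_reply` before it reaches the voice synthesizer means a false positive only costs one harmless "Filtered." line, which is the safer way to be wrong on a live stream.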

The gaming aspect is a different AI that occasionally sends inputs to the VTuber AI, but it has limited interaction, as also stated by Vedal.

The VTuber itself is an interaction between a voice synthesizer and an application that turns the audio file generated from the text response into mouth movements of the VTuber. (According to some sources the VTuber is made in Unity, but other sources talk about software interacting with VTube Studio.)
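One simple way to get mouth movement out of audio is to map loudness to how far the mouth opens, frame by frame. The sketch below does that with plain Python; it is an assumption about a possible approach, not a description of what Neuro-Sama or VTube Studio actually do. The frame size and the `full_open_rms` threshold are arbitrary values chosen for illustration.

```python
import math


def mouth_openness(samples, frame_size=800, full_open_rms=0.3):
    """Return one mouth-open value in [0, 1] per frame of audio.

    samples: sequence of floats in [-1.0, 1.0] (mono PCM audio).
    frame_size: samples per frame (800 is 50 ms at 16 kHz).
    full_open_rms: loudness at which the mouth counts as fully open.
    """
    values = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        # Root-mean-square amplitude is a rough measure of loudness.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        values.append(min(1.0, rms / full_open_rms))
    return values
```

Each value could then be sent to whatever drives the avatar’s mouth parameter, once per frame, while the synthesized audio plays.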

So you would need to check what exactly your version is supposed to do and how it should interact.

I would recommend starting with a conversational AI and giving it a personality, then training it until it is as close as you want, and then working out how it should interact with the chat. Maybe use different models for Q&A and text generation. My guess is that Vedal uses the actual conversational chat part for his voice.
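As a starting point for the "different models for Q&A and text generation" idea, here is a small sketch of a router that decides which model gets a chat message and prepends a personality prompt. The persona text, the question heuristic, and the two model callables are all placeholders; in practice each callable would wrap whatever model you end up training.

```python
# Hypothetical persona line; replace with your own character description.
PERSONA = "You are a cheerful, slightly sarcastic streamer AI."

# Very rough heuristic: words that often start a question.
QUESTION_WORDS = (
    "who", "what", "when", "where", "why", "how",
    "is", "are", "can", "do", "does",
)


def is_question(message: str) -> bool:
    """Guess whether a chat message is a question."""
    text = message.strip().lower()
    if not text:
        return False
    if text.endswith("?"):
        return True
    return text.split()[0] in QUESTION_WORDS


def route(message: str, qa_model, chat_model) -> str:
    """Send questions to qa_model and everything else to chat_model.

    Both models are plain callables that take a prompt string and
    return a reply string.
    """
    model = qa_model if is_question(message) else chat_model
    prompt = f"{PERSONA}\nChat: {message}\nReply:"
    return model(prompt)
```

For a quick test you can pass in two stub functions instead of real models and check that messages land where you expect.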


Thank you very much! This was quite helpful, and I guessed as much. I’ll probably just be working on segments of it and logging them in videos, in hopes that one day it will fund itself lol.
Good luck on your own project as well!

I have tried collecting data from Filian’s stream and feeding it to a GPT-Neo model. The data is basically conversations between the streamer and the chat, which I pre-processed before feeding it into the model. However, the results didn’t look very good: the model ended up being too repetitive and not very smart, probably because of the lack of data. This seems to be a very expensive thing to do. Let me know if you want to get involved.
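If anyone wants to put a number on "too repetitive", a commonly used rough measure is distinct-n: the share of n-grams in the generated text that are unique. Here is a minimal sketch with whitespace tokenization, which is a simplification; the function name and defaults are my own.

```python
def distinct_n(texts, n=2):
    """Fraction of n-grams across all texts that are unique (0.0 to 1.0).

    Values near 1.0 mean varied output; values near 0.0 mean the model
    keeps repeating the same phrases.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0
```

Comparing this score on the model’s replies against the same score on the real streamer lines gives a quick sanity check of how far off the model is.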

I’ve been trying to do something like Neuro-Sama for a few years, and I found an app that might help you (the YouTube video showing how to set it up is not mine): How to Sound Like an Anime Girl With This New AI Voice Changer - YouTube

Have you seen the current Neuro-Sama V2? Her LLM is so well made that it doesn’t feel like I’m talking to an AI.