Keep getting the same output from Mistral-7b-Instruct

Hello!

I’m trying to make a simple ‘choose your own path’ interactive fantasy text game using Mistral-7b-Instruct and Python.

I have a rough setup that lets me bounce back and forth between player and ‘engine’.

But I keep seeing the same responses from the model: it’s an opening in the forest with an Elf with long hair, or it’s a bunch of goblins hopping around a fire, etc. The same ones keep coming back even if I change the initiating prompt text. I get variations now and then, sure, but on the whole it’s too repetitive and similar.

Is there a ‘seed’ or something I can add to push it around into new spaces?

I want the model to come up with new scenarios without me having to hand-hold it with explicit prompt text.

EDIT: I currently have temperature=0.9 with do_sample=True, and have tried top_p from 0.5 to 0.95.
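
For context, my generation call looks roughly like this (a sketch, not my exact code; the model path, prompt, and max_new_tokens are placeholders, and the parameters are the standard transformers generate() ones):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder: whichever Instruct revision is in use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# Placeholder opening prompt; the [INST] tags are Mistral-Instruct's chat format.
prompt = "[INST] You are the narrator of a fantasy adventure. Describe the opening scene. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    do_sample=True,      # multinomial sampling instead of greedy decoding
    temperature=0.9,
    top_p=0.95,
    max_new_tokens=300,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```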

Any ideas how this can be achieved?

Cheers!

It might be that Mistral 7B is just too small to be very creative. You can try setting num_beams to 5 or 6 (with num_return_sequences to match) to get a few different outputs from the same call, and see whether there’s some variety among them.
You also want to raise top_p to get a wider range of next tokens to choose from, but you might start getting gibberish if you set it too high.
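
Something like this returns several candidates from one call (a sketch using standard transformers generate() parameters; num_return_sequences with do_sample=True simply samples several times, which is often enough for variety without full beam search):

```python
# Sketch: sample several independent candidates in one generate() call.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,               # higher nucleus keeps a wider pool of next tokens
    num_return_sequences=5,   # five sampled continuations instead of one
    max_new_tokens=300,
)
prompt_len = inputs["input_ids"].shape[-1]
for i, seq in enumerate(outputs):
    print(f"--- candidate {i} ---")
    print(tokenizer.decode(seq[prompt_len:], skip_special_tokens=True))
```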

Hi, thank you for your suggestions!
I’ll try a higher top_p and see what happens. I’m already running with 4 beams, but I didn’t realize I had to capture the different outputs. The difficulty with having several outputs to choose between is that I’d then have to figure out how to evaluate them automatically to pick the best one for any given situation.

Yes, maybe it’s just not a big enough model, or it wasn’t trained on enough genre-relevant text.
I was thinking of paying just $5 or something to run the same setup against GPT-4 and see if I get better results. Just for comparison.

I only have an RTX 3060 with 12 GB, so not enough VRAM to run a much bigger model.

If you use device_map="auto" when you’re loading the model, it will offload the layers that aren’t currently needed to CPU RAM and disk, swapping them into VRAM when they are needed. That lets you run much bigger models, though performance will suffer.
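
A minimal sketch of what that looks like (the model id and offload folder are placeholders; the offloading itself is handled by accelerate under the hood):

```python
import torch
from transformers import AutoModelForCausalLM

# device_map="auto" lets accelerate spread layers across GPU, CPU RAM, and disk,
# moving each layer onto the GPU only when it's needed for the forward pass.
model = AutoModelForCausalLM.from_pretrained(
    "some/larger-model",          # placeholder: a model too big for 12 GB of VRAM
    torch_dtype=torch.float16,
    device_map="auto",
    offload_folder="offload",     # placeholder path; used if layers spill past CPU RAM to disk
)
```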
If you really want to keep using this particular model, you could also look for services that charge by the hour. I’ve been using the lowest paid monthly tier on Paperspace and can easily run and train small models, and I can even train large models if I work out the device map myself.
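
Working out the device map yourself just means passing an explicit dict instead of "auto". A sketch of what I mean (the module names below match Mistral-7B’s 32-layer layout, but check model.hf_device_map on your own model for the real keys):

```python
# Sketch: a hand-written device map putting half the layers on the GPU
# and the rest in CPU RAM. Keys are module names from the model graph.
device_map = {
    "model.embed_tokens": 0,
    **{f"model.layers.{i}": 0 for i in range(16)},          # first 16 layers on GPU 0
    **{f"model.layers.{i}": "cpu" for i in range(16, 32)},  # remaining layers in CPU RAM
    "model.norm": 0,
    "lm_head": 0,
}
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model id
    torch_dtype=torch.float16,
    device_map=device_map,
)
```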