Generate unlikely text

Hi there,

I was just playing with Hugging Face text generation and wondered if there is a simple way to tweak it so that it generates an unlikely continuation of the text rather than a likely one?

I don’t mean just random noise. It would be great if the continuation made grammatical sense but was simply more unlikely than not.

For example, I was generating text from a fragment like “a pub, a cafe, a”, and GPT-2 suggested continuations starting with “hotel”, “cinema” and so on, which is great. But I’d be really interested if I could tweak the settings so that unlikely things like “giraffe” or “bikini atoll” or “15th century writ of land ownership” were generated.

Or maybe that just wouldn’t be possible with tweaking and would require constructing some special pipeline?

Many thanks in advance
Best, Sam

You don’t want things that are too unlikely, otherwise you’ll get gibberish like “a pub, a cafe, a fetch Voulez conspire the obstreperous of.” You may as well just use random numbers to pick words off a list at that point. It’s much less computationally intensive.

Start by turning up the temperature. Temperature values >1.0 will flatten out the probability curve, spreading the probability around and giving more “random” results when sampling.
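To make the flattening concrete, here is a minimal sketch in plain Python. The four words and their scores are made up for illustration; a real model produces one score per vocabulary token.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for four candidate next words (not from a real model).
words = ["hotel", "cinema", "giraffe", "writ"]
logits = [4.0, 3.0, 0.5, -1.0]

for t in (1.0, 2.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {w: round(p, 3) for w, p in zip(words, probs)})
```

As the temperature rises, "hotel" loses probability and "giraffe" gains it, but the ranking never changes. That is why temperature alone tends to drift toward noise rather than toward "unlikely but sensible".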

Then look at typical_p to invoke typical sampling (instead of top_p or top_k). Like top_p, it takes a probability mass between 0 and 1. Typical sampling is specifically designed to avoid the most extreme outcomes, on the reasoning that if the next word is too obvious, a person wouldn’t bother to say it, and if it’s too random you’ll just confuse the listener.
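Here is a rough sketch of the idea behind typical sampling, in plain Python. It is a simplified illustration, not the transformers implementation: the function name, the word list and the probabilities are all made up. Tokens are ranked by how close their surprisal (-log p) is to the entropy of the whole distribution, and the closest ones are kept until they cover the requested mass.

```python
import math

def typical_filter(probs, mass=0.9):
    """Keep the tokens whose surprisal is closest to the entropy of the
    distribution until they cover `mass`, then renormalise what is kept."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Rank tokens by the gap between their surprisal (-log p) and the entropy.
    order = sorted(
        (i for i, p in enumerate(probs) if p > 0),
        key=lambda i: abs(-math.log(probs[i]) - entropy),
    )
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= mass:
            break
    return {i: probs[i] / total for i in kept}

# Hypothetical next-word distribution (not from a real model).
words = ["hotel", "cinema", "library", "bakery", "museum", "giraffe", "writ"]
probs = [0.5, 0.1, 0.1, 0.1, 0.1, 0.05, 0.05]

kept = typical_filter(probs, mass=0.35)
print({words[i]: round(p, 3) for i, p in kept.items()})
```

With this toy distribution and a small mass, the obvious "hotel" is dropped along with the two rarest words, leaving only the middle-probability options. With a mass near 1, almost nothing is filtered out.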

The combination of high temperature and typical sampling might hit middle-probability options fairly often without degenerating into gibberish.

If that doesn’t get you where you want to be, you’ll probably need to bypass the generation pipeline entirely. If you call the model directly, one token at a time, and implement your own method for pruning options that are too likely, you might get the result you want.
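A minimal sketch of that kind of loop follows. Everything in it is hypothetical: the function names, the ceiling and floor thresholds, and the toy stand-in for the model. With a real model, the next_token_probs callable would run a forward pass and apply a softmax to the logits at the last position.

```python
import random

def prune_and_sample(probs, ceiling=0.2, floor=0.01, rng=random):
    """Sample a token index after dropping options that are too likely
    (above `ceiling`) or too unlikely (below `floor`)."""
    candidates = {i: p for i, p in enumerate(probs) if floor <= p <= ceiling}
    if not candidates:
        # Nothing falls in the middle band: fall back to the full distribution.
        candidates = dict(enumerate(probs))
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]

def generate_tokens(next_token_probs, prompt, steps, **kwargs):
    """Extend `prompt` one token at a time using the pruned sampler."""
    tokens = list(prompt)
    for _ in range(steps):
        probs = next_token_probs(tokens)
        tokens.append(prune_and_sample(probs, **kwargs))
    return tokens

VOCAB = ["hotel", "cinema", "library", "giraffe", "atoll", "xqzv"]

def toy_next_token_probs(tokens):
    # Hypothetical stand-in for a real model: same distribution at every step.
    return [0.55, 0.25, 0.10, 0.06, 0.035, 0.005]

rng = random.Random(0)
out = generate_tokens(toy_next_token_probs, [], steps=5,
                      ceiling=0.2, floor=0.01, rng=rng)
print([VOCAB[i] for i in out])
```

The ceiling removes the obvious continuations and the floor removes the gibberish tail, so only the middle band is ever sampled. Both thresholds would need tuning against a real model's output.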


Thanks! That’s super helpful. Temperature and typical_p are giving me some interesting results:

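from transformers import pipeline

# Assumes the GPT-2 model mentioned earlier in the thread.
generator = pipeline("text-generation", model="gpt2")

generator(
    "the particularly weird thing, apart from being able to meditate for an hour at will in a pub or a coffee shop or a",
    max_length=30,
    num_return_sequences=20,
    temperature=1000.0,
    typical_p=10000,
)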

I get things like “local brewery” and “book”, but I think I’ll probably want to build my own pipeline. I guess the “How to create a custom pipeline?” guide is the best resource for understanding how to implement my own?

Many thanks again :)