How to prime GPT-2 with input-output pairs

Hi, first post here! Let me know if I’m in the wrong subforum.

It looks like it’s possible to prime GPT-3 with an input and output (see, e.g. github.com/shreyashankar/gpt3-sandbox). I’m wondering how to do this for GPT-2.


Further details:

My use case is to try to replicate the results of this demo, whose author primes GPT-3 with the following text:

gpt.add_example(Example('apple', 'slice, eat, mash, cook, bake, juice'))
gpt.add_example(Example('book', 'read, open, close, write on'))
gpt.add_example(Example('spoon', 'lift, grasp, scoop, slice'))
gpt.add_example(Example('apple', 'pound, grasp, lift'))

I only have access to GPT-2, via the Hugging Face Transformers library. How can I prime GPT-2 large on Hugging Face to replicate the above examples? The issue is that, with the online Hugging Face demo, one doesn’t get to prime with the input and corresponding output separately (as the author of the GPT-3 demo did above).

Similarly, I can’t find anything in the Hugging Face documentation describing how to prime with examples of input-output pairs, like Example('apple', 'slice, eat, mash, cook, bake, juice').

Does anyone know how to do this?


Desired output:
Use GPT-2 to return, for the input “potato”, something like “peel, slice, cook, mash, bake” (as in the GPT-3 demo above). Obviously the exact list of output verbs won’t be the same, as GPT-2 and GPT-3 are not identical models.

Hi @DGhose, I’ve found the following prompt format to be reasonably good at getting GPT-2 to complete the pattern for the last input_n:

input_1 => output_1 \n input_2 => output_2 \n ... input_n =>

So for your use case, you could try feeding something like the following:

apple => slice, eat, mash, cook, bake, juice \n book => read, open, close, write on \n spoon => lift, grasp, scoop, slice \n banana =>
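If you want to script this rather than paste prompts into the web demo, here’s a minimal sketch using the Transformers text-generation pipeline. The `build_prompt` helper and the sampling parameters are my own choices, not anything official, and I’m assuming the “\n” separators in the format above are meant as real newlines:

```python
from transformers import pipeline

def build_prompt(examples, query):
    """Join (input, output) pairs into the 'input => output' prompt format."""
    lines = [f"{inp} => {out}" for inp, out in examples]
    lines.append(f"{query} =>")  # leave the last output blank for GPT-2 to fill
    return "\n".join(lines)

examples = [
    ("apple", "slice, eat, mash, cook, bake, juice"),
    ("book", "read, open, close, write on"),
    ("spoon", "lift, grasp, scoop, slice"),
]
prompt = build_prompt(examples, "banana")

if __name__ == "__main__":
    # "gpt2-large" is a multi-GB download; "gpt2" is a quicker smoke test
    generator = pipeline("text-generation", model="gpt2-large")
    result = generator(prompt, max_new_tokens=20, do_sample=True, top_k=50)
    # keep only the completion for "banana", up to the next line break
    completion = result[0]["generated_text"][len(prompt):].split("\n")[0]
    print(completion.strip())
```

The pipeline returns the prompt plus the continuation, so the slice at the end strips the prompt and truncates at the first newline to get just the verb list.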

which in the Hugging Face inference API for gpt2-xl produces a semi-coherent output for “banana”:

(screenshot of the gpt2-xl inference API output for the “banana” prompt)

You’ll probably need more examples if you’re doing more complex mappings (e.g. language translation), and it takes a few tries to “cherry pick” the desired output because the text generation is not deterministic in the API (I think they use sampling).