I’m trying to generate “paraphrasal substitutions” for part of a sentence. For example:
epitaphs composed for [happiest]
I want to find (phrasal) substitutions for “happiest” that are close to “happiest” in a word-vector / embedding sense but that also yield a grammatically valid final sentence. One completion might be “epitaphs composed for the most fortunate”.
So really I’m looking for a “smoothed” sentential representation of “happiest” that fits the context.
My thought was to encode the whole sentence (including “happiest”) and then decode starting at pos=4, i.e. keeping the first three words fixed and only generating what follows. I would also like to penalize re-using any variant of “happiest” (e.g. “happy”, “happier”), since I want distinct paraphrases.
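To make the idea concrete, here is a rough sketch of the decoding behaviour I have in mind: a fixed prefix plus a penalty on unwanted variants. It is only a toy, not a real model. `TOY_SCORES` is a made-up table standing in for a decoder's next-token scores, and `constrained_greedy_decode` and its parameters are hypothetical names of my own, not anything from the transformers library.

```python
from typing import Dict, List, Set

# Made-up next-word scores standing in for a real decoder's logits.
TOY_SCORES: Dict[str, Dict[str, float]] = {
    "for": {"happiest": 3.0, "the": 2.5, "happy": 2.0},
    "the": {"happiest": 2.8, "most": 2.6, "happy": 1.0},
    "most": {"fortunate": 2.4, "happy": 2.2},
    "fortunate": {"<eos>": 3.0},
    "happiest": {"<eos>": 3.0},
    "happy": {"<eos>": 3.0},
}


def constrained_greedy_decode(
    prefix: List[str],
    banned: Set[str],
    penalty: float = 10.0,
    max_new_tokens: int = 5,
) -> List[str]:
    """Greedy decoding that keeps `prefix` fixed and penalizes `banned` words."""
    output = list(prefix)  # the first words are copied verbatim and never changed
    for _ in range(max_new_tokens):
        candidates = TOY_SCORES.get(output[-1], {})
        if not candidates:
            break
        # Subtract a penalty from the score of every banned variant.
        adjusted = {
            word: score - (penalty if word in banned else 0.0)
            for word, score in candidates.items()
        }
        best = max(adjusted, key=lambda word: adjusted[word])
        if best == "<eos>":
            break
        output.append(best)
    return output


prefix = ["epitaphs", "composed", "for"]
unconstrained = constrained_greedy_decode(prefix, banned=set())
paraphrase = constrained_greedy_decode(prefix, banned={"happiest", "happy"})
print(" ".join(unconstrained))
print(" ".join(paraphrase))
```

With the toy scores above, the unpenalized run just reproduces “happiest”, while the penalized run is pushed onto a different continuation after the fixed three-word prefix. What I am after is the same two controls (a forced prefix and a penalty or ban on specific tokens) applied to a real encoder-decoder model.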
I was trying to do this with EncoderDecoderModel (bert2bert, or BERT as the encoder with GPT-2 as the decoder), but I don’t know whether the generate() method supports this kind of constrained decoding.
Any help is much appreciated!