Reconstructing the Original Sentence From BERT Output With Added Noise

Hi there,

I’m trying to convert a modified BERT embedding back to text.

I’m shooting for a pipeline like the following:

  1. input_text
  2. Tokenizer
  3. BERT Base Uncased
  4. Add noise to output tensor
  5. Attempt to reconstruct original text

I’m somewhat lost on step 5. I have a perturbed torch tensor, but how do I convert it back to a sentence? I’m guessing I need some logit layer that uses the same vocabulary as the tokenizer?

Thanks in advance!

I have a similar problem. Have you figured out how to do this?

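Have you tried tokenizer.convert_ids_to_tokens(tensor)? Note that it expects integer token IDs rather than a float tensor of hidden states, so the perturbed output would first have to be mapped to IDs, for example with an argmax over vocabulary logits.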
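
To make the step from a perturbed tensor to token IDs concrete, here is a minimal pure-Python sketch of the idea. Everything in it is an assumption for illustration: `VOCAB` and `EMBED` are a made-up toy vocabulary and output matrix, and `add_noise`, `logits`, and `decode` are hypothetical helpers, not part of any library. In a real setup the hidden states would come from the model, and the projection would be a trained language-modeling head over the tokenizer's vocabulary (in Hugging Face transformers, `BertForMaskedLM` carries such a head).

```python
import random

# Hypothetical toy vocabulary and output matrix (one row per token).
# These stand in for the tokenizer vocabulary and a trained LM head.
VOCAB = ["[PAD]", "the", "cat", "sat", "mat"]
EMBED = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def add_noise(hidden, scale, rng):
    """Step 4: add zero-mean Gaussian noise to every component."""
    return [[x + rng.gauss(0.0, scale) for x in vec] for vec in hidden]


def logits(vec):
    """Score one hidden vector against every vocabulary row (dot product)."""
    return [sum(v * e for v, e in zip(vec, row)) for row in EMBED]


def decode(hidden):
    """Step 5: take the argmax token ID per position, then map IDs to tokens."""
    ids = []
    for vec in hidden:
        scores = logits(vec)
        ids.append(scores.index(max(scores)))
    return ids, [VOCAB[i] for i in ids]


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for the model output of a three-token sentence.
    clean = [EMBED[1], EMBED[2], EMBED[3]]
    noisy = add_noise(clean, scale=0.1, rng=rng)
    ids, tokens = decode(noisy)
    print(ids, tokens)
```

The resulting integer IDs are what `tokenizer.convert_ids_to_tokens` would accept. With small noise the argmax usually recovers the original tokens; as the noise scale grows, the decoded tokens will start to differ from the input.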