Reconstructing Original Sentence From Bert Output w/ Added Noise

pdflynn · August 10, 2021, 5:58pm

Hi there,

I’m trying to convert a modified BERT embedding back to text.

I’m shooting for a pipeline like the following:

1. input_text
2. Tokenizer
3. BERT (bert-base-uncased)
4. Add noise to the output tensor
5. Attempt to reconstruct the original text
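The post does not say what kind of noise step 4 adds. Here is a minimal sketch, assuming isotropic Gaussian noise; `add_gaussian_noise`, `std` and `seed` are hypothetical names for this illustration. With the real model, `hidden` would be `BertModel.from_pretrained("bert-base-uncased")(**inputs).last_hidden_state`, where `inputs = tokenizer(text, return_tensors="pt")`. Below, a zero tensor of the same shape (batch 1, 8 tokens, hidden size 768) stands in for it so the snippet runs without downloading weights.

```python
from typing import Optional

import torch


def add_gaussian_noise(hidden: torch.Tensor, std: float = 0.1,
                       seed: Optional[int] = None) -> torch.Tensor:
    """Return hidden + N(0, std^2) noise; the input tensor is left unchanged.

    Assumption: isotropic Gaussian noise. `std` and `seed` are hypothetical
    knobs for this sketch, not part of any library API.
    """
    generator = None
    if seed is not None:
        # A dedicated generator makes the noise reproducible without
        # touching the global RNG state.
        generator = torch.Generator(device=hidden.device).manual_seed(seed)
    noise = torch.randn(hidden.shape, generator=generator,
                        dtype=hidden.dtype, device=hidden.device)
    return hidden + std * noise


# Stand-in for last_hidden_state of bert-base-uncased: (batch, seq_len, 768).
hidden = torch.zeros(1, 8, 768)
noisy = add_gaussian_noise(hidden, std=0.1, seed=0)
print(noisy.shape)  # torch.Size([1, 8, 768])
```

Scaling `std` relative to the typical norm of the real hidden states makes different noise levels easier to compare.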

I’m somewhat lost on step 5. I have a perturbed torch tensor, but how do I convert that back to a sentence? I’m guessing it takes some logit layer that uses the same vocabulary as the tokenizer?
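That guess matches how BERT is pretrained. In Hugging Face Transformers, `BertForMaskedLM` puts a masked-language-model head on top of the encoder (its `cls` attribute). This head turns each final hidden state into logits over the tokenizer's vocabulary, and its output projection is tied to the input word-embedding matrix (plus a bias). The real head first applies a dense layer, an activation, and LayerNorm. The toy below drops that transform and uses a hypothetical six-token vocabulary with identity embeddings, so it only illustrates the core idea: score each noisy vector against every vocabulary row and take the argmax. `decode_with_tied_head` is a hypothetical helper, not a library function.

```python
from typing import List, Optional

import torch


def decode_with_tied_head(hidden: torch.Tensor, embedding_matrix: torch.Tensor,
                          bias: Optional[torch.Tensor] = None) -> List[int]:
    """Map hidden states (seq_len, dim) to token IDs via a tied linear layer.

    Each position is scored against every row of the (vocab_size, dim)
    embedding matrix; the highest-scoring row index is the predicted ID.
    """
    logits = hidden @ embedding_matrix.T  # (seq_len, vocab_size)
    if bias is not None:
        logits = logits + bias
    return logits.argmax(dim=-1).tolist()


# Hypothetical toy vocabulary with one-hot (identity) embeddings.
vocab = ["[PAD]", "[CLS]", "[SEP]", "the", "cat", "sat"]
embeddings = torch.eye(len(vocab))

original_ids = [1, 3, 4, 5, 2]  # [CLS] the cat sat [SEP]
clean = embeddings[original_ids]  # (5, 6)

# Bounded noise in [-0.4, 0.4): small enough that the correct row
# (score >= 0.6) always beats every other row (score < 0.4).
gen = torch.Generator().manual_seed(0)
noise = 0.8 * (torch.rand(clean.shape, generator=gen) - 0.5)
noisy = clean + noise

recovered_ids = decode_with_tied_head(noisy, embeddings)
recovered_tokens = [vocab[i] for i in recovered_ids]
print(recovered_tokens)  # ['[CLS]', 'the', 'cat', 'sat', '[SEP]']
```

With the pretrained model, the analogous step would be roughly `mlm.cls(noisy_hidden).argmax(-1)`. Here `mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")`, and `noisy_hidden` is `mlm.bert(**inputs).last_hidden_state` with noise added. How well this recovers the original words at a given noise level is an empirical question.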

Thanks in advance!

imkhan107 · April 3, 2023, 6:32pm

I have a similar problem. Have you figured out how to do this?

shivpalit · January 24, 2024, 4:54pm

Have you tried tokenizer.convert_ids_to_tokens(tensor)?
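`convert_ids_to_tokens` maps integer token IDs to WordPiece strings. It therefore applies only after the perturbed tensor has been turned into IDs, for example by an argmax over vocabulary logits; the float hidden states themselves are not token IDs. Its output still contains special tokens such as `[CLS]`/`[SEP]` and `##` continuation pieces. `tokenizer.decode(ids, skip_special_tokens=True)` handles both in one call. The helper below is a hypothetical pure-Python illustration of that clean-up, not the library's implementation (which, depending on settings, also tidies spacing around punctuation). The token list is likewise made up for illustration.

```python
from typing import Iterable, List

# Hypothetical set; a real tokenizer exposes its own (e.g. all_special_tokens).
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]", "[MASK]"}


def wordpieces_to_text(tokens: Iterable[str], special_tokens=SPECIAL_TOKENS) -> str:
    """Drop special tokens and glue '##' continuation pieces onto the previous word."""
    words: List[str] = []
    for tok in tokens:
        if tok in special_tokens:
            continue
        if tok.startswith("##") and words:
            words[-1] += tok[2:]  # continuation of the previous word
        else:
            words.append(tok)
    return " ".join(words)


tokens = ["[CLS]", "the", "em", "##bed", "##ding", "is", "noisy", "[SEP]"]
print(wordpieces_to_text(tokens))  # the embedding is noisy
```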
