The attention weights are not equal to TensorFlow's

I used the following command to convert the TF checkpoint to a Transformers (PyTorch) model:

transformers-cli convert --model_type bert  --tf_checkpoint "./orgbert/bert_model.ckpt"  --config "./orgbert/bert_config.json"  --pytorch_dump_output ".pytorch_model.bin"
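(For reference, the same conversion can apparently also be done directly in Python by loading the TF checkpoint with `from_tf=True` and saving the PyTorch weights. This is only a rough sketch, assuming the same paths as in the command above and that TensorFlow is installed.)

  from transformers import BertConfig, BertForPreTraining

  # Sketch only: paths assumed from the conversion command above.
  # Loading with from_tf=True requires TensorFlow to be installed.
  config = BertConfig.from_json_file("./orgbert/bert_config.json")
  model = BertForPreTraining.from_pretrained(
      "./orgbert/bert_model.ckpt.index", from_tf=True, config=config
  )
  model.save_pretrained("./pretrained_bert/bertbase")  # load this directory from PyTorch afterwards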

When I use the following code to get the attention weights:

  import torch
  from transformers import AutoTokenizer, BertForPreTraining

  # Load the converted checkpoint and put the model in eval mode
  tokenizer = AutoTokenizer.from_pretrained('./pretrained_bert/bertbase')
  torch_bert = BertForPreTraining.from_pretrained('./pretrained_bert/bertbase')
  torch_bert.eval()

  st = 'Complex Langevin (CL) dynamics  [1,2] provides an approach to circumvent the sign problem in numerical simulations of lattice field theories with a complex Boltzmann weight, since it does not rely on importance sampling. [SEP] Complex Langevin (CL) dynamics  [1,2] provides an approach to circumvent the sign problem in numerical simulations of lattice field theories with a complex Boltzmann weight, since it does not rely on importance sampling.'
  inputs = tokenizer(st, return_tensors="pt", max_length=512, truncation=True)

  # Forward pass with attention weights returned
  with torch.no_grad():
      outputs = torch_bert(**inputs, output_attentions=True)
  attention = outputs.attentions

the PyTorch attention weights come out much larger than the TF ones, like this:
  tf_attention[0]['attns'][0][0][0][:5]
  [2.742e-06 2.049e-04 2.688e-05 8.023e-05 9.804e-03]
  torch_attention[0]['attns'][0][0][0][:5]
  [0.00899282 0.00776654 0.00819262 0.00504107 0.01239857]
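It may help to first confirm that both pipelines see exactly the same token ids and that each PyTorch attention row is a valid softmax output (sums to 1). Below is a rough diagnostic sketch; it assumes `tf_attention` has the nested structure printed above, and that `attention` and `inputs` come from the PyTorch snippet earlier.

  import numpy as np

  # Sketch only: layer 0, head 0 of the PyTorch attentions, shape (seq_len, seq_len)
  torch_head0 = attention[0][0, 0]
  print("row sums:", torch_head0.sum(-1)[:5])      # each row should be ~1.0 (softmax)

  # Compare the same slice that was printed above (structure of tf_attention is assumed)
  tf_row = np.asarray(tf_attention[0]['attns'][0][0][0][:5])
  torch_row = torch_head0[0, :5].numpy()
  print("max abs diff on that slice:", np.abs(tf_row - torch_row).max())

  # If these token ids differ from the TF tokenization, the attention maps
  # are not comparable in the first place.
  print("token ids:", inputs["input_ids"][0][:10].tolist())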