I used this command to convert a TF checkpoint to a Transformers (PyTorch) model:
transformers-cli convert --model_type bert --tf_checkpoint "./orgbert/bert_model.ckpt" --config "./orgbert/bert_config.json" --pytorch_dump_output "./pytorch_model.bin"
Then I ran the following code to get the attention weights:
import torch
from transformers import AutoTokenizer, BertForPreTraining

tokenizer = AutoTokenizer.from_pretrained('./pretrained_bert/bertbase')
torch_bert = BertForPreTraining.from_pretrained('./pretrained_bert/bertbase')
torch_bert.eval()
st = 'Complex Langevin (CL) dynamics [1,2] provides an approach to circumvent the sign problem in numerical simulations of lattice field theories with a complex Boltzmann weight, since it does not rely on importance sampling. [SEP] Complex Langevin (CL) dynamics [1,2] provides an approach to circumvent the sign problem in numerical simulations of lattice field theories with a complex Boltzmann weight, since it does not rely on importance sampling.'
inputs = tokenizer(st, return_tensors="pt", max_length=512, truncation=True)
with torch.no_grad():
    outputs = torch_bert(**inputs, output_attentions=True)
attention = outputs.attentions  # tuple: one (batch, heads, seq_len, seq_len) tensor per layer
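As a sanity check when comparing the two frameworks: each attention tensor returned by `output_attentions=True` is a softmax output, so every row should sum to 1 in both TF and PyTorch. A minimal sketch of this check, using a random tensor in place of the real model output (the shape below is an assumption for illustration):

```python
import torch

# Simulate one layer's attention, shaped (batch, heads, seq, seq)
# as returned by output_attentions=True; softmax over the last axis.
batch, heads, seq = 1, 12, 128
scores = torch.randn(batch, heads, seq, seq)
attn = torch.softmax(scores, dim=-1)

# Every attention row should sum to 1 (up to float tolerance).
row_sums = attn.sum(dim=-1)
print(torch.allclose(row_sums, torch.ones_like(row_sums), atol=1e-5))
```

If the converted model's rows also sum to 1 but sit near 1/seq_len everywhere, the attention is nearly uniform, which can point to weights not being loaded correctly rather than a numerical difference between frameworks.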
The PyTorch attention weights are much larger than the corresponding TF ones:
tf_attention[0]['attns'][0][0][0][:5]
[2.742e-06 2.049e-04 2.688e-05 8.023e-05 9.804e-03]
torch_attention[0]['attns'][0][0][0][:5]
[0.00899282 0.00776654 0.00819262 0.00504107 0.01239857]
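Rather than eyeballing the first five values, the two attention slices can be compared numerically. A hedged sketch of such a check; `max_attention_diff` is a hypothetical helper, and the two rows below are stand-ins for slices like `tf_attention[0]['attns'][0][0][0][:5]`, assuming both index the same layer, head, and query token:

```python
import numpy as np

def max_attention_diff(tf_attn, pt_attn):
    """Max absolute elementwise difference between two attention
    slices of identical shape."""
    tf_attn = np.asarray(tf_attn, dtype=np.float64)
    pt_attn = np.asarray(pt_attn, dtype=np.float64)
    assert tf_attn.shape == pt_attn.shape, "compare the same layer/head/token slice"
    return float(np.abs(tf_attn - pt_attn).max())

# Stub data mirroring the values printed above.
tf_row = np.array([2.742e-06, 2.049e-04, 2.688e-05, 8.023e-05, 9.804e-03])
pt_row = np.array([0.00899282, 0.00776654, 0.00819262, 0.00504107, 0.01239857])
print(max_attention_diff(tf_row, pt_row))  # a large gap means the rows genuinely disagree
```

Running this over full (seq_len, seq_len) slices for every layer and head would show whether the mismatch is global (suggesting a conversion or tokenization difference) or confined to particular layers.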