I am playing around with pretrained BERT models (bert-large-cased-whole-word-masking), using Hugging Face transformers to try it. I first used this piece of code:
from transformers import BertTokenizer, TFBertLMHeadModel
tokenizer = BertTokenizer.from_pretrained("bert-large-cased-whole-word-masking")
m = TFBertLMHeadModel.from_pretrained("bert-large-cased-whole-word-masking")
logits = m(tokenizer("hello world [MASK] like it", return_tensors="tf")["input_ids"]).logits
I then applied softmax to the logits and used argmax to get the most probable token at each position. Everything worked fine up to this point.
When I padded the input with max_length = 100, the model started making wrong predictions: every position was predicted as the same token, ID 119.
Code I used for the argmax:
tf.argmax(tf.keras.activations.softmax(m(tokenizer("hello world [MASK] like it", return_tensors="tf", max_length=100, padding="max_length")["input_ids"]).logits)[0], axis=-1)
Output before padding:
<tf.Tensor: shape=(7,), dtype=int64, numpy=array([ 9800, 19082, 1362, 146, 1176, 1122, 119])>
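As a sanity check, the predicted IDs can be mapped back to tokens with tokenizer.convert_ids_to_tokens. This is a minimal sketch, assuming the same tokenizer and m objects created above:
import tensorflow as tf
# encode the unpadded sentence and predict a token ID for every position
enc = tokenizer("hello world [MASK] like it", return_tensors="tf")
pred_ids = tf.argmax(tf.keras.activations.softmax(m(enc["input_ids"]).logits)[0], axis=-1)
# convert the predicted IDs back to wordpiece tokens for inspection
print(tokenizer.convert_ids_to_tokens(pred_ids.numpy().tolist()))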
Output after padding with max_length = 100:
<tf.Tensor: shape=(100,), dtype=int64, numpy=
array([119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119,
119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119,
119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119,
119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119,
119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119,
119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119,
119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119,
119, 119, 119, 119, 119, 119, 119, 119, 119])>
I wonder whether this problem persists when training a new model, since a fixed input shape is mandatory for training. I have already padded and tokenized the data, and now I want to know whether the same issue will show up there too.
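For reference, padding and tokenizing the training data looks roughly like this (a minimal sketch; the sentences list is only a placeholder, and max_length = 100 is simply the same value I used above):
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-large-cased-whole-word-masking")
sentences = ["hello world [MASK] like it"]  # placeholder for the real training sentences
# pad/truncate every example to a fixed length of 100 tokens
enc = tokenizer(sentences, return_tensors="tf", max_length=100, padding="max_length", truncation=True)
# enc contains fixed-shape "input_ids", "token_type_ids" and "attention_mask" tensors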