Should I use BertConfig? Why are these outputs different?

Hello :slightly_smiling_face:

The 1st and 2nd snippets load the weights of the prajjwal1/bert-tiny model (either with or without the LM head), so their outputs are the same.

The 3rd snippet only loads the config, so no weights are loaded. The model variable then holds an untrained, randomly initialized model, so its outputs will differ.
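Here is a minimal sketch of the difference that runs without downloading anything. It is not your original code: the tiny config values, the variable names, and the locally saved "checkpoint" are all made up for illustration, and a randomly initialized model stands in for a trained one. With the real model, the two cases would roughly correspond to `BertModel.from_pretrained("prajjwal1/bert-tiny")` versus `BertModel(BertConfig.from_pretrained("prajjwal1/bert-tiny"))`.

```python
import tempfile

import torch
from transformers import BertConfig, BertModel

# Hypothetical tiny config, chosen only so the example runs quickly on CPU.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)

torch.manual_seed(0)
# Stand-in for a trained checkpoint (assumption: weights are random here,
# but they play the role of the saved, "trained" weights).
reference = BertModel(config).eval()

with tempfile.TemporaryDirectory() as tmp:
    reference.save_pretrained(tmp)

    # Loads the config AND the saved weights.
    with_weights = BertModel.from_pretrained(tmp).eval()

    # Loads only the config; the weights are freshly (randomly) initialized.
    config_only = BertModel(BertConfig.from_pretrained(tmp)).eval()

input_ids = torch.tensor([[1, 5, 7, 9, 2]])  # arbitrary token ids

with torch.no_grad():
    out_reference = reference(input_ids).last_hidden_state
    out_with_weights = with_weights(input_ids).last_hidden_state
    out_config_only = config_only(input_ids).last_hidden_state

same_when_weights_loaded = torch.allclose(out_reference, out_with_weights, atol=1e-6)
same_when_config_only = torch.allclose(out_reference, out_config_only, atol=1e-6)

print("weights loaded, outputs match:", same_when_weights_loaded)
print("config only, outputs match:", same_when_config_only)
```

So `BertConfig` is useful when you want to train a model from scratch with a given architecture. If you want the pretrained behaviour, use `from_pretrained` on the model class.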
