Hi,
I tried to fine-tune a BERT model on a text classification task using the same hyperparameters (learning rate, warmup steps, batch size, number of epochs) in PyTorch and TensorFlow, but the validation accuracy differs dramatically: around 96% in PyTorch versus 76% in TensorFlow. One thing I noticed is the difference in GPU memory usage (PyTorch ~12 GB, TF ~8 GB). Shouldn't we expect similar accuracy? The PyTorch side of the comparison is sketched after the environment info below.
- `transformers` version: 3.5.1
- Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.6.9
- PyTorch version (GPU?): 1.7.0+cu101 (True)
- Tensorflow version (GPU?): 2.3.0 (True)
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
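For reference, the PyTorch side of the comparison looks roughly like this (a minimal sketch, not my exact script; `train_loader`, `num_labels`, `epochs`, and the concrete step counts are placeholders):

```python
import torch
from transformers import BertForSequenceClassification, AdamW, get_linear_schedule_with_warmup

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_labels).cuda()
optimizer = AdamW(model.parameters(), lr=2e-5)  # same peak LR as the TF run
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=500, num_training_steps=3000)

model.train()
for epoch in range(epochs):
    for batch in train_loader:  # batches of 32, same as the TF run
        optimizer.zero_grad()
        outputs = model(input_ids=batch['input_ids'].cuda(),
                        attention_mask=batch['attention_mask'].cuda(),
                        labels=batch['labels'].cuda())
        loss = outputs[0]  # with labels passed, the loss is the first element of the output tuple
        loss.backward()
        optimizer.step()
        scheduler.step()  # linear warmup + decay, stepped per batch
```

The TensorFlow code is: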
```python
import tensorflow as tf
from transformers import TFBertForSequenceClassification

# Pretrained encoder with a freshly initialized classification head.
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_labels)

optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
model.compile(optimizer=optimizer, loss=model.compute_loss, metrics=['accuracy'])

# The dataset is already batched, so `batch_size` must not also be passed to fit().
history = model.fit(train_dataset.shuffle(1000).batch(32), epochs=epochs)
```
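For completeness, `lr_schedule` is built along these lines; `LinearWarmupDecay` and the concrete numbers below are illustrative stand-ins for the schedule in my actual script:

```python
import tensorflow as tf

class LinearWarmupDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linear warmup to a peak rate, then linear decay to zero (illustrative stand-in)."""
    def __init__(self, peak_lr, warmup_steps, total_steps):
        super().__init__()
        self.peak_lr = peak_lr
        self.warmup_steps = warmup_steps
        self.total_steps = total_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warmup = self.peak_lr * step / self.warmup_steps
        decay = self.peak_lr * (self.total_steps - step) / (self.total_steps - self.warmup_steps)
        return tf.maximum(0.0, tf.minimum(warmup, decay))

# Placeholder values: peak LR 2e-5, 500 warmup steps, 3 epochs of ~1000 steps each.
lr_schedule = LinearWarmupDecay(peak_lr=2e-5, warmup_steps=500, total_steps=3000)
```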