I have been fine-tuning “bert-base-uncased” on my custom dataset (loaded from a CSV file with datasets.load_dataset), and everything works fine when I use BERT with the Hugging Face Trainer. I recently tried replacing “bert-base-uncased” with “albert-base-v2” (in both the model and the tokenizer), but I am stuck on this error message when I try to run ALBERT:
Expected input batch_size (4096) to match target batch_size (8)
Since per_device_train_batch_size=8, I am fairly sure the 4096 comes from 8 * 512 = 4096, where 512 is, I think, the padded sequence length (tokens per example) rather than the size of an embedding vector. It looks as if somewhere in the model the batch and sequence dimensions get flattened together, so the loss receives 4096 rows of logits but only 8 labels.
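To make the arithmetic concrete, here is a minimal sketch of the shape bookkeeping I have in mind. It loads no model, and the helper name loss_input_rows and the two head types are my own assumptions for illustration, not something taken from my training script or from the library:

```python
# Shape bookkeeping only; no model or tokenizer is loaded here.
# Assumption: a per-token head (e.g. token classification or masked LM)
# produces one row of logits per token, while a per-sequence head
# produces one row of logits per example.

def loss_input_rows(batch_size: int, seq_len: int, per_token_head: bool) -> int:
    """Rows the loss sees once the logits are flattened to 2D (hypothetical helper)."""
    if per_token_head:
        # (batch_size, seq_len, num_classes) -> (batch_size * seq_len, num_classes)
        return batch_size * seq_len
    # (batch_size, num_labels) stays at batch_size rows
    return batch_size


batch_size = 8    # per_device_train_batch_size
seq_len = 512     # assumed padded sequence length
num_targets = 8   # one label per example in the batch

for name, per_token_head in (("per-sequence head", False), ("per-token head", True)):
    rows = loss_input_rows(batch_size, seq_len, per_token_head)
    status = "ok" if rows == num_targets else "mismatch"
    print(f"{name}: {rows} logit rows vs {num_targets} targets -> {status}")
```

If this reading is right, the 4096 would mean the loss is being computed per token while my labels are per example, but I have not been able to confirm where that happens.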
I have tried everything I can think of to fix this, but I cannot work it out. My setup for fine-tuning is exactly the same as the PyTorch setup in [Fine-tune a pretrained model]. Any advice would be greatly appreciated.