I am using BertTokenizerFast from the transformers library to encode my text data, with the pre-trained model "bert-base-uncased". My dataset has 7 million rows (about 2 GB), and my machine has 64 GB of RAM.
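Roughly, my tokenization step looks like this (a simplified sketch; `texts`, `max_length=128`, and the padding strategy stand in for my actual data and settings):

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# texts: a list of strings loaded from the 7M-row dataset (placeholder here)
encodings = tokenizer(
    texts,
    padding="max_length",  # pad every row to the same length
    truncation=True,
    max_length=128,        # assumed value, for illustration
    return_tensors="pt",
)
```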
When I try to convert the encoded text into vectors using BertModel (with the same pre-trained "bert-base-uncased" weights), it cannot convert even 10,000 encoded rows: the allocation fails, reporting that hundreds of GB of memory would be needed.
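The failing step looks roughly like this (again a sketch; slicing to 10,000 rows and taking the [CLS] embedding are just how I illustrate it here):

```python
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

with torch.no_grad():
    # Passing ~10,000 encoded rows in one forward pass is where it fails
    outputs = model(
        input_ids=encodings["input_ids"][:10_000],
        attention_mask=encodings["attention_mask"][:10_000],
    )
    vectors = outputs.last_hidden_state[:, 0, :]  # [CLS] token embeddings
```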
I am following the reference code from this blog post: reference blog