I have trained a RoBERTa-based model using the
TFAutoModelForSequenceClassification class and now want to use it to make predictions. I am used to Keras, so I'm using the TensorFlow version above, and to predict I call
model.predict on the tokenized input sentences.
However, if I run this on a dataset of up to roughly 1,500 samples it works fine, but as soon as I go a little above that (I actually want to run this on ~5 million samples) it hangs on the
model.predict line. I have tried various batch sizes from 32 to 512; this didn't change anything.
Tokenization works fine for the full dataset, and the only output I get from
model.predict is these two warnings, which are probably unrelated since I also get them when running on a small dataset where it doesn't hang:
tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
The full line is
model.predict(dataset, batch_size=64), for instance.
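For reference, here is a minimal sketch of what I'm doing. The checkpoint name and the sentences are placeholders; my real model is the fine-tuned RoBERTa classifier described above:

```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Placeholder checkpoint; my actual model is a fine-tuned RoBERTa classifier.
checkpoint = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

sentences = ["first example sentence", "second example sentence"]

# Tokenization completes without problems, even on the full dataset.
encodings = tokenizer(sentences, padding=True, truncation=True, return_tensors="tf")

# This is the call that hangs once the input grows past ~1,500 samples.
predictions = model.predict(dict(encodings), batch_size=64)
```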
Any idea what I’m doing wrong?