When I use the pipeline API, it crashes Colab with an out-of-memory error (it fills all 25.5 GB of RAM). I think it should be possible to do the inference on a TPU v2, but how do I tell the pipeline to use the TPUs from the start?
from transformers import pipeline

model_name = 'EleutherAI/gpt-j-6B'

# This loads the full fp32 checkpoint (~24 GB of weights for 6B params),
# which seems to be what exhausts Colab's RAM
generator = pipeline('text-generation', model=model_name)
out = generator("I am Harry Potter.", do_sample=True, min_length=50)
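From the GPT-J model card it looks like loading the float16 revision with low_cpu_mem_usage should roughly halve the load-time memory, so something like the sketch below would be my starting point before even getting to the TPU question. Is that the right direction?

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = 'EleutherAI/gpt-j-6B'

# Half-precision branch of the checkpoint (~12 GB instead of ~24 GB);
# low_cpu_mem_usage avoids materialising a second copy of the weights during loading
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    revision='float16',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
out = generator("I am Harry Potter.", do_sample=True, min_length=50)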
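For the TPU part, the only route I can find is PyTorch/XLA. I'm guessing at something like the sketch below, but I don't know whether the pipeline API accepts an XLA device at all, which is really the core of my question (xm.xla_device() is from torch_xla; the device= argument to pipeline is my assumption):

import torch
import torch_xla.core.xla_model as xm
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

device = xm.xla_device()  # a single TPU core

model_name = 'EleutherAI/gpt-j-6B'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    revision='float16',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Unverified: does pipeline() accept an XLA torch.device here,
# or does it only take CPU/CUDA devices?
generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device=device)
out = generator("I am Harry Potter.", do_sample=True, min_length=50)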