I’m using the OWLViTForObjectDetection model and I want to run inference on the GPU, so I’m doing something like this:
model = model.to(device='cuda')
with torch.no_grad():
    model.eval()
    data = data.to(device='cuda')
    # inference code
It seems that including torch.no_grad() is somehow preventing some of the model’s parameters from being copied to GPU memory, because I get an error saying that all tensors should be on the same device but at least two different devices were found (cuda and cpu). If I remove torch.no_grad() that error goes away, but then I run into an out-of-memory error because all of the model’s activations are kept in GPU memory for gradient computation.
This has never happened to me in the past with the various models I’ve used, so I’m wondering whether it is specific to Hugging Face models. Has this occurred to anyone else? Are there any known workarounds?
You are probably getting a GPU error unrelated to torch.no_grad() if you installed the PyPI release of transformers with pip install transformers. Sorry about that! This issue was fixed a few weeks ago, and you should be able to run the model without any problems if you install from the development branch instead: pip install -q git+https://github.com/huggingface/transformers.git
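For what it’s worth, a quick way to confirm which build you ended up with after reinstalling (a source install from the main branch typically carries a .dev0 suffix in the version string):

import transformers
print(transformers.__version__)  # e.g. something like 4.x.y.dev0 for an install from source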
In general, there is no need to call the eval() method inside the torch.no_grad() context. If your issue persists, could you paste a minimal snippet that reproduces the error along with the full error log? For reference, the usual pattern looks like this:
model = model.to(device='cuda')
model.eval()
with torch.no_grad():
    data = data.to(device='cuda')
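To make that concrete, here is a minimal sketch of GPU inference with OWL-ViT following this pattern. It assumes the public google/owlvit-base-patch32 checkpoint; the image path and text queries are just placeholders for your own data:

import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

model = model.to(device)
model.eval()  # disables dropout etc.; unrelated to gradient tracking

image = Image.open("example.jpg")                    # placeholder image path
texts = [["a photo of a cat", "a photo of a dog"]]   # placeholder text queries

with torch.no_grad():  # no activations kept for backprop, so memory stays low
    inputs = processor(text=texts, images=image, return_tensors="pt").to(device)
    outputs = model(**inputs)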