Can't run inference with the trained model due to a CUDA problem

I have fine-tuned a model on a single GPU by setting

import torch

device = torch.device("cuda:6" if torch.cuda.is_available() else "cpu")
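(For reference, a minimal sketch of a more defensive version of this device selection; the index 6 is taken from my snippet above, and the fallback order is just one possible choice:)

```python
import torch

# GPU index used during training (cuda:6 in my setup)
PREFERRED_GPU = 6

# Only request cuda:6 if CUDA is available AND the machine actually
# exposes at least PREFERRED_GPU + 1 devices; otherwise fall back.
if torch.cuda.is_available() and PREFERRED_GPU < torch.cuda.device_count():
    device = torch.device(f"cuda:{PREFERRED_GPU}")
elif torch.cuda.is_available():
    device = torch.device("cuda:0")
else:
    device = torch.device("cpu")

print(device)
```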

but when I run inference, I get the following error:

Traceback (most recent call last):
  File "/TransformerModels/", line 237, in <module>
    inputs = tokenizer(smiles, return_tensors="pt", padding="max_length", truncation=True, max_length=250).to(device)  # max_length=195
  File "anaconda3/envs/lm_hugg/lib/python3.9/site-packages/transformers/", line 759, in to
    self.data = {k: v.to(device=device) for k, v in self.data.items()}
  File "anaconda3/envs/lm_hugg/lib/python3.9/site-packages/transformers/", line 759, in <dictcomp>
    self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
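(For context, "invalid device ordinal" is raised when the requested GPU index is not visible to the process, e.g. the inference machine has fewer GPUs than index 6, or `CUDA_VISIBLE_DEVICES` hides some of them. A quick diagnostic I can run, assuming only PyTorch is installed:)

```python
import os
import torch

# How many CUDA devices does this process actually see?
n = torch.cuda.device_count()
print(f"CUDA available: {torch.cuda.is_available()}, visible devices: {n}")
print(f"CUDA_VISIBLE_DEVICES = {os.environ.get('CUDA_VISIBLE_DEVICES')!r}")
# cuda:6 is only valid if n >= 7; otherwise .to(device) raises
# "RuntimeError: CUDA error: invalid device ordinal".
```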

How can I solve this problem?

Thank you