Can't run inference on the trained model due to a CUDA error (invalid device ordinal)

I have fine-tuned a model on a single GPU by setting:

import os
import torch

os.environ['CUDA_VISIBLE_DEVICES'] = '6'
device = torch.device('cuda:6' if torch.cuda.is_available() else 'cpu')

but when I run inference, I get the following error:

"Traceback (most recent call last):
File “/TransformerModels/TransformerRoBERTaTrial.py”, line 237, in
inputs = tokenizer(smiles, return_tensors=“pt”, padding=‘max_length’, truncation=True, max_length=250).to(device) #max_length=195
File “anaconda3/envs/lm_hugg/lib/python3.9/site-packages/transformers/tokenization_utils_base.py”, line 759, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File “anaconda3/envs/lm_hugg/lib/python3.9/site-packages/transformers/tokenization_utils_base.py”, line 759, in
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
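
For reference, my inference-side code around line 237 looks roughly like this, reduced to the failing call from the traceback (the checkpoint path and the `smiles` value here are placeholders, not my real ones):

import os
import torch
from transformers import AutoTokenizer

os.environ['CUDA_VISIBLE_DEVICES'] = '6'
device = torch.device('cuda:6' if torch.cuda.is_available() else 'cpu')

# placeholder path and input; my real script loads the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained('path/to/finetuned-tokenizer')
smiles = 'CCO'

# this is the line that raises "CUDA error: invalid device ordinal"
inputs = tokenizer(smiles, return_tensors='pt', padding='max_length',
                   truncation=True, max_length=250).to(device)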

How can I solve this problem?

Thank you