Hugging Face BART text summarization model deployed on SageMaker returns a 400 error after a month

Hi
I have a Hugging Face BART text summarization model in SageMaker. I created and deployed the model, put Lambda and API Gateway in front of the endpoint, and was able to post requests via Postman and get responses without any issue.
After about a month, the endpoint now returns a 400 error, and every message is CUDA-related. One says "CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)". Another says "CUDA error: device-side assert triggered" and suggests passing CUDA_LAUNCH_BLOCKING=1. The errorType is ModelError, and the errors come from PyTorch. Any suggestions are appreciated.

{
“errorMessage”: “An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{\n "code": 400,\n "type": "InternalServerException",\n "message": "CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1."\n}\n".
“errorType”: “ModelError”,
“requestId”: “47b466d2-4c43-4f94-870b-3ce806f5b579”,
“stackTrace”: [
" File "/var/task/lambda_function.py", line 23, in lambda_handler\n response=runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,Body=json.dumps(event),ContentType=‘application/json’)\n”,
" File "/var/runtime/botocore/client.py", line 391, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File "/var/runtime/botocore/client.py", line 719, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
]
}

{
“errorMessage”: “An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{\n "code": 400,\n "type": "InternalServerException",\n "message": "CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1."\n}\n".
“errorType”: “ModelError”,
“requestId”: “4a03a846-a15a-4f37-bc2a-6d53802a793f”,
“stackTrace”: [
" File "/var/task/lambda_function.py", line 23, in lambda_handler\n response=runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,Body=json.dumps(event),ContentType=‘application/json’)\n”,
" File "/var/runtime/botocore/client.py", line 391, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File "/var/runtime/botocore/client.py", line 719, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
]
}

{
“errorMessage”: “An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{\n "code": 400,\n "type": "InternalServerException",\n "message": "CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)"\n}\n".
“errorType”: “ModelError”,
“requestId”: “ac209435-01cf-450c-8e0a-c5afc81319ab”,
“stackTrace”: [
" File "/var/task/lambda_function.py", line 23, in lambda_handler\n response=runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,Body=json.dumps(event),ContentType=‘application/json’)\n”,
" File "/var/runtime/botocore/client.py", line 391, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File "/var/runtime/botocore/client.py", line 719, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
]
}

Somebody suggested setting CUDA_LAUNCH_BLOCKING=1, but that seems to mean going all the way back through training the model, deploying it, and creating the endpoint again, which is a lot of extra work. Without training again, how can I fix the model and endpoint that are already deployed?
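For reference, here is a minimal sketch of how an environment variable such as CUDA_LAUNCH_BLOCKING=1 might be added to an existing endpoint without retraining. It assumes the endpoint uses a single-container model (a PrimaryContainer entry), that the "-debug" suffix keeps names within SageMaker's length limits, and that the container passes the variable through to PyTorch. The helper names build_debug_model_request and redeploy_with_env are hypothetical. The idea is to register a new model that reuses the same image and model artifact, then point the endpoint at a new endpoint config.

```python
import copy


def build_debug_model_request(described_model, new_model_name, extra_env):
    """Build create_model kwargs from a describe_model response, adding env vars."""
    # Copy so the describe_model response is left untouched
    container = copy.deepcopy(described_model["PrimaryContainer"])
    env = dict(container.get("Environment", {}))
    env.update(extra_env)
    container["Environment"] = env
    return {
        "ModelName": new_model_name,
        "PrimaryContainer": container,  # same image and model artifact as before
        "ExecutionRoleArn": described_model["ExecutionRoleArn"],
    }


def redeploy_with_env(endpoint_name, suffix="-debug", extra_env=None):
    """Point an existing endpoint at copies of its models with extra env vars."""
    import boto3  # imported lazily so the helper above works without AWS access

    extra_env = extra_env or {"CUDA_LAUNCH_BLOCKING": "1"}
    sm = boto3.client("sagemaker")

    endpoint = sm.describe_endpoint(EndpointName=endpoint_name)
    config = sm.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"]
    )
    variants = copy.deepcopy(config["ProductionVariants"])

    for variant in variants:
        old_model = sm.describe_model(ModelName=variant["ModelName"])
        new_name = variant["ModelName"] + suffix
        sm.create_model(**build_debug_model_request(old_model, new_name, extra_env))
        variant["ModelName"] = new_name

    new_config_name = endpoint["EndpointConfigName"] + suffix
    sm.create_endpoint_config(
        EndpointConfigName=new_config_name, ProductionVariants=variants
    )
    # No training job is involved; the endpoint swaps to the new config
    sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=new_config_name)
    return new_config_name
```

Calling redeploy_with_env with the endpoint name would start an endpoint update. The old model and endpoint config are left in place, so switching back is another update_endpoint call with the original config name.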
Someone also said to clear the .lock file. I searched my laptop for .lock and found tons of files, so I am not sure which one is meant. Any help is appreciated. Thanks in advance.
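One unconfirmed possibility for the device-side assert is input text longer than the model can accept, since BART checkpoints usually have a fixed maximum input length. The sketch below shows how the invoke_endpoint call from the stack trace could send a payload that asks for truncation. It assumes the serving container accepts an "inputs"/"parameters" JSON body and forwards "parameters" to the transformers pipeline. The helper names build_payload and summarize are hypothetical.

```python
import json


def build_payload(text, truncate=True):
    """Return the JSON body for a summarization request."""
    payload = {"inputs": text}
    if truncate:
        # Assumption: the container forwards "parameters" to the pipeline,
        # where truncation=True cuts input that exceeds the model's limit
        payload["parameters"] = {"truncation": True}
    return json.dumps(payload)


def summarize(endpoint_name, text):
    """Invoke the endpoint the same way the Lambda does, with truncation on."""
    import boto3  # imported lazily so build_payload works without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=build_payload(text),
        ContentType="application/json",
    )
    return json.loads(response["Body"].read())
```

If a short input succeeds while a long one fails with the same CUDA error, that would point toward input length rather than the deployment itself.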