Error deploying BERT on SageMaker

Hi Simon, I’ve done that just as you said. The endpoint deploys, but I keep getting this error:

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Here is the code I ran (I cleared out my API token):

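from sagemaker.huggingface import HuggingFaceModel

# role and image_uri were set earlier (not shown)
model_name = "CogniXpert"

hub = {
    'HF_MODEL_ID': 'I00N/CogniXpert',
    'HF_TASK': 'text-generation',
    'HF_API_TOKEN': "",  # token removed before posting
    'HF_MODEL_QUANTIZE': 'bitsandbytes'
}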

model = HuggingFaceModel(
    name=model_name,
    env=hub,
    role=role,
    image_uri=image_uri
)


predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    endpoint_name=model_name
)
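
The error message suggests passing CUDA_LAUNCH_BLOCKING=1 for debugging. A minimal sketch of how that could be added to the same environment dict is below. The name hub_debug is hypothetical, and it is an assumption that the container forwards this variable to the PyTorch process; if it does, the stack trace should point at the call that actually fails.

```python
# Same container environment as in the post, plus the debugging flag
# mentioned in the error message. hub_debug is a hypothetical name.
hub_debug = {
    'HF_MODEL_ID': 'I00N/CogniXpert',
    'HF_TASK': 'text-generation',
    'HF_API_TOKEN': "",  # token removed before posting
    'HF_MODEL_QUANTIZE': 'bitsandbytes',
    # Makes CUDA kernel launches synchronous, so errors are reported
    # at the call that caused them (slower; for debugging only).
    'CUDA_LAUNCH_BLOCKING': '1',
}

# Assumption: the container passes this variable through to PyTorch.
# hub_debug would then replace hub in HuggingFaceModel(env=hub_debug, ...).
```

Environment values are strings here, so the flag is written as '1' rather than the integer 1.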