Inference issue with fine tuned model

I’ve successfully compiled my own fine-tuned model (I had to use an Inf1 SageMaker notebook, so I couldn’t run the validation): csabakecskemeti/bert-base-case-yelp5-tuned-experiment · Hugging Face, and deployed it on an inf2.xlarge with optimum-neuron. The model has a 5-label classification head and was fine-tuned on the yelp5 dataset.

When I tried to predict, I received the following error:

message: "Could not load model /.sagemaker/mms/models/csabakecskemeti__bert-base-case-yelp5-tuned-experiment with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>, <class 'transformers.models.bert.modeling_bert.BertForSequenceClassification'>)."

Deployment code:

from sagemaker.huggingface.model import HuggingFaceModel
# HF_TASK list https://huggingface.co/docs/transformers/main_classes/pipelines
config = {
    "HF_MODEL_ID": "csabakecskemeti/bert-base-case-yelp5-tuned-experiment", # model_id from hf.co/models
    "HF_TASK": "text-classification", # NLP task you want to use for predictions
    "HF_BATCH_SIZE": "1", # batch size used to compile the model
    "MAX_BATCH_SIZE": "1", # max batch size for the model
    "HF_SEQUENCE_LENGTH": "128", # length used to compile the model
}
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   env=config,
   model_data=s3_model_uri,        # path to your model and script
   role=role,                      # iam role with permissions to create an Endpoint
   transformers_version="4.28.1",  # transformers version used
   pytorch_version="1.13.0",       # pytorch version used
   py_version='py38',              # python version used
   model_server_workers=2,         # number of workers for the model server
)
# Let SageMaker know that we've already compiled the model
huggingface_model._is_compiled_model = True
# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,      # number of instances
    instance_type="ml.inf2.xlarge" # AWS Inferentia Instance
)

If I changed the HF_MODEL_ID to the original BERT, “google-bert/bert-base-cased”, the text classification started working, but obviously the results were not correct.

Does anyone have any hint what’s going on?

Side note: I also found that the Inference Examples have stopped working on my model card, and I experienced similar behavior with other “text-classification” models. I don’t know if it’s related, whether it’s a “text-classification” pipeline problem, or something specific to my model.

All suggestions are welcome and appreciated.

Just checked my model (csabakecskemeti/bert-base-case-yelp5-tuned-experiment) locally with transformers.pipeline; at first I got the same error:
could not load model with any of the following classes: transformers.models.auto.modeling_auto.AutoModelForSequenceClassification,
transformers.models.bert.modeling_bert.BertForSequenceClassification

Then I loaded google-bert/bert-base-cased, which worked. I tried to load my model again to double-check the error message, but I didn’t get it anymore.

I’ve tried to reproduce the issue by clearing both my model and the google-bert model from the local cache… but it’s still working. I’m not sure what’s going on.
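If it happens again, loading the class directly instead of going through pipeline() usually surfaces the underlying exception, rather than the generic “could not load model with any of the following classes” message (a sketch; it downloads the checkpoint from the Hub):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "csabakecskemeti/bert-base-case-yelp5-tuned-experiment"

# Loading outside pipeline() raises the real exception (missing weight
# files, config mismatch, version incompatibility) instead of hiding it.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
print(model.config.num_labels)  # expect 5 for the yelp5 head
```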

Any idea?

I’ve tested it again on another local machine. The model works just fine with the pipeline:

from transformers import pipeline
my_task = "text-classification"
model_id = "csabakecskemeti/bert-base-case-yelp5-tuned-experiment"
generate = pipeline(task=my_task, model=model_id, device='cpu')
res = generate.predict(["This is an awful place to stay", "this is a reasonable restaurant", "best experience ever", "5 start best ever", "mediocare"])
print(res)

Any hint on what the problem could be on Inf2 would be welcome.
@philschmid maybe?