I’ve just tried to use Inf2 with the Hugging Face API.
Everything works fine if I use a precompiled LLM from aws-neuron.
The problem appeared when I tried to compile my own model.
-
Because of some weird corporate AWS setup I can only create resources in us-west, so I’ve created a t3.medium notebook there. BUT I can deploy to us-east-2, so I can use Inf2. The other implication of this is that I have no access to the logs in us-east-2, so all I can see is what’s available from the notebook.
-
Every example shows different package requirements. These are the packages I found necessary to make the neuron export type available in optimum-cli:
!python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
!pip install -U transformers_neuronx optimum.neuron optimum[neuronx] optimum-neuron[neuronx]
Is this correct?
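For context, the export command I’m running looks roughly like this (the model name matches the one in the validation log below; the batch size and sequence length are just values I picked, and the output directory name is my own):

```shell
# Export the model to Neuron format with optimum-cli.
# --batch_size / --sequence_length fix the static input shapes the
# Neuron compiler needs; adjust them for your workload.
optimum-cli export neuron \
  --model distilbert-base-uncased-distilled-squad \
  --task question-answering \
  --batch_size 1 \
  --sequence_length 128 \
  distilbert_neuron/
```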
- Model export complains about ‘No neuron device available’. Different tutorials give different information on whether an Inferentia2 host is needed for the model export or not. When the model compiles, it cannot validate on the t3 instance. I guess that’s OK (if I don’t need an Inf2 instance for the export):
2024-04-04T19:23:20Z Compiler status PASS
[Compilation Time] 65.04 seconds.
[Total compilation Time] 65.04 seconds.
Validating distilbert-base-uncased-distilled-squad model...
2024-Apr-04 19:23:25.796921 19073:19073 ERROR TDRV:tdrv_get_dev_info No neuron device available
...
The export also seems a little random: sometimes it fails with SIGTERM, sometimes it crashes my notebook.
- After tar.gz-ing the model and uploading it to S3, I managed to deploy:
from sagemaker.huggingface.model import HuggingFaceModel

env = {
    # 'HF_MODEL_ID': 'distilbert-base-uncased-distilled-squad',  # model_id from hf.co/models
    'HF_TASK': 'question-answering'  # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=env,
    model_data=s3_model_uri,        # path to your model and script
    role=role,                      # IAM role with permissions to create an endpoint
    transformers_version="4.28.1",  # transformers version used
    pytorch_version="1.13.0",       # pytorch version used
    py_version='py38',              # python version used
    model_server_workers=2,         # number of workers for the model server
)
# Let SageMaker know that we've already compiled the model
huggingface_model._is_compiled_model = True

# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,       # number of instances
    instance_type="ml.inf2.xlarge"  # AWS Inferentia instance
)
It prints out the “------------!” progress marker, but seemingly it doesn’t work:
data = {
    "input": {
        "question": "How many apples?",
        "context": "Joe got 5 apples"
    }
}
# request
predictor.predict(**data)
TypeError: Predictor.predict() got an unexpected keyword argument 'input'
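For what it’s worth, my current guess (untested, since I can’t confirm the endpoint works) is that the payload needs to be passed positionally and keyed "inputs", along these lines:

```python
# Hypothetical fix: Predictor.predict() takes the payload as a single
# positional argument, and the Hugging Face inference toolkit expects
# the top-level key to be "inputs" rather than "input".
data = {
    "inputs": {
        "question": "How many apples?",
        "context": "Joe got 5 apples",
    }
}
# predictor.predict(data)  # positional argument, not **data
```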
My main questions:
- Can I export a model without an Inferentia2 device (e.g. on a t3 node)?
- What is the exact list of packages I need?
- Is there anything wrong with the code I’ve shared above?
A tested working notebook for the same or similar would be amazing!
Thanks