Hi @thuangster , did you upload a new model.tar.gz with a requirements.txt file?
Hi @razvanp, thank you for the reply.
I did upload a new model.tar.gz. I downloaded model.tar.gz, unzipped it, created a folder named “code”, and added a requirements.txt inside with transformers==4.23.1.
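In case it helps anyone following along, the repack step can be sketched roughly like this (all paths and the dummy model file below are stand-ins for illustration, not the actual archive contents):

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
model_tar = os.path.join(workdir, "model.tar.gz")

# Stand-in for the downloaded model archive (a real one holds the model weights)
with tarfile.open(model_tar, "w:gz") as tar:
    dummy = os.path.join(workdir, "pytorch_model.bin")
    open(dummy, "wb").close()
    tar.add(dummy, arcname="pytorch_model.bin")

# Unpack, add code/requirements.txt pinning the Transformers version, repack
extract_dir = os.path.join(workdir, "model")
with tarfile.open(model_tar) as tar:
    tar.extractall(extract_dir)
os.makedirs(os.path.join(extract_dir, "code"), exist_ok=True)
with open(os.path.join(extract_dir, "code", "requirements.txt"), "w") as f:
    f.write("transformers==4.23.1\n")
with tarfile.open(model_tar, "w:gz") as tar:
    for name in os.listdir(extract_dir):
        tar.add(os.path.join(extract_dir, name), arcname=name)

with tarfile.open(model_tar) as tar:
    print(sorted(tar.getnames()))
```

The key point is that requirements.txt must end up under code/ at the top level of the archive, since that is where the inference toolkit looks for it.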
After that I tried the following:
!wget https://cdn-media.huggingface.co/speech_samples/sample1.flac
audio_path = "sample1.flac"
res = predictor.predict(data=audio_path)
print(res)
I am getting
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "\u0027whisper\u0027"
}
Then I tried this:
SAMPLING_RATE = 1000
with open(audio_path, "rb") as file:
    audio_file = file.read()
audio_nparray = ffmpeg_read(audio_file, SAMPLING_RATE)
predictor.predict({
    "raw": audio_nparray,
    "sampling_rate": SAMPLING_RATE
})
Now I am getting
/opt/conda/lib/python3.7/site-packages/sagemaker/predictor.py in _create_request_args(self, data, initial_args, target_model, target_variant, inference_id)
197 args["InferenceId"] = inference_id
198
--> 199 data = self.serializer.serialize(data)
200
201 args["Body"] = data
/opt/conda/lib/python3.7/site-packages/sagemaker/serializers.py in serialize(self, data)
390 return data
391
--> 392 raise ValueError(f"Object of type {type(data)} is not Data serializable.")
ValueError: Object of type <class 'dict'> is not Data serializable.
I am new to this so I might be asking dumb questions. Any help will be appreciated. Thanks.
Let’s take them one at a time: when you get the error “message”: “\u0027whisper\u0027”, that is because of the wrong Transformers version. You need to solve that first.
If you use audio_serializer = DataSerializer(content_type='audio/x-audio') provided in the example by @thusken, you don't need to do the serialization yourself. Just use
predictor = sagemaker.predictor.Predictor(endpoint_name=endpoint_name,serializer=audio_serializer)
# Make a prediction using the Predictor object
prediction = predictor.predict(data=audio_path)
Thank you @razvanp, I have been busy with work and school and just got time to play with this today. I finally got the Transformers version correct and am able to make a prediction.
My final goal is trying to make subtitles of mp4 videos. Would it be better if I:
A) use something like FFMPEG for AWS Lambda (lambda-video on npm) to extract audio from the mp4, put it in S3, then call the SageMaker endpoint with the S3 path
B) write a customized inference script that downloads the mp4 from S3, then uses ffmpeg to extract the audio and make a prediction
I am still trying to learn how to use all these tools, any guidance will be appreciated. Thanks
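Either way, the extraction step itself is small. Here is a sketch of the ffmpeg invocation (file names are placeholders; for option A this logic would live in the Lambda, for option B inside the custom inference script):

```python
import subprocess

def build_extract_cmd(mp4_path, wav_path, sampling_rate=16000):
    # 16 kHz mono is what Whisper-style ASR models expect;
    # -vn drops the video stream, -y overwrites the output file.
    return [
        "ffmpeg", "-y", "-i", mp4_path,
        "-vn", "-ac", "1", "-ar", str(sampling_rate),
        wav_path,
    ]

def extract_audio(mp4_path, wav_path):
    # Assumes an ffmpeg binary is available (e.g. an ffmpeg Lambda layer
    # for option A, or installed in the serving container for option B).
    subprocess.run(build_extract_cmd(mp4_path, wav_path), check=True)

print(build_extract_cmd("video.mp4", "audio.wav"))
```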
Hi @thuangster, it seems that the Hugging Face version of Whisper is not really maintained: the 30-second limitation on audio makes it quite useless. Currently I am looking for some other options.
Thanks for the reply. If the 30-second limitation cannot be resolved then it is quite useless. Please let me know if you find a better option.
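One possible client-side workaround for the 30-second window (not something tried in this thread) is to split the audio into overlapping chunks, transcribe each chunk separately, and stitch the text back together. A minimal sketch, assuming 16 kHz mono samples already loaded as a flat list; the chunk and overlap lengths are illustrative:

```python
def chunk_audio(samples, sampling_rate=16000, chunk_s=30, overlap_s=2):
    """Split a flat list of samples into ~30 s windows that overlap by a
    couple of seconds, so words cut at a boundary appear in both chunks."""
    size = chunk_s * sampling_rate
    step = (chunk_s - overlap_s) * sampling_rate
    return [samples[i:i + size] for i in range(0, len(samples), step)]

# 70 s of silence as a stand-in for real audio
audio = [0.0] * (16000 * 70)
chunks = chunk_audio(audio)
print([len(c) / 16000 for c in chunks])  # [30.0, 30.0, 14.0]
```

Deduplicating the overlapping words in the transcripts is the fiddly part and is left out here.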
Hi, I have been following this along and have managed to deploy the “whisper-small” model with the requirements file. However I am getting this error when I try to predict using the predictor.
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "\u0027str\u0027 object has no attribute \u0027pop\u0027"
}
This is my code for deploying and prediction. Any help will be appreciated
huggingface_model = HuggingFaceModel(
    model_data=s3_location,
    role=role,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env={
        'HF_MODEL_ID': 'openai/whisper-small',
        'HF_TASK': 'automatic-speech-recognition'
    }
)
audio_serializer = DataSerializer(content_type='audio/x-audio')

whisper_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    serializer=audio_serializer,
)
audio_path = "sample1.flac"
result = whisper_predictor.predict(data=audio_path)
Could you try to use the latest version? We also have a sample here: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub
Thanks for the reply @philschmid. Do you mean the transformers version in the requirements file? I have just tried changing the versions in the HuggingFaceModel as in the sample notebook, but I ran into the same error "message": "\u0027str\u0027 object has no attribute \u0027pop\u0027".
huggingface_model = HuggingFaceModel(
    model_data=s3_location,
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    env={
        'HF_MODEL_ID': 'openai/whisper-large',
        'HF_TASK': 'automatic-speech-recognition'
    }
)

audio_serializer = DataSerializer(content_type='audio/x-audio')
# deploy the endpoint
whisper_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",
    serializer=audio_serializer,
    endpoint_name="whisper-large"
)
Edit: I have found the error. When getting the predictor back, I did not add the serializer.
predictor = sagemaker.predictor.Predictor(endpoint_name=endpoint_name,serializer=audio_serializer)
with
predictor.predict({
    'inputs': "sample1.flac"
})
getting the same error, how did you fix it?
Found the fix: you need to add this to the config:
env={
'HF_TASK':'automatic-speech-recognition'
}
Without this, the model is unable to identify the given task.
I was able to bypass the max output limit, which is set in generation_config.json as max_length: 448; you can update this to some bigger number like 6000000.
I am getting a warning afterwards:
UserWarning: Neither `max_length` nor `max_new_tokens` has been set, `max_length` will default to 36000 (`generation_config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
The issue I am then getting is error 413 (request entity too large), which is related to how SageMaker works and has nothing to do with the HF model.
Currently the working code streams the file directly to the endpoint using the path, so the solution is rather to pass an S3 path and download and process the file inside the endpoint, which needs a custom "inference.py".
Let me know if anyone is still working on this.
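To sketch what such a custom inference.py could look like: the endpoint would accept an S3 URI in the request body, parse it, download the object, and run the model on it. Only the URI-parsing piece below is runnable as-is; the boto3 download and the ASR call are left as comments since they depend on the deployment, and the request shape shown is a hypothetical choice, not a toolkit convention:

```python
from urllib.parse import urlparse

def parse_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")

# In a real inference.py you would then do something like:
#   import boto3
#   bucket, key = parse_s3_uri(request_body["s3_uri"])   # hypothetical field
#   boto3.client("s3").download_file(bucket, key, "/tmp/input.mp4")
#   ... extract the audio with ffmpeg and run the ASR pipeline on it ...

print(parse_s3_uri("s3://my-bucket/videos/input.mp4"))  # ('my-bucket', 'videos/input.mp4')
```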
Hey @MLLife - note that max_length refers to the maximum token length generated per batch. Whisper has an intrinsic upper limit of 448 (which should be plenty!).
Is there any example like this but with the Whisper model, one that can transcribe more than 30 seconds of audio? Thanks.
I too got this error for an image-to-text model; I did the serialization for the image data, but the error is still not solved.
I cannot provide configuration settings for the whisper-large-v3 model while deploying.
I want to specify batch size, language, etc.