To transcribe audio files with the Whisper model, I assumed that the asynchronous inference option on AWS SageMaker would be the right choice for long audio files (about 1 hour, roughly 5-50 MB).
According to the docs, payload sizes of up to 1 GB should be possible.
I followed Philipp Schmid's article here,
but I get the following error, which surprises me, since my payload is only around 11 MB:
Received client error (413) from primary and could not load the entire response body
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.async_inference import AsyncInferenceConfig
from sagemaker.s3 import s3_path_join

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'openai/whisper-base',
    'HF_TASK': 'automatic-speech-recognition'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,                      # configuration for loading model from Hub
    role=role,                    # iam role with permissions to create an Endpoint
    transformers_version="4.26",  # transformers version used
    pytorch_version="1.13",       # pytorch version used
    py_version='py39',            # python version used
)

# create async endpoint configuration
async_config = AsyncInferenceConfig(
    output_path=s3_path_join("s3://", sagemaker_session_bucket, "async_inference/output"),
    # Where our results will be stored
    # notification_config={
    #     "SuccessTopic": "arn:aws:sns:us-east-2:123456789012:MyTopic",
    #     "ErrorTopic": "arn:aws:sns:us-east-2:123456789012:MyTopic",
    # },  # Notification configuration
)

# deploy the endpoint
huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # ml.g4dn.xlarge,
    async_inference_config=async_config
)

import boto3
import sagemaker
from sagemaker.async_inference import WaiterConfig
from sagemaker.huggingface.model import HuggingFacePredictor
from sagemaker.predictor_async import AsyncPredictor


def predict():
    session = boto3.session.Session()
    sagemaker_session = sagemaker.Session(session)
    predictor = HuggingFacePredictor(
        endpoint_name=endpoint_name,
        sagemaker_session=sagemaker_session,
        serializer=audio_serializer
    )
    async_predictor = AsyncPredictor(predictor)
    ASYNC_S3_PATH = "s3://async-inf/async-distilbert"

    # read the local audio file (only used by the commented-out call below)
    with open(audio_path, "rb") as data_file:
        audio_data = data_file.read()

    data = {
        "s3_file": ASYNC_S3_PATH
        # "language": "pl"
    }

    # submit the request, pointing the endpoint at the input already in S3
    res = async_predictor.predict_async(input_path=ASYNC_S3_PATH)
    # res = async_predictor.predict_async(data=audio_data, input_path=ASYNC_S3_PATH)

    config = WaiterConfig(
        max_attempts=5,  # number of attempts
        delay=10         # time in seconds to wait between attempts
    )
    # get_result polls S3 and returns the prediction, so print its return value
    result = res.get_result(config)
    print(result)
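
For comparison, this is how I understand the async flow is meant to work at the boto3 level: the audio file is uploaded to S3 first, and the endpoint is then invoked with only the S3 location, so the audio itself never goes through the HTTP request. A minimal sketch, assuming `s3_client` and `runtime_client` are boto3 clients for `s3` and `sagemaker-runtime`. The function name `submit_audio_async`, the bucket, prefix, and endpoint name are hypothetical, and the `audio/x-audio` content type is my assumption about what the Hugging Face container expects.

```python
import os


def submit_audio_async(s3_client, runtime_client, audio_path, bucket, prefix,
                       endpoint_name, content_type="audio/x-audio"):
    """Upload a local audio file to S3 and queue it on an async endpoint.

    s3_client and runtime_client are assumed to be boto3 clients for
    "s3" and "sagemaker-runtime"; every other name is a placeholder.
    """
    # Build the object key from the prefix and the local file name.
    key = f"{prefix.strip('/')}/{os.path.basename(audio_path)}"

    # The payload travels through S3, not through the invocation request.
    s3_client.upload_file(audio_path, bucket, key)
    input_location = f"s3://{bucket}/{key}"

    # Queue the request; the endpoint reads its input from InputLocation.
    response = runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_location,
        ContentType=content_type,
    )

    # The prediction is written to S3 later; only its location comes back now.
    return input_location, response["OutputLocation"]
```

With real clients this would be called with `boto3.client("s3")` and `boto3.client("sagemaker-runtime")`, and the transcription would then have to be fetched from the returned output location once the job has finished.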
@philschmid, any idea how to post large payloads to the async endpoint?
In any case, thanks a lot for your tireless support. Very much appreciated.