Deploying OpenAI's Whisper on SageMaker

Found the fix; this needs to be added to the config:

    env={
        'HF_TASK':'automatic-speech-recognition'
    }

Without this, the model is unable to identify the given task.
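For context, this is roughly where that env goes when creating the model with the SageMaker Python SDK. This is only a sketch: the model path, role name, instance type, and framework versions below are placeholders, not values from this thread.

```python
# Hedged sketch: deploying a Whisper model with the HF_TASK fix.
# model_data, role, versions, and instance_type are placeholders.
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/whisper-model.tar.gz",  # placeholder
    role="my-sagemaker-execution-role",                # placeholder
    transformers_version="4.26",                       # placeholder
    pytorch_version="1.13",                            # placeholder
    py_version="py39",                                 # placeholder
    env={"HF_TASK": "automatic-speech-recognition"},   # the fix from above
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",                    # placeholder
)
```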


I was able to raise the max output limit, which is set in `generation_config.json` as `"max_length": 448`; you can update this to some bigger number like 6000000.
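A minimal sketch of that edit, assuming the model artifacts are unpacked locally before repackaging. The `whisper_model_dir` path is a placeholder, and the fabricated config file is only there so the snippet runs standalone; with a real model the file already exists.

```python
import json
import os

model_dir = "whisper_model_dir"  # placeholder: your unpacked model folder
config_path = os.path.join(model_dir, "generation_config.json")

# For illustration only: create a minimal config if none exists,
# so this sketch runs standalone. A real Whisper checkpoint ships one.
os.makedirs(model_dir, exist_ok=True)
if not os.path.exists(config_path):
    with open(config_path, "w") as f:
        json.dump({"max_length": 448}, f)

# Read the config, bump the limit, and write it back.
with open(config_path) as f:
    gen_config = json.load(f)
gen_config["max_length"] = 6000000
with open(config_path, "w") as f:
    json.dump(gen_config, f, indent=2)
```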

I'm getting a warning afterwards:

    UserWarning: Neither `max_length` nor `max_new_tokens` has been set, `max_length` will default to 36000 (`generation_config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.

Then the next issue I'm getting is error 413 (request payload too large), which is related to how SageMaker works and has nothing to do with the HF model: real-time endpoints cap the invocation payload at around 6 MB.

Currently the working code streams the file directly to the endpoint using its path; so the solution is rather to pass an S3 path and download and process the file inside the endpoint,
which needs a custom `inference.py`.
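A rough sketch of what such an `inference.py` could look like, using the `model_fn`/`input_fn`/`predict_fn`/`output_fn` hooks that the SageMaker inference toolkit recognizes. The JSON payload shape `{"s3_uri": ...}` is my own assumption, not an established convention, and the heavy imports are deferred so the handlers stay cheap to load.

```python
# Hedged sketch of a custom inference.py: the endpoint receives an S3 URI
# in a JSON body, downloads the audio, and runs the Whisper pipeline on it.
import json
import os
import tempfile
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split "s3://bucket/key" into (bucket, key)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


def model_fn(model_dir):
    # Load the Whisper ASR pipeline from the unpacked model artifacts.
    from transformers import pipeline  # deferred import
    return pipeline("automatic-speech-recognition", model=model_dir)


def input_fn(request_body, content_type="application/json"):
    # Assumed payload shape: {"s3_uri": "s3://my-bucket/audio.wav"}
    payload = json.loads(request_body)
    return payload["s3_uri"]


def predict_fn(s3_uri, asr_pipeline):
    import boto3  # deferred import; available on SageMaker containers
    bucket, key = parse_s3_uri(s3_uri)
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, os.path.basename(key))
        boto3.client("s3").download_file(bucket, key, local_path)
        return asr_pipeline(local_path)


def output_fn(prediction, accept="application/json"):
    return json.dumps(prediction)
```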

Let me know if anyone is still working on this.