I have a Mistral Instruct v2 model that has been fine-tuned with SFT followed by DPO. I want to deploy it on SageMaker with a custom inference script, and while doing this I also want to stream the output via SageMaker. Is this possible? I have seen a comment from last November saying that one cannot use the LLM container with a custom inference script (Deploying custom inference script with llama2 finetuned model - #2 by philschmid).
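For concreteness, here is a rough sketch of what I am trying to end up with, using the HuggingFaceModel class from the sagemaker SDK and boto3's invoke_endpoint_with_response_stream. The S3 path, role ARN, endpoint name, and container versions below are just placeholders, and I understand the server-side inference.py would also need to emit a streamed response for the client loop to actually see tokens as they are generated:

```python
import json

import boto3
from sagemaker.huggingface import HuggingFaceModel

# Deploy the fine-tuned model with a custom inference script.
# The model.tar.gz is assumed to contain the weights plus code/inference.py
# (model_fn / predict_fn overrides); bucket, role, and versions are placeholders.
model = HuggingFaceModel(
    model_data="s3://my-bucket/mistral-dpo/model.tar.gz",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="mistral-dpo-endpoint",
)

# Stream tokens back from the endpoint. This API does exist on the
# sagemaker-runtime client, but the custom inference script on the
# server side would also have to produce a streamed response.
smr = boto3.client("sagemaker-runtime")
response = smr.invoke_endpoint_with_response_stream(
    EndpointName="mistral-dpo-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": "Hello!", "parameters": {"max_new_tokens": 256}}),
)
for event in response["Body"]:
    part = event.get("PayloadPart", {}).get("Bytes", b"")
    if part:
        print(part.decode("utf-8"), end="", flush=True)
```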
Any idea if we can do this now? Any help would be appreciated.