Hey everyone, what's the best approach for finding and keeping track of valid inference parameters for SageMaker-deployed HF models invoked through a boto3 SageMaker Runtime client's invoke_endpoint method? I'm interested in the parameters for text-generation and text2text-generation models (LLMs).
TIA, Vladimir
I found Introducing the Hugging Face LLM Inference Container for Amazon SageMaker, which seems to be the correct answer.
There are, in fact, two input/output JSON formats currently supported on SageMaker (as of June 2023). Some HF models (MPT, OpenLlama, etc.) are deployed in containers that support the old style of input, where the expected payload has the format `{"text_inputs": "the prompt goes here", ...}` with additional generation parameters at the top level, while Falcon runs on the new text-generation-inference (TGI) containers, which expect `{"inputs": "Hello world", "parameters": {}}`.
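For reference, here is a minimal boto3 sketch showing both payload styles. The endpoint names are hypothetical placeholders, and the specific generation parameters (max_length, max_new_tokens, temperature) are just illustrative assumptions; check the documentation for your particular model/container to see which parameters it actually accepts.

```python
import json

import boto3

# SageMaker Runtime client for calling deployed endpoints
runtime = boto3.Session().client("sagemaker-runtime")

# Old-style payload (e.g. MPT, OpenLlama containers):
# generation parameters sit at the top level next to "text_inputs".
old_style_payload = {
    "text_inputs": "the prompt goes here",
    "max_length": 128,    # assumed parameter names, for illustration only
    "temperature": 0.7,
}
response = runtime.invoke_endpoint(
    EndpointName="my-mpt-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(old_style_payload),
)
print(json.loads(response["Body"].read()))

# New TGI-style payload (e.g. Falcon containers):
# generation parameters are nested under "parameters".
tgi_payload = {
    "inputs": "Hello world",
    "parameters": {
        "max_new_tokens": 128,
        "temperature": 0.7,
    },
}
response = runtime.invoke_endpoint(
    EndpointName="my-falcon-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(tgi_payload),
)
print(json.loads(response["Body"].read()))
```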