Your program isn't wrong. One part of the model settings didn't match the HF specification, so I've opened a PR for it.
However, the Serverless Inference API itself is currently in a state where it's hardly usable with personal models, so getting this working through InferenceClient may be difficult. A model is really only usable while its state is Warm, and only well-known models stay Warm…
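If it helps, here is a minimal sketch of how you could check whether a model is Warm before sending a request. This assumes a recent version of huggingface_hub (where `InferenceClient.get_model_status()` is available); the model id and token below are placeholders, not your actual ones:

```python
from huggingface_hub import InferenceClient

MODEL_ID = "your-username/your-model"  # placeholder repo id for illustration

client = InferenceClient(token="hf_...")  # your HF access token

# Query the Serverless Inference API's status endpoint for this model.
# Field names may vary slightly between huggingface_hub versions.
status = client.get_model_status(MODEL_ID)
print(status)  # e.g. ModelStatus(loaded=False, state='Loadable', ...)

if status.loaded:
    # loaded=True roughly corresponds to the "Warm" state
    result = client.text_generation("Hello!", model=MODEL_ID, max_new_tokens=20)
    print(result)
else:
    print("Model is not Warm; the request would need a cold start or may simply fail.")
```

In my experience, personal models that nobody else is calling almost always report a cold state here, which is why I don't think InferenceClient is a reliable path for this at the moment.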