I am evaluating technologies for optimizing inference of text and image models. I came across the Hugging Face Infinity inference API and AWS Inferentia instances, and I would like some clarity on the differences between the two options.
Is the Hugging Face Infinity inference API a pure software optimization that we can apply to models running on any server, as opposed to AWS Inferentia, which relies on dedicated chips to optimize inference?
Any references to the underlying technical details of these technologies would be helpful.