What is the best way to deploy Hugging Face’s open-source embedding models in an application with high user activity, where frequent document uploads require efficient embedding creation and inference? I’m looking for strategies that are cost-effective and avoid vendor lock-in. So far I’ve considered AWS services such as SageMaker and Lambda, as well as the Hugging Face Hub, but I’m open to other options. I’d appreciate recommendations on best practices, architectural considerations, and likely pitfalls when balancing performance, cost, and vendor independence.