Sourcing AI Models and Building a Local Application

We’re exploring embedding open-source models from Hugging Face into our application.

For teams that have done this — how are you building containerized applications around these models?

- Is there a reference workflow you follow (from model pull → packaging within the application → deployment)?

- How do you check the sourced model for vulnerabilities? Is the concept the same as scanning open-source dependencies?

- Do you use an artifact repository like JFrog Artifactory or Sonatype Nexus to store the models?

- What other considerations come with embedding models in the application, compared to making API calls to OpenAI/Anthropic?


> reference workflow

TGI + Docker or vLLM + Docker are recommended for their speed and scalability. Ollama is fast and easy to use for testing, but it handles very long contexts poorly.
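A minimal sketch of that pull → serve → query loop, assuming the official `vllm/vllm-openai` image plus the `huggingface_hub` and `openai` Python packages; the model ID and port are placeholders, not recommendations:

```python
# Sketch: model pull -> local serve -> query; not a definitive pipeline.
from huggingface_hub import snapshot_download
from openai import OpenAI

# 1. Pull the model; pin a revision so container builds are reproducible.
local_dir = snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model ID
    revision="main",  # better: pin a specific commit hash
)
print(f"Model snapshot at: {local_dir}")

# 2. Serve it separately, e.g. with the official vLLM image:
#    docker run --gpus all -p 8000:8000 \
#        -v ~/.cache/huggingface:/root/.cache/huggingface \
#        vllm/vllm-openai --model mistralai/Mistral-7B-Instruct-v0.2

# 3. Query the local OpenAI-compatible endpoint from the application.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(resp.choices[0].message.content)
```

One nice side effect of this layout: because vLLM (and TGI) expose an OpenAI-compatible API, swapping between a local model and a hosted provider is mostly a matter of changing `base_url` and credentials.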

> vuln

Use safetensors. If a model requires trust_remote_code, vet that code thoroughly beforehand. If you really must load pickle-based checkpoints, use PyTorch 2.6.0 or later, where torch.load defaults to weights_only=True and rejects arbitrary pickled objects.
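A minimal loading sketch under those assumptions (transformers, safetensors, and PyTorch >= 2.6 installed; the model ID and checkpoint path are placeholders):

```python
# Sketch: prefer safetensors and keep remote code off by default.
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model ID
    use_safetensors=True,     # fail instead of falling back to pickle weights
    trust_remote_code=False,  # the default; enable only for code you have vetted
)

# If a raw pickle checkpoint is unavoidable, rely on the PyTorch >= 2.6
# default of weights_only=True, which blocks arbitrary code execution.
ckpt = Path("model.bin")  # placeholder path to a downloaded checkpoint
if ckpt.exists():
    state_dict = torch.load(ckpt, weights_only=True)
```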