Want to host a production-level server for running an LLM for code generation

I am planning to host an LLM on a GPU VPS (NVIDIA L40S) at my software development company for code generation, and the model I have in mind is Qwen2.5-Coder-32B-Instruct. I looked at a few serving libraries, such as TGI with its TensorRT-LLM (trtllm) backend, and would love a guide on setting up the model for the TGI trtllm backend, as I am new to the AI field. I tried running the model with plain TGI but ran into a few issues. Also, if there are any better models for code generation, please feel free to suggest them.
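To be concrete about what I'm aiming for: once the server is up, I'd like to be able to run a simple smoke test against it like the one below. This is just a minimal sketch assuming TGI's native `/generate` endpoint and a placeholder host/port (adjust to wherever the container is exposed):

```python
import requests

# Placeholder address: change to wherever your TGI container is exposed.
TGI_URL = "http://localhost:8080/generate"

payload = {
    "inputs": "Write a Python function that reverses a string.",
    "parameters": {
        "max_new_tokens": 256,  # cap generation length
        "temperature": 0.2,     # low temperature for more deterministic code
    },
}

resp = requests.post(TGI_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["generated_text"])
```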

I also want the LLM to be usable from the Continue.dev extension in VS Code.
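My understanding is that Continue.dev can point at any OpenAI-compatible endpoint, and TGI exposes one at `/v1/chat/completions` (the Messages API), so I'd expect Continue to talk to the same server I'd verify like this. Again a minimal sketch with a placeholder base URL, not a confirmed setup:

```python
from openai import OpenAI

# TGI does not require an API key by default, so a dummy value is fine.
# Base URL is a placeholder: point it at your server's /v1 route.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

resp = client.chat.completions.create(
    model="tgi",  # TGI serves one model; a placeholder name is accepted here
    messages=[{"role": "user", "content": "Write FizzBuzz in Python."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```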

Thanks.