My setup has two Tesla T4 GPUs, and the CPU is an Intel(R) Xeon(R) Gold 6240 @ 2.60GHz. I need a high-quality Chinese language generation model, and I believe this hardware is powerful enough to run one.
Personally, I recommend Alibaba’s Qwen series for its extensive lineup. It doesn’t always offer the absolute best performance on any single benchmark, but the series covers most tasks well…
In a multi-GPU environment like this one, choosing the right backend matters as much as choosing a specific model. As long as vLLM or SGLang runs on your hardware, inference will be fast.
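To see why the backend matters here, a quick back-of-envelope sketch (my own assumption, not from this thread): with tensor parallelism, the backend splits the weights across both cards, so the per-GPU memory requirement roughly halves. The model size and byte counts below are illustrative.

```python
# Rough per-GPU weight-memory estimate under tensor parallelism.
# Illustrative only: ignores KV cache, activations, and framework overhead.

def weights_gib(num_params_b: float, bytes_per_param: float, tp_size: int = 1) -> float:
    """Approximate weight memory per GPU in GiB when split across tp_size GPUs."""
    total_bytes = num_params_b * 1e9 * bytes_per_param
    return total_bytes / tp_size / (1024 ** 3)

# A hypothetical 14B model in FP16 (2 bytes/param) on a single 16 GB T4:
print(f"tp=1: {weights_gib(14, 2.0, tp_size=1):.1f} GiB per GPU")  # does not fit
# The same model split across both T4s:
print(f"tp=2: {weights_gib(14, 2.0, tp_size=2):.1f} GiB per GPU")  # fits, but tight once KV cache is added
```

In vLLM, this split is what `--tensor-parallel-size 2` enables when serving a model across both GPUs.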
While Qwen3-30b is a strong choice, there are some other notable models worth considering, namely Cogito-14b, GPT-OSS-20b, and Gemma3-27b.
I recommend Alibaba’s Qwen and OpenAI’s GPT-OSS model families.
Hey — with 32 GB of GPU memory (especially across 2× T4), you could try something like Mungert/QwenLong-L1-32B-GGUF which is quantized to be more memory-efficient.
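To sanity-check that suggestion, here is a rough estimate (my own sketch, not from the thread) of how large a 32B model is at common GGUF quantization levels. The bits-per-weight figures are approximate averages for each quantization type, and real usage adds KV cache and runtime overhead on top.

```python
# Approximate quantized weight size for a 32B-parameter model at a few
# common GGUF quantization levels (bits-per-weight values are rough averages).

def gguf_weights_gib(num_params_b: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size in GiB (excludes KV cache and overhead)."""
    return num_params_b * 1e9 * bits_per_weight / 8 / (1024 ** 3)

for quant, bpw in [("Q4_K_M (~4.8 bpw)", 4.8),
                   ("Q5_K_M (~5.7 bpw)", 5.7),
                   ("Q8_0  (~8.5 bpw)", 8.5)]:
    print(f"{quant}: {gguf_weights_gib(32, bpw):.1f} GiB")
```

At roughly 4-5 bits per weight the weights come in under ~22 GiB, leaving headroom for KV cache within 32 GB of combined VRAM; an 8-bit quantization of a 32B model would be too tight.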