I need a model recommendation. My GPU memory is 32GB. Which Chinese large model is most suitable for me?

My machine has two Tesla T4 GPUs (16 GB each, 32 GB of VRAM in total), and the CPU is an Intel(R) Xeon(R) Gold 6240 @ 2.60GHz. I need a high-quality Chinese text generation model that this setup can run well.


Personally, I recommend Alibaba’s Qwen series for its extensive lineup. It doesn’t always offer the absolute best performance on every benchmark, but the series has a model size for almost every task and hardware budget.

In a multi-GPU environment like this one, choosing the right inference backend matters more than picking a specific model. As long as vLLM or SGLang runs on your hardware, inference will be fast.
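For reference, a minimal sketch of serving a model across both T4s with vLLM's tensor parallelism. The model name is just an example, not a specific recommendation from this thread; `--dtype half` is included because the T4 (compute capability 7.5) does not support bfloat16, and whether a given model's kernels run on that older architecture is worth checking first:

```shell
# Example only: split one model across both 16 GB T4s.
vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 \
  --tensor-parallel-size 2 \
  --dtype half \
  --max-model-len 8192   # keep the KV cache within 2x16 GB
```

`--max-model-len` is a knob worth tuning down if you hit out-of-memory errors at startup, since the KV cache is sized from it.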

While Qwen3-30B is a very solid choice, a few other models in this size class are worth considering, namely: Cogito-14B, GPT-OSS-20B, and Gemma 3 27B.


I’d recommend the Alibaba Qwen and OpenAI GPT-OSS model families.


Hey — with 32 GB of GPU memory (especially split across 2× T4), you could try something like Mungert/QwenLong-L1-32B-GGUF, which is quantized to be much more memory-efficient than the full-precision weights.
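To see why a quantized 32B model fits where the fp16 weights (roughly 64 GB) would not, here is a back-of-the-envelope sketch. The bits-per-weight figure (~4.85 for a Q4_K_M-style quant) and the flat overhead allowance for KV cache and activations are my own assumptions, not numbers from this thread:

```python
def gguf_vram_gb(n_params_b: float, bits_per_weight: float = 4.85,
                 overhead_gb: float = 4.0) -> float:
    """Approximate VRAM (GB) to run an n_params_b-billion-parameter
    GGUF model: quantized weights plus a flat allowance for
    KV cache and activations (both figures are rough assumptions)."""
    weights_gb = n_params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# A 32B model at ~4.85 bits/weight: ~19.4 GB of weights + ~4 GB
# overhead, i.e. ~23 GB, which fits in 2x16 GB T4s when the
# layers are split across both cards.
print(f"{gguf_vram_gb(32):.1f} GB")
```

The same arithmetic shows a 14B quant fitting comfortably on a single T4, which is why the smaller models upthread are also reasonable picks.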
