When I try to use my fine-tuned causal LM to run inference on a prompt, I get nothing but the last word repeated multiple times

It might be the way the model was fine-tuned (how the dataset is structured, how the data is processed by the data collator and trainer, etc.), but repetition like the example you've provided also seems to be a fairly common occurrence. There's a `repetition_penalty` parameter you can try toying with; see the Transformers documentation on the `repetition_penalty` generation parameter.
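
For reference, here's a minimal sketch of passing `repetition_penalty` (and a couple of related decoding knobs) to `generate()`. The model path and prompt below are placeholders, so swap in your own:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path; point this at your fine-tuned checkpoint
model_path = "./my-finetuned-model"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    repetition_penalty=1.2,  # values > 1.0 penalize tokens that have already appeared
    do_sample=True,          # sampling often reduces degenerate repetition vs. greedy decoding
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the penalty alone doesn't help, `no_repeat_ngram_size` is another `generate()` parameter worth experimenting with.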

Out of curiosity, is there a particular reason you’re trying to generate English text with a model whose base language is Chinese?