Help Needed: Fine-Tuned Model for Georgian Language Not Generating Text

Hello Hugging Face Community,

I am working on fine-tuning a language model to generate text in Georgian. Although training itself went smoothly, the model is not generating new text as expected: it simply echoes the input prompt back. I am seeking advice or suggestions on how to resolve this issue.

Here’s a brief overview of what I’ve done:

  • Model and Goal: I started with mistralai/Mistral-7B-Instruct-v0.1 and aimed to fine-tune it for text generation in Georgian.
  • Dataset: My training data consists of Georgian news articles, each split into a title and the corresponding body text.
  • Tokenization: I used a custom tokenizer trained for Georgian text.
  • Training Process: I used a PEFT (Parameter-Efficient Fine-Tuning) approach with the transformers and peft libraries. Training proceeded without errors, and I saved the fine-tuned model checkpoints successfully. A simplified sketch of my setup follows this list.
  • Testing: When I prompt the model in Georgian, it only returns the input prompt without generating any additional text; my test code is sketched further below.

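For reference, here is a simplified sketch of my training setup. The paths, dataset column names, and hyperparameters below are placeholders rather than my exact values, but the structure matches what I ran:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "mistralai/Mistral-7B-Instruct-v0.1"

# Custom tokenizer trained on Georgian text (path is a placeholder).
tokenizer = AutoTokenizer.from_pretrained("./georgian-tokenizer")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
# The custom tokenizer's vocabulary differs from the base model's, so the
# embedding matrix is resized to match it before training.
model.resize_token_embeddings(len(tokenizer))

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # Also train the resized embeddings and LM head, since their new rows
    # start out randomly initialized.
    modules_to_save=["embed_tokens", "lm_head"],
)
model = get_peft_model(model, lora_config)

# Georgian news articles with "title" and "text" columns (placeholders).
dataset = load_dataset("json", data_files="georgian_news.json")["train"]

def tokenize_fn(example):
    # Concatenate title and body, ending each article with EOS so the
    # model learns where to stop generating.
    full = example["title"] + "\n" + example["text"] + tokenizer.eos_token
    return tokenizer(full, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize_fn, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./mistral-georgian",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("./mistral-georgian")
```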
I am puzzled as to why the model is not generating new content. I have verified that the tokenizer's vocabulary lines up with the model's embeddings, checked that the training run completed normally, and experimented with various generation parameters.
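This is roughly how I test a checkpoint. Again, the paths, the prompt, and the generation settings are placeholders; the real values differ, but the structure is the same:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./georgian-tokenizer")

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    torch_dtype=torch.float16,
    device_map="auto",
)
model.resize_token_embeddings(len(tokenizer))

# Attach the saved LoRA adapter (checkpoint path is a placeholder).
model = PeftModel.from_pretrained(model, "./mistral-georgian")
model.eval()

# Sanity check: tokenizer vocabulary and embedding table must agree.
assert model.get_input_embeddings().num_embeddings == len(tokenizer)

prompt = "საქართველოს შესახებ"  # placeholder Georgian prompt ("about Georgia")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=200,  # explicit budget for *new* tokens
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Even with an explicit max_new_tokens budget and sampling enabled, the decoded continuation comes back empty.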

Could there be something I’m overlooking in the fine-tuning or testing process that might be causing this behavior? Any insights, experiences, or suggestions would be greatly appreciated to help me get this model to generate text properly in Georgian.

Thank you for your time and help!