Seeking Advice on Fine-Tuning a Legal Language Model for Nepalese Law (LLM + RAG)

Hi everyone, :wave:

I’m working on building an AI-powered legal assistant focused on Nepalese law. My goal is to create a model that can provide legal advice by understanding and interpreting laws, acts, and judicial decisions in both Nepali and English.

Currently, I’m planning to use a combination of:

  • Fine-tuned language models for legal reasoning: either encoder-only models such as Legal-BERT or mBERT, or a generative model such as GPT-2.
  • Retrieval-Augmented Generation (RAG) to pull up-to-date legal information (Constitution, Civil/Criminal codes, etc.) without needing constant retraining.

What I’ve done so far:

  • Collected legal texts: Constitution of Nepal (2072 BS), Muluki Ain (2017), and other acts.
  • Started preparing a question-answer dataset for fine-tuning.
  • Exploring FAISS and LangChain for RAG implementation.
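To make the retrieval step concrete, here is a minimal sketch of what I have in mind. It is only a stand-in: a bag-of-words cosine similarity takes the place of real embeddings and a FAISS index, the passages are placeholders rather than actual legal text, and the names `PASSAGES`, `retrieve`, and `cosine` are my own for illustration.

```python
import math
import re
from collections import Counter

# Placeholder passages (NOT real legal text); each keeps its source label
# so a later step can cite where the text came from.
PASSAGES = [
    {"source": "Placeholder Act A, Section 1",
     "text": "placeholder text about inheritance of property among family members"},
    {"source": "Placeholder Act B, Section 2",
     "text": "placeholder text about punishment for theft"},
    {"source": "Placeholder Act C, Section 3",
     "text": "placeholder text about registration of marriage"},
]


def vectorize(text):
    """English-only toy tokenizer: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=2):
    """Return up to k passages with non-zero similarity, best first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(p["text"])), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


for hit in retrieve("inheritance of family property", PASSAGES):
    print(hit["source"])
```

In the real pipeline the `vectorize` and `cosine` parts would presumably be replaced by a multilingual embedding model and a vector index; the part I care about keeping is that every retrieved chunk carries its source label.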

What I need help with:

  1. Model selection:
     • Would Legal-BERT be a good choice for fine-tuning legal Q&A, or should I use mBERT since my data involves both Nepali and English?
     • Is GPT-2 suitable for generating long-form legal explanations?
  2. RAG setup:
     • For a legal AI, would you recommend FAISS or ChromaDB for storing and retrieving legal document embeddings?
     • How can I balance retrieval accuracy with generation quality?
  3. Handling bilingual capabilities:
     • Should I fine-tune the model in Nepali directly, or train in English and use a translation layer for outputs?
     • Any suggestions for models like BLOOM or mBERT that support Nepali?
  4. Fine-tuning strategy:
     • For fine-tuning, should I use a SQuAD-style Q&A format or focus on situation-based legal questions?
     • Any best practices for avoiding hallucinations in legal answers?
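
To illustrate the fine-tuning format question above, here is a rough sketch of the two record shapes I am comparing. The SQuAD-style record follows the usual `context` / `question` / `answers` layout with character offsets; the situation-based layout, its field names, and all of the text are placeholders I made up for illustration, not real legal content.

```python
import json

# Placeholder context (NOT real legal text).
context = "PLACEHOLDER: A person who does X shall be liable to a fine of up to Y."
answer_text = "a fine of up to Y"

# SQuAD-style: the answer is an exact span of the context.
squad_style = {
    "id": "example-0001",
    "context": context,
    "question": "What is the penalty for doing X?",
    "answers": {
        "text": [answer_text],
        "answer_start": [context.index(answer_text)],
    },
}

# Situation-based (hypothetical layout): free-form answer plus citations.
situation_style = {
    "id": "example-0002",
    "situation": "PLACEHOLDER: Person A did X to person B.",
    "question": "What legal consequences could person A face?",
    "answer": "PLACEHOLDER: Person A could be fined up to Y.",
    "citations": ["Placeholder Act, Section N"],
}


def check_squad_record(rec):
    """Return True if every answer span occurs at its stated offset."""
    ctx = rec["context"]
    answers = rec["answers"]
    return all(
        ctx[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


print(check_squad_record(squad_style))
print(json.dumps(situation_style, indent=2))
```

The offset check is there because misaligned answer spans seem like an easy way to quietly corrupt an extractive Q&A dataset, especially once Nepali text and translation are involved.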

I want to build a model that doesn’t just generate answers but also cites the correct articles or acts, so that every answer is transparent and can be checked against the source.
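
One way I imagine enforcing this is sketched below, under my own assumptions: retrieved passages are numbered in the prompt, the model is asked to cite them as `[S1]`, `[S2]`, and a post-check flags answers with no citation or with a citation that was never retrieved. The tag format and the helper names `build_prompt` and `check_citations` are hypothetical, and the passages are placeholders.

```python
import re


def build_prompt(question, passages):
    """Number each retrieved passage so the answer can cite it by tag."""
    lines = [
        "Answer using ONLY the sources below. Cite sources as [S1], [S2], ...",
        "If the sources do not answer the question, say so.",
        "",
    ]
    for i, p in enumerate(passages, start=1):
        lines.append(f"[S{i}] {p['source']}: {p['text']}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


def check_citations(answer, passages):
    """Report which tags the answer cites and whether any are unknown."""
    ids = {int(n) for n in re.findall(r"\[S(\d+)\]", answer)}
    valid = set(range(1, len(passages) + 1))
    return {
        "has_citation": bool(ids),
        "unknown": sorted(ids - valid),
        "cited_sources": [passages[i - 1]["source"] for i in sorted(ids & valid)],
    }


# Placeholder passages and a made-up model answer, for illustration only.
passages = [
    {"source": "Placeholder Act A, Section 1", "text": "placeholder text one"},
    {"source": "Placeholder Act B, Section 2", "text": "placeholder text two"},
]
print(build_prompt("What is the penalty for X?", passages))
print(check_citations("The penalty is Y [S1].", passages))
```

This only verifies that a cited tag maps to a retrieved passage, not that the passage actually supports the claim, so it would be a first filter rather than a full answer to the hallucination problem.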

Would really appreciate your expert insights on how to refine this system, avoid pitfalls, and structure the pipeline efficiently. :pray:

Thanks in advance — excited to hear your thoughts!
