Hi everyone,
I’m working on building an AI-powered legal assistant focused on Nepalese law. My goal is to create a model that can provide legal advice by understanding and interpreting laws, acts, and judicial decisions in both Nepali and English.
Currently, I’m planning to use a combination of:
- Fine-tuned language models for legal reasoning — encoder models like Legal-BERT or mBERT for extractive Q&A, or a generative model like GPT-2 for free-form answers.
- Retrieval-Augmented Generation (RAG) to pull up-to-date legal information (Constitution, Civil/Criminal codes, etc.) without needing constant retraining.
What I’ve done so far:
- Collected legal texts: the Constitution of Nepal (2072 BS / 2015 AD), the Muluki Ain (2017 BS / 1963 AD), and other acts.
- Started preparing a question-answer dataset for fine-tuning.
- Exploring FAISS and LangChain for RAG implementation.
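To show the shape of the retrieval step I have in mind, here's a minimal sketch. I'm using a toy bag-of-words embedding and plain NumPy as a stand-in for a real multilingual embedding model plus FAISS/ChromaDB; the corpus snippets and article numbers below are placeholders, not actual legal text:

```python
import numpy as np

# Placeholder corpus of (citation, text) chunks. In the real pipeline these
# would be chunked sections of the Constitution, Civil/Criminal codes, etc.
corpus = [
    ("Constitution Art. 17", "Right to freedom: no person shall be deprived of personal liberty."),
    ("Constitution Art. 20", "Rights relating to justice: right to a fair trial."),
    ("Civil Code Sec. 95", "Provisions relating to marriage and its registration."),
]

def embed(text, vocab):
    # Toy bag-of-words vector; in practice I'd swap in a multilingual
    # sentence-embedding model and store the vectors in FAISS or ChromaDB.
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec

# Build the vocabulary and the index matrix (the "index.add" step in FAISS terms).
vocab = {w: i for i, w in enumerate(sorted({w for _, t in corpus for w in t.lower().split()}))}
index = np.stack([embed(t, vocab) for _, t in corpus])

def retrieve(query, k=1):
    # Cosine similarity of the query against every stored chunk
    # (the equivalent of an IndexFlatIP search over normalized vectors).
    q = embed(query, vocab)
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [corpus[i] for i in top]

print(retrieve("Is there a right to a fair trial?"))
```

The retrieved (citation, text) pairs would then be handed to the generator, which is what lets the final answer cite a specific article instead of answering from parametric memory alone.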
What I need help with:
- Model selection:
  - Would Legal-BERT be a good choice for fine-tuning legal Q&A, or should I use mBERT since my data involves both Nepali and English?
  - Is GPT-2 suitable for generating long-form legal explanations?
- RAG setup:
  - For a legal AI, would you recommend FAISS or ChromaDB for storing and retrieving legal document embeddings?
  - How can I balance retrieval accuracy with generation quality?
- Handling bilingual capabilities:
  - Should I fine-tune the model in Nepali directly, or train in English and use a translation layer for outputs?
  - Any suggestions for models like BLOOM or mBERT that support Nepali?
- Fine-tuning strategy:
  - Should I use a SQuAD-style Q&A format, or focus on situation-based legal questions?
  - Any best practices for avoiding hallucinations in legal answers?
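For context on the format question: this is the SQuAD-style record layout I'm weighing against situation-based questions. In SQuAD format the answer must be an exact character-offset span of the context, which constrains the model to quote the law rather than paraphrase it. The provision text below is a made-up placeholder:

```python
import json

# One SQuAD-style training record. The answer is an exact span of the
# context, located by character offset. Placeholder text, not a real article.
record = {
    "context": "Article 17: Every citizen shall have the freedom of opinion and expression.",
    "question": "What freedom does Article 17 guarantee?",
    "answers": {
        "text": ["freedom of opinion and expression"],
        "answer_start": [41],  # character offset of the span in the context
    },
}

# Sanity-check that the offset really points at the answer span -- extractive
# QA training scripts rely on these offsets being exact.
start = record["answers"]["answer_start"][0]
span = record["answers"]["text"][0]
assert record["context"][start:start + len(span)] == span

print(json.dumps(record, ensure_ascii=False, indent=2))
```

The trade-off as I understand it: SQuAD-style data trains reliable extraction and grounding, while situation-based pairs teach the free-form reasoning clients actually ask for — so a mix of both may be the answer.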
I want to build a model that doesn’t just generate answers but cites the correct articles or acts — ensuring transparency and trust.
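To make that citation requirement concrete, this is roughly the prompt template I'd feed the generator: retrieved chunks are numbered, and the instruction restricts the model to citing only those sources (all strings are placeholders, and the exact wording would need prompt-tuning):

```python
def build_prompt(question, retrieved):
    # retrieved: list of (citation, text) pairs from the vector store.
    # Numbering the sources lets the model cite "[1]", "[2]" instead of
    # inventing article numbers -- a common hallucination guard in RAG.
    sources = "\n".join(
        f"[{i + 1}] ({cite}) {text}" for i, (cite, text) in enumerate(retrieved)
    )
    return (
        "Answer the legal question using ONLY the sources below. "
        "Cite the source number and article for every claim. "
        "If the sources do not cover the question, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Is there a right to a fair trial?",
    [("Constitution Art. 20", "Rights relating to justice ...")],
)
print(prompt)
```

The explicit "say so if uncovered" escape hatch is there so the model has a sanctioned alternative to fabricating an answer when retrieval comes back empty.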
I'd really appreciate your expert insights on how to refine this system, avoid pitfalls, and structure the pipeline efficiently.
Thanks in advance — excited to hear your thoughts!