Fine-tuning an LLM on a codebase

I want to fine-tune an LLM locally to act as an intelligent code-review tool for developers: given a natural-language description of a change, it should identify and highlight the specific locations in a C# codebase that need to be modified. The goal is to streamline code review by pointing developers to exactly where changes should be made, based on their high-level descriptions. There are suitable LLMs for the task, but I can’t figure out how to feed my C# codebase to the model (that is, how to get the LLM to read my code files).

Are you looking to train a specific LLM? I used GPT-2 for a similar task and it worked reasonably well.

Yes, I was thinking of Code Llama or Mistral 7B (I can use any open-source LLM that supports a C# codebase)… How did you feed your codebase into the LLM to fine-tune it on the code?

I wrote a data loader function and used Hugging Face’s Trainer. I used GPT-2, not Mistral or Code Llama.
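The data loader itself was not posted in this thread, so the following is only a minimal sketch of what the data-preparation side might look like, not the poster's actual code. It assumes the codebase is a directory of `.cs` files and that overlapping fixed-size line windows are an acceptable way to split them; the function names, the chunk sizes, and the `path`/`start_line`/`text` schema are all hypothetical.

```python
import json
from pathlib import Path


def iter_code_chunks(repo_root, pattern="*.cs", max_lines=40, overlap=10):
    """Yield (relative_path, start_line, text) for overlapping line windows.

    Hypothetical helper: the window size and overlap are arbitrary defaults.
    """
    if overlap >= max_lines:
        raise ValueError("overlap must be smaller than max_lines")
    root = Path(repo_root)
    step = max_lines - overlap
    for path in sorted(root.rglob(pattern)):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for start in range(0, max(len(lines), 1), step):
            window = lines[start:start + max_lines]
            if not window:
                break  # empty file, nothing to emit
            rel_path = path.relative_to(root).as_posix()
            # start_line is 1-based so it matches what an editor shows
            yield rel_path, start + 1, "\n".join(window)
            if start + max_lines >= len(lines):
                break  # this window already reached the end of the file


def write_jsonl(repo_root, out_path, **kwargs):
    """Write one JSON object per chunk and return the number of chunks."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for rel_path, start_line, text in iter_code_chunks(repo_root, **kwargs):
            record = {"path": rel_path, "start_line": start_line, "text": text}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

A file produced this way could then be loaded with the Hugging Face `datasets` library (`load_dataset("json", data_files="train.jsonl")`), tokenized, and passed to `Trainer` with a causal language modeling collator such as `DataCollatorForLanguageModeling(tokenizer, mlm=False)`. Keeping `path` and `start_line` in each record is a design choice made here so that samples can be traced back to a location in the codebase.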


Can you explain the function, or maybe share the code?


@Venushki @AbishekSundar Please share the approach/code you used to prepare the dataset, e.g. the schema that is fed into the data loader.

Any update here? I’m working on a similar task but have no idea how to feed code files to an LLM. Do I need to create a dataset with comments describing what each function does, or can I just feed the code files to the model? Can anyone help?
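Nobody in the thread has answered this yet, so as a rough illustration only, the two options could be sketched as two record shapes. Raw code records suit plain causal language modeling, while the goal stated in the first post (description in, location out) would probably need paired examples. The prompt template, the field names, and the C# snippet below are all hypothetical, and where the descriptions come from (for example commit messages or hand-written notes) is left open.

```python
import json


def raw_code_record(path, code):
    """Option 1: unlabeled code; the model only learns to continue it."""
    return {"text": f"// File: {path}\n{code}"}


def localization_record(description, path, start_line, end_line, code):
    """Option 2: a change description paired with the location to edit.

    The section markers are an arbitrary template, not a standard format.
    """
    prompt = (
        "### Change request:\n" + description.strip() + "\n\n"
        "### Where to edit:\n"
    )
    completion = f"{path}:{start_line}-{end_line}\n{code.strip()}"
    return {"prompt": prompt, "completion": completion}


# Hypothetical C# snippet used only to show the record shapes.
snippet = "public decimal Total() {\n    return Items.Sum(i => i.Price);\n}"

raw = raw_code_record("Orders/Cart.cs", snippet)
pair = localization_record(
    "Apply the customer discount when computing the cart total.",
    "Orders/Cart.cs", 12, 14, snippet,
)
print(json.dumps(pair, indent=2))
```

The second shape requires labeling effort, but it matches the input and output the tool is meant to handle; the first needs no labels and mostly teaches the model the identifiers and style of the codebase.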

