Fine-tune Code Llama on private source code

What’s the best approach to fine-tune Code Llama to answer questions about source code on my local disk, without sending the code to the cloud?

Assume the local machine has sufficient GPU (via Petals) and the source code in question is ~1M LOC of C#, but it is unlabelled (there are no “questions” in the training set). So I assume this will necessitate an unsupervised learning approach?
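
For concreteness, here’s roughly what I have in mind: a minimal sketch of continued causal-LM pretraining on the raw files with a LoRA adapter, assuming the Hugging Face Transformers/PEFT stack runs locally. The paths and every hyperparameter below are placeholders I haven’t validated:

```python
# Minimal sketch: "unsupervised" fine-tuning as continued causal-LM
# pretraining on raw source files, with a LoRA adapter so training fits
# on a local GPU. Paths, model ID, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Code Llama has no pad token

# Treat every local .cs file as one plain-text training document.
dataset = load_dataset("text", data_files={"train": "repo/**/*.cs"},
                       sample_by="document")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="codellama-csharp-lora",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1, learning_rate=2e-4,
                           logging_steps=50, fp16=True),
    train_dataset=tokenized,
    # mlm=False makes the collator set labels = input_ids (causal LM)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("codellama-csharp-lora")
```

As I understand it, this only makes the model familiar with the codebase; getting good question-answering on top of it may still need an instruction-tuned base model or retrieval, which is part of what I’m asking about.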

Know of any good papers or other work addressing this use case?

Thanks!
-Bob


Hey, hello! Have you been able to fine-tune on your code yet? It would be great if you could write up your findings here.


@stormchaser @boblevy8 Any findings, approach, or code? Please share.

You could use this for your data preparation work: GitHub - IBM/data-prep-kit: Open source project for data preparation of LLM application builders
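
If data-prep-kit is heavier than you need, a few lines of plain Python can also assemble the raw .cs corpus for the training step. This sketch does not use data-prep-kit’s API; the root path, size cutoff, and dedup-by-hash step are just illustrative choices:

```python
# Lightweight corpus-prep sketch (not data-prep-kit): collect local C#
# files, skip oversized ones, and drop exact-duplicate contents, which
# are common with generated code. All thresholds are illustrative.
import hashlib
from pathlib import Path

from datasets import Dataset

def collect_csharp_files(root: str, max_bytes: int = 1_000_000):
    """Yield {path, text} for every .cs file under root, deduplicated."""
    seen = set()
    for path in Path(root).rglob("*.cs"):
        if path.stat().st_size > max_bytes:
            continue  # skip huge (likely generated) files
        text = path.read_text(encoding="utf-8", errors="ignore")
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen:
            continue  # exact-duplicate file contents
        seen.add(digest)
        yield {"path": str(path), "text": text}

dataset = Dataset.from_generator(lambda: collect_csharp_files("repo/"))
dataset.save_to_disk("csharp-corpus")  # ready for tokenization
print(dataset)
```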