Fine-tune Code Llama on private source code

What’s the best approach to fine-tune Code Llama to answer questions about source code on my local disk, without sending the code to the cloud?

Assume the local machine has sufficient GPU (via Petals) and the source code in question is ~1M LOC of C#, but it is unlabelled (there are no “questions” in the training set). So I assume this will necessitate an unsupervised learning approach?
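
For concreteness, here’s roughly what I have in mind: a minimal sketch of continued causal-LM pretraining on the raw files with a LoRA adapter, assuming the Hugging Face Transformers/PEFT stack runs locally. The paths and every hyperparameter below are placeholders I haven’t validated:

```python
# Minimal sketch: "unsupervised" fine-tuning as continued causal-LM
# pretraining on raw source files, with a LoRA adapter so training fits
# on a local GPU. Paths, model ID, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Code Llama has no pad token

# Treat every local .cs file as one plain-text training document.
dataset = load_dataset("text", data_files={"train": "repo/**/*.cs"},
                       sample_by="document")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="codellama-csharp-lora",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1, learning_rate=2e-4,
                           logging_steps=50, fp16=True),
    train_dataset=tokenized,
    # mlm=False makes the collator set labels = input_ids (causal LM)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("codellama-csharp-lora")
```

As I understand it, this only makes the model familiar with the codebase; getting good question-answering on top of it may still need an instruction-tuned base model or retrieval, which is part of what I’m asking about.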

Know of any good papers or other work addressing this use case?

Thanks!
-Bob


Hey, hello! Have you been able to fine-tune on your code yet? It would be great if you could write up your findings here.


@stormchaser @boblevy8 Any findings, approach, or code? Please share.

You could use this for your data preparation work: GitHub - IBM/data-prep-kit: Open source project for data preparation of LLM application builders
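
If data-prep-kit is heavier than you need, a few lines of plain Python can also assemble the raw .cs corpus for the training step. This sketch does not use data-prep-kit’s API; the root path, size cutoff, and dedup-by-hash step are just illustrative choices:

```python
# Lightweight corpus-prep sketch (not data-prep-kit): collect local C#
# files, skip oversized ones, and drop exact-duplicate contents, which
# are common with generated code. All thresholds are illustrative.
import hashlib
from pathlib import Path

from datasets import Dataset

def collect_csharp_files(root: str, max_bytes: int = 1_000_000):
    """Yield {path, text} for every .cs file under root, deduplicated."""
    seen = set()
    for path in Path(root).rglob("*.cs"):
        if path.stat().st_size > max_bytes:
            continue  # skip huge (likely generated) files
        text = path.read_text(encoding="utf-8", errors="ignore")
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen:
            continue  # exact-duplicate file contents
        seen.add(digest)
        yield {"path": str(path), "text": text}

dataset = Dataset.from_generator(lambda: collect_csharp_files("repo/"))
dataset.save_to_disk("csharp-corpus")  # ready for tokenization
print(dataset)
```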