Fine-tune Code Llama on private source code

What’s the best approach to fine-tune Code Llama to answer questions about source code on my local disk, without sending the code to the cloud?

Assume the local machine has sufficient GPU capacity (via Petals) and the source code in question is ~1M LOC of C#, but it is unlabelled (there are no “questions” in the training set). So I assume this will necessitate an unsupervised learning approach?
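
To make the "unlabelled" part concrete, here is a minimal sketch of how source files could be turned into plain-text samples for a self-supervised (next-token prediction) objective, which needs no question/answer labels. It only prepares data and reads nothing but the local disk. The function names, the `.cs` suffix filter, and the 64-line window with 8-line overlap are illustrative assumptions, not a recommended recipe, and the fine-tuning step itself is not shown.

```python
from pathlib import Path
from typing import Iterator, List


def iter_source_files(root: Path, suffix: str = ".cs") -> Iterator[Path]:
    """Yield every file under root with the given suffix, in sorted order."""
    yield from sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def chunk_lines(text: str, max_lines: int = 64, overlap: int = 8) -> List[str]:
    """Split text into windows of at most max_lines lines.

    Consecutive windows share `overlap` lines so that code spanning a
    window boundary appears with some surrounding context in both samples.
    """
    if not 0 <= overlap < max_lines:
        raise ValueError("overlap must be in [0, max_lines)")
    lines = text.splitlines()
    step = max_lines - overlap
    chunks: List[str] = []
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + max_lines]))
        # Stop once this window has reached the end of the file.
        if start + max_lines >= len(lines):
            break
    return chunks


def build_corpus(root: Path) -> List[str]:
    """Read every matching file under root and return all text chunks."""
    samples: List[str] = []
    for path in iter_source_files(root):
        # errors="ignore" skips undecodable bytes instead of failing (assumption).
        text = path.read_text(encoding="utf-8", errors="ignore")
        samples.extend(chunk_lines(text))
    return samples
```

Calling `build_corpus(Path("path/to/repo"))` (a hypothetical path) would return a list of strings, each one a candidate training sample. Splitting by lines rather than by tokens is a simplification; a real pipeline would likely chunk by the model's tokenizer and context length.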

Know of any good papers or other work addressing this use case?

Thanks!
-Bob

Hey, hello! Have you been able to fine-tune the model on your code yet? It would be great if you could share your findings here.