I want to create a small model that could optimize code. What are your suggestions?

For example (and this is only an example), if the code is C++, I want the model to behave like this:

input:

int func(){
    int a = 1;
    int b = 2;
    int useless = 5;
    int c = a+b;
    return c;
}

output:

int func(){
    int a = 1;
    int b = 2;
    int c = a+b;
    return c;
}

This is only an example, and I understand that a compiler will optimize this automatically. The point is that I have plenty of optimizations, each with an original input and its optimized output.
I want to train a model that focuses on one particular language and can then be applied to new code to optimize it automatically.
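
To make the data concrete: each optimization is a pair of original and optimized code, which could be stored as one record per pair. Below is a minimal sketch in Python. The field names `source` and `target`, the helpers `make_records`, `split_records` and `to_jsonl`, and the 10% validation fraction are assumptions chosen for illustration, not part of any particular library.

```python
import json
import random


def make_records(pairs):
    """Turn (original, optimized) code pairs into source/target records."""
    records = []
    for original, optimized in pairs:
        original, optimized = original.strip(), optimized.strip()
        if not original or not optimized:
            continue  # drop incomplete pairs
        records.append({"source": original, "target": optimized})
    return records


def split_records(records, val_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a validation slice."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction)) if len(shuffled) > 1 else 0
    return shuffled[n_val:], shuffled[:n_val]


def to_jsonl(records):
    """Serialize records as JSON Lines, one pair per line."""
    return "\n".join(json.dumps(r) for r in records)


# The single pair from the example above (hypothetical data layout).
pairs = [
    (
        "int func(){\n    int a = 1;\n    int b = 2;\n"
        "    int useless = 5;\n    int c = a+b;\n    return c;\n}",
        "int func(){\n    int a = 1;\n    int b = 2;\n"
        "    int c = a+b;\n    return c;\n}",
    ),
]
records = make_records(pairs)
print(to_jsonl(records))
```

Pairs in this shape would fit a supervised sequence-to-sequence setup, where the model sees `source` and is trained to produce `target`.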

Any suggestions will be appreciated.
My questions include:

  1. Is this better framed as a supervised or an unsupervised problem?
  2. Should I use an LLM, or would other, smaller models also work?
  3. What do you think of this project? Will it work as expected? I have a large enough set of input/output pairs.

Hi @StevenGon, welcome to the forums. I’m pretty new too, and I’m exploring code applications with LLMs.

I haven’t thought about your particular use case. However, judging from your example, linters already do that, so what are you trying to achieve beyond that? Or are you just trying to recreate what they do yourself? There’s no need to reinvent the wheel when you can use an existing solution, unless your goal is to learn how it works.

If you do in fact want to learn, then the first thing I’d do is search around for papers and articles related to this task, e.g. ‘automatic code improvement’, ‘ML approach for fixing code’, etc. Share what you find here; I’m curious.

Thanks, I will consider your post. I don’t need to build it from scratch; I just need a solution that I can eventually integrate into my project.

What’s more, I am working with a particular language (not C/C++/…) and have many optimizations of my own, which are not the standard ones either.