Hi,
I am starting to test out various SLMs (small language models) for possible use alongside automation code for civil engineering design. That is my initial task, but I am generally interested in the various applications SLMs can be put to.
Initially I looked at ONNX models and then discovered GGUF, llama.cpp, etc.
The llama-cpp-python API seems the most straightforward option on Windows, given the compile issues I experienced and the feedback I have read. I managed to find prebuilt binaries in the llama-cpp-python GitHub repo, and they all work well.
So now I can use the llama-cpp-python API to interact with the various GGUF models.
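For anyone following the same route, here is a minimal sketch of what that interaction looks like with llama-cpp-python. The model filename is a placeholder (point it at whatever SmolLM GGUF you downloaded), and the sketch only runs inference when that file is actually present:

```python
# Minimal sketch of chatting with a GGUF model via llama-cpp-python.
# MODEL_PATH is a hypothetical filename -- substitute your own download.
import os

MODEL_PATH = "SmolLM-135M-Instruct.Q8_0.gguf"  # placeholder

messages = [
    {"role": "system", "content": "You are a terse engineering assistant."},
    {"role": "user", "content": "State the SI unit of stress."},
]

if os.path.exists(MODEL_PATH):
    from llama_cpp import Llama

    llm = Llama(
        model_path=MODEL_PATH,
        n_ctx=2048,      # context window
        verbose=False,   # quiets most of the loading chatter
    )
    resp = llm.create_chat_completion(
        messages=messages,
        max_tokens=32,
        temperature=0.2,  # low temperature helps tiny models stay on-topic
    )
    print(resp["choices"][0]["message"]["content"])
else:
    print("model file not found; skipping inference")
```

Low temperature and a short, directive system prompt seem to matter more for the 135M-class models than for the larger ones.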
Thank you to the community for providing this treasure trove.
At the moment I am looking at the SmolLM models, starting with the 135M for speed of inference.
It is not straightforward to get sensible responses consistently. I am running a Python script from PowerShell that gives me an interactive chat session (using a context manager to suppress stdout and stderr during model initialization) for testing inputs to see what works and what does not.
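One note on that suppression trick: because llama.cpp's loading logs come from native code, Python-level tools like `contextlib.redirect_stderr` won't catch them; temporarily swapping the OS-level file descriptors does. A sketch of one way to do it:

```python
# Silence native (C-level) stdout/stderr during a noisy call, e.g. model
# loading, by pointing file descriptors 1 and 2 at os.devnull.
import os
import sys
from contextlib import contextmanager

@contextmanager
def suppress_native_output():
    devnull = os.open(os.devnull, os.O_WRONLY)
    saved_out, saved_err = os.dup(1), os.dup(2)
    try:
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(devnull, 1)   # fd 1 -> /dev/null
        os.dup2(devnull, 2)   # fd 2 -> /dev/null
        yield
    finally:
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(saved_out, 1)  # restore original descriptors
        os.dup2(saved_err, 2)
        for fd in (devnull, saved_out, saved_err):
            os.close(fd)

with suppress_native_output():
    print("this goes to devnull", flush=True)  # stands in for Llama(...) init
print("back to normal")
```

Passing `verbose=False` to the `Llama` constructor also quiets a lot of this, but the descriptor swap catches output that setting misses.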
Obviously in an automation setting this will work differently, but for testing purposes it works well for me. I have done similar with Microsoft Phi-3 and Phi-3.5, but they are a little slow, so I decided to try the smallest models and see how I get on.
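For what it's worth, a test harness of this kind reduces to a small REPL loop. This is just a sketch: the `ask` callback is hypothetical and stands in for whatever model call you use, and `get_input` is parameterised so the loop can also be driven by scripted inputs:

```python
# Minimal interactive test loop for probing prompts against a model.
# `ask` (hypothetical) takes the chat history and returns the reply text.
def chat_loop(ask, get_input=input):
    history = [{"role": "system", "content": "Answer briefly."}]
    while True:
        user = get_input("> ").strip()
        if user.lower() in {"exit", "quit"}:
            return history  # hand back the transcript for inspection
        history.append({"role": "user", "content": user})
        reply = ask(history)  # e.g. wrap create_chat_completion here
        history.append({"role": "assistant", "content": reply})
        print(reply)
```

Keeping the full history in the standard messages format means the same loop works unchanged whether `ask` wraps llama-cpp-python or anything else with a chat-completion interface.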
Could I ask the community for feedback on this topic, to gather knowledge on prompts that are known to work, settings you recommend or other hacks, using these models in combination with other streams of data, and any other feedback you can give.
Remember I am looking at SmolLM initially, but I would appreciate general ideas for uber-fast models that can augment automation code with logic, reasoning and decision-making capabilities, starting small and building on a knowledge base.
Kind Regards
cm