SmolLM or other SLMs: example uses and feedback for getting the most out of them

Hi,
I am starting to test out various SLMs for possible use alongside automation code for civil engineering design.
That is my initial task, but I am generally interested in the various applications that SLMs can be applied to.
Initially I looked at ONNX models and then discovered GGUF, llama.cpp, etc.
The llama.cpp Python API seems the most straightforward option on Windows, given the compile issues I experienced and the feedback I read. I managed to find prebuilt binaries in the llama-cpp-python GitHub repo, and they all work well.
So now I can use the llama.cpp Python API to interact with the various GGUF models.
Thank you to the community for providing this treasure trove.
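For reference, here is a minimal sketch of the kind of call I mean (the model path is a placeholder for whatever GGUF you have downloaded):

```python
# Minimal llama-cpp-python usage sketch; model_path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/SmolLM-135M-Instruct.Q4_K_M.gguf",  # placeholder
    n_ctx=2048,      # context window
    verbose=False,   # keep initialization quiet
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What does a retaining wall do?"},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```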

At this time I am looking at the SmolLM models, 135M to start with, for speed of inference.

It is not straightforward to get sensible responses consistently. I am running a Python script from PowerShell that gives me an interactive chat session (using a context manager to suppress stdout and stderr during model initialization; a sketch of what I mean is below) for testing inputs to see what works and what does not.
Obviously in an automation setting this will work differently, but for testing purposes it works well for me. I have done similar with Microsoft Phi-3 and Phi-3.5, but they are a little slow, so I decided to try the smallest models and see how I get on.
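Since llama.cpp logs from C code, the redirect has to happen at the OS file-descriptor level rather than via contextlib.redirect_stderr alone. A sketch of my setup (model path again a placeholder):

```python
# Suppress llama.cpp's native stderr chatter during model load,
# then run a simple interactive loop for probing prompts.
import os
from contextlib import contextmanager
from llama_cpp import Llama

@contextmanager
def suppress_native_stderr():
    """Temporarily route fd 2 (stderr) to os.devnull."""
    saved = os.dup(2)
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, 2)
        yield
    finally:
        os.dup2(saved, 2)
        os.close(devnull)
        os.close(saved)

with suppress_native_stderr():
    llm = Llama(model_path="models/SmolLM-135M-Instruct.Q4_K_M.gguf",
                n_ctx=2048, verbose=False)

history = [{"role": "system", "content": "You are a helpful assistant."}]
print("Start chatting with SmolLM! Type 'exit' to end the session.")
while True:
    user = input("You: ")
    if user.strip().lower() == "exit":
        break
    history.append({"role": "user", "content": user})
    out = llm.create_chat_completion(messages=history, max_tokens=256)
    reply = out["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    print(f"SmolLM: {reply}")
```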

Could I ask the community for feedback on this topic, to gather knowledge on prompts that are known to work, settings you would recommend or other hacks, using these models in combination with other streams of data, and any other feedback you can give?

Remember I am looking at SmolLM initially, but I would appreciate general ideas for very fast models that can augment automation code with logic, reasoning, and decision capabilities, starting small and building up a knowledge base.

Kind Regards
cm

Hello.
Fine-tuning is important just prior to practical use of any model, but the base performance of each model, and the training its producer conducted in advance, greatly affect the quality of the output.
Popular models from major producers are generally smart.

If a suitable model is found, the process can be greatly shortened.
If you enter 1B or 0.5B in the model search screen, you will find models in that size range, although not all of them are usable. Below are links to models from 0.15B to 1B.
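If you prefer searching programmatically rather than through the web UI, here is a rough sketch using the huggingface_hub library (the exact search terms and filters are up to you):

```python
# Rough sketch: list small text-generation models matching a size string.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(search="0.5B", task="text-generation",
                             sort="downloads", limit=10):
    print(model.id)
```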

In addition to the HF-native format, there are usually volunteer-converted GGUFs, so testing with llama.cpp should not be a problem. You can also simply search by model name plus "gguf".
If you are building an application, the HF-native format, which is compatible with the HF transformers library, may be useful, but I think it is sufficient to consider that only after you have decided on the model to use.
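For example, a minimal sketch of loading an HF-native checkpoint with transformers (I use the HuggingFaceTB/SmolLM-135M-Instruct repo id here purely as an illustration):

```python
# Minimal transformers loading sketch; repo id is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

messages = [{"role": "user",
             "content": "List three uses of a small language model."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```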

Thanks John,
Thank you for the information; I will look through your links.
I guess that's what I had thought was the case re fine-tuning. BUT I am surprised that SO much effort and energy, both human and computer, is going into producing these models and their variants at the moment, and yet there seems to be little online discourse about the specifics of what's been tried, tested, and found to work with these base models before one spends time and effort on fine-tuning.

I work in the renewable energy sector and do not like wasting energy if it can be avoided, so I wanted to do exhaustive testing to find suitable models that could be considered viable candidates to be fine-tuned for my purposes.

I am interested in reasoning capabilities (at the moment just using the SmolLM models), logic, and decision capabilities: anything that can be used to enhance the Python-based automation routines that we are developing.
Hence it would be great for the community to report back on what has been tried and has worked for the base models, and any other information that could be helpful with our task.
For instance, even simple interactions with SmolLM-135M (the v0.2 version that's supposed to be better at chatting) can be inconsistent:

Start chatting with SmolLM! Type 'exit' to end the session.
You: what are your capabilities
SmolLM: I'm an assistant, and I have a lot of responsibilities. I can provide traditional recipes, but I also have some skills that complement traditional recipes. For example, I can help with cooking, baking, and meal planning. I can also assist with meal planning and grocery shopping.
You: in terms of language modelling tasks
SmolLM: I can help with language models, but I don't have the ability to generate text. I can only provide a set of pre-defined patterns and templates for generating text.
You: ok how can you help without generating text
SmolLM: I can help with generating text, but I don't have the ability to generate text. I can only provide a set of pre-defined patterns and templates for generating text.
You:
PS C:\lm> python llama_chat_SmolLM-135mQ4_chat.py
Start chatting with SmolLM! Type 'exit' to end the session.
You: hello
SmolLM: hello

Hello! How can I help you today?
You: i would like to make a cup of tea, hot tea, using english breakfast blend, give me steps
SmolLM: to make a cup of tea, hot tea, using English breakfast blend, give me steps.
You:
SmolLM: you can use the English breakfast blend, give me steps.
You:
SmolLM: you can use the English breakfast blend, give me steps.
You: workflow
SmolLM: a workflow is a sequence of steps that a person takes to complete a task or project. It can be a manual process, a digital process, or a combination of both.
You: workflow for tea
SmolLM: a workflow for tea is a process that involves the preparation of tea leaves, brewing, and serving. It can be a manual process, a digital process, or a combination of both.
You: what are your capabilities
SmolLM: I can provide a comprehensive list of capabilities, including the skills and knowledge I possess to perform tasks like tea preparation, brewing, and serving.
You: expand on what you can assist me with
SmolLM: I can assist you with tasks like tea preparation, brewing, and serving. I can also provide guidance on the best ways to prepare and serve tea, as well as any necessary equipment or tools.
You:

So I would like to get a feeling for how to squeeze the most out of this very small model as a learning exercise, before moving on to bigger models and then to fine-tuning etc.
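For concreteness, these are the kinds of knobs I have been experimenting with; the values below are starting points rather than known-good settings:

```python
# Sampling settings that often tame very small models; values are
# starting points to experiment with, not known-good settings.
from llama_cpp import Llama

llm = Llama(model_path="models/SmolLM-135M-Instruct.Q4_K_M.gguf",  # placeholder
            n_ctx=2048, verbose=False)

out = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "Answer in one short paragraph. If unsure, say so."},
        {"role": "user",
         "content": "Give me steps to brew English breakfast tea."},
    ],
    temperature=0.3,      # lower temperature reduces drift on tiny models
    top_p=0.9,
    repeat_penalty=1.15,  # discourages the looping seen in the log above
    max_tokens=200,
)
print(out["choices"][0]["message"]["content"])
```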

Once again, thank you for your response.
cm

and yet there seems to be little online discourse about the specifics of what's been tried, tested, and found to work with these base models before one spends time and effort on fine-tuning.

LLMs multiply through merging, in addition to new releases and training, so feedback is not keeping up with the speed at which they appear.
There are a few models that are produced collectively, with detailed feedback, in closed communities and external forums…

Anyway, I have also noticed that even simple feedback on an LLM, such as how it feels to use, is surprisingly scarce compared to the number of downloads. The Discussions section sits right next to the Like button, so perhaps it presents a high visual hurdle; in practice it can be used much like a BBS.
A few of us have posted something similar about HF's weakness in this area in the following feature request.
If these areas could be improved, it would save energy and we could make faster progress in improving models.
It might also be a means of strengthening ties with outside communities.

Also, it seems that many people use the 0.1B to 8B models offline for their own purposes, and they don't often post feedback on the Internet about the adjustments they made privately for their own use.

I'm new to HF and have very little knowledge of LLMs, but there are several active experts who frequent this forum and the HF Discord, so if you want to hear more about the technology, you can contact them directly with a mention (@ + HF username).

Discord:

Hi @loubnabnl, @anton-l,
I read your blog posts when SmolLM was released, thank you.
Any feedback on the queries I have highlighted above would be great.
Regards,
cm

Oh, and I forgot @eliebak; any feedback would be great.
Regards,
cm