Developing Open-Source Models: The Right Way!

Today I stumbled upon a great resource with datasets and a great model:
:slight_smile:
The focus here is on the strategy, and on understanding why we need to develop strategies such as this to move forward:

verifiers-for-code (Verifiers For Code) (huggingface.co)

Here the developers have created a series of datasets designed to help build a model specifically targeted to be a Planning Agent, as planning is one of the most crucial steps when managing complex tasks as well as agentic environments:

The models we have are language models designed to predict the next token :slight_smile:

We use similarity and probability to match the input to the outputs, so it is a seq-to-seq network. This has been further expanded, and this was mainly possible due to the massive parameter counts of the models, hence 1B … and above and beyond :slight_smile:
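To make the next-token idea concrete, here is a minimal sketch using the `transformers` library and the small `gpt2` checkpoint, purely as a stand-in for any causal LM:

```python
# A minimal sketch of next-token prediction with a causal language model.
# Assumes the `transformers` library and the small "gpt2" checkpoint purely
# as a stand-in for any causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The planning agent first breaks the task into", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # [batch, seq_len, vocab_size]

# The distribution over the *next* token comes from the last position.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for token_id, prob in zip(top.indices, top.values):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
```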

The problem is how they were initially trained … on random corpuses … so this means they are also great at predicting larger sequences and patterns, but not always in the way we desire. This was further expanded for question and answer by providing valid examples … this has made the models able to focus the random information stored in their corpus of language-model probabilities. We also trained the tokenization process to provide meaning at the entry point and in the embedding spaces inside the model, so by training these components we have also optimized the knowledge coming in and out, providing similarity and granularity to the information being passed … so it gives the impression of conversation and knowing :slight_smile:
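As a rough illustration of that entry point, this snippet shows the tokenizer splitting text into subword tokens and the embedding table turning those ids into dense vectors (again using `gpt2` only as an example):

```python
# A small sketch of the tokenizer -> embedding entry point described above,
# using the "gpt2" checkpoint purely as an example.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

ids = tokenizer("planning agents manage complex tasks", return_tensors="pt").input_ids
print(tokenizer.convert_ids_to_tokens(ids[0].tolist()))   # subword tokens at the entry point

# The embedding table maps each token id to a dense vector inside the model.
embeddings = model.get_input_embeddings()(ids)
print(embeddings.shape)                                    # [1, num_tokens, hidden_size]
```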

So we further expanded this to include some form of instructions, and we can utilize the three-entry format to frame many forms of arguments, providing complex inputs to gain even more complex outputs. So we trained on domain data and we have tuned models :slight_smile:
So now we have designed complex prompts and RAG systems to enhance the input to produce a valid and really focused output: we have added methodologies!
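Here is a toy sketch of what such an enhanced input might look like: an instruction-style prompt with retrieved context injected before the question. The template and field names are my own illustration, not taken from any particular library:

```python
# A toy sketch of the "enhanced input" idea: an instruction-style prompt with
# retrieved context injected before the question. Template and field names
# are illustrative only.
def build_prompt(instruction: str, retrieved_docs: list[str], question: str) -> str:
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Context:\n{context}\n\n"
        f"### Input:\n{question}\n\n"
        "### Response:\n"
    )

print(build_prompt(
    "Answer using only the supplied context.",
    ["verifiers-for-code publishes planning datasets on the Hugging Face Hub."],
    "What does the verifiers-for-code project publish?",
))
```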

Methodologies enable customized thought processes, and we asked the models to produce data to enable new processes such as chain of thought, random forests, voting, preferential selective answering, output formatting, visuospatial thinking, etc. But these are individual models and projects implemented by university groups and students who are publishing (ideas) << not true solutions, and sometimes their data is shared so we can train our models to perform these tasks. Large models such as GPT, Claude, and Perplexity can now even generate agents, which have been created using chains, external software, SDKs, and even API calls (to perform functions which they have not been trained to produce).
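As one concrete example of a methodology from that list, here is a small sketch of chain-of-thought sampling with simple majority voting (self-consistency). The `generate_answer` function is a hypothetical placeholder for a call to whatever model you are running:

```python
# A sketch of one methodology from the list above: chain-of-thought sampling
# with simple majority voting (self-consistency). `generate_answer` is a
# hypothetical placeholder, not a real API.
import random
from collections import Counter

def generate_answer(prompt: str) -> str:
    # Placeholder: in practice this would sample a chain-of-thought completion
    # from a language model and return only the final answer it produced.
    return random.choice(["4", "4", "5"])

def self_consistency(prompt: str, n_samples: int = 5) -> str:
    answers = [generate_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]   # keep the most frequent answer

print(self_consistency("What is 2 + 2? Let's think step by step."))
```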

SO: we are here:
These larger companies are releasing large, lazy models which have been massively trained on large corpuses and not on methodologies, as they are still relying on their external processes and systems such as “RAG” <<< some type of publicly explainable solution ) … but in fact the models have been trained on the transaction data they gained by going public, so they essentially trained their models on our conversations and API calls :slight_smile:

They did not share this data!

So we ask a model to create an app and it cannot perform the task? WHY?

This is because the models which perform this have been trained on these tasks :slight_smile: and your calls to the online model are first organized by a collection of agents to produce responses, and the highest-value one is selected. As you can see, these smaller models are super fast, and the main Opus model returning the user response merely waits seconds! (They always had an agentic schema)…

So to get the model to produce what we need, we need to train AGENTS!!! (yes, I said it too!)…
So training the model to be a planner agent is a first step in the chain of agent skills required to produce higher-quality software and production-grade outputs (see the sketch after this block):
We have already trained for coding!
So we will also need to train for management of agents! We will need to train for use cases and software-design methodologies such as waterfall and agile, use-case development, construction of sequence diagrams, and object models using SOLID principles, etc.! << Then the model can generate these agents to perform these tasks:
Hence some are thinking along the lines of training roles!! << Yes, with their own toolboxes (but these tools should be trained internally, i.e. the model should be trained to produce the outputs the tools are creating).
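As a rough sketch of that first step, here is what supervised fine-tuning of a small base model on a planning-style dataset could look like. The dataset id and its `task`/`plan` fields are placeholders, not a real repository; swap in the actual verifiers-for-code datasets and their column names:

```python
# A hedged sketch of supervised fine-tuning a small base model on a
# planning-style dataset. The dataset id and the "task"/"plan" fields are
# placeholders, not a real repository.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "gpt2"                              # stand-in for any small base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

raw = load_dataset("your-org/planning-dataset", split="train")   # placeholder id

def to_text(example):
    # Frame each example as "task -> plan" so the model learns to emit plans.
    return {"text": f"Task:\n{example['task']}\n\nPlan:\n{example['plan']}"}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

dataset = raw.map(to_text, remove_columns=raw.column_names)
dataset = dataset.map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="planner-sft", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The same recipe applies to any of the other roles mentioned above; only the dataset and the prompt framing change.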
We need to understand our model more … the main heart, the tensor stack, contains the probabilities and statistics of the model, and these can be directed to many forms of tasks. So a language model's tensor stack can also be utilized for vision and audio transformations, as these also use decoding or encoding … hence these stages are just layers which are interchangeable. So we need to create a model in which we have many forms of encoder entry points and many decoder exit points to handle the input and output, and based on these inputs produce an output, so that given a multimedia input we can generate a multimedia output!
Then these layers can be extended as required!
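Here is a rough sketch, in plain PyTorch, of the interchangeable entry/exit layer idea: one shared core stack with pluggable encoders per input modality and pluggable decoders per output type. The shapes and module choices are illustrative only:

```python
# A rough sketch of the interchangeable entry/exit layer idea: a shared core
# stack, pluggable encoders per input modality, pluggable decoders per output
# type. Shapes and modules are illustrative only.
import torch
import torch.nn as nn

class MultiEntryExitModel(nn.Module):
    def __init__(self, core: nn.Module):
        super().__init__()
        self.core = core
        self.encoders = nn.ModuleDict()    # modality name -> entry-point encoder
        self.decoders = nn.ModuleDict()    # output name   -> exit-point decoder

    def forward(self, modality_in: str, x: torch.Tensor, modality_out: str) -> torch.Tensor:
        h = self.encoders[modality_in](x)      # project the input into the shared space
        h = self.core(h)                       # the shared tensor stack does the heavy lifting
        return self.decoders[modality_out](h)  # project back out to the requested output

hidden = 256
core = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True), num_layers=2)
model = MultiEntryExitModel(core)
model.encoders["text"] = nn.Linear(128, hidden)     # e.g. text features -> hidden
model.encoders["audio"] = nn.Linear(64, hidden)     # e.g. audio features -> hidden
model.decoders["text"] = nn.Linear(hidden, 50257)   # hidden -> vocabulary logits

out = model("audio", torch.randn(1, 10, 64), "text")
print(out.shape)                                    # [1, 10, 50257]
```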

And the REAL training can take place! <<

So, to get to the point again: we need to implement training components and strategies such as verifiers-for-code (Verifiers For Code) (huggingface.co) and their planning agents,

as well as llmware (llmware) (huggingface.co),
who have also implemented SLIM models. This is also a way to distribute the cost of executing models: models can be created with smaller stacks to perform dedicated tasks, and later these can even be selected and combined to form a single collection of experts! <<< Yes, we should be constructing our expert models from 1B or smaller models, hence the importance of training small models for testing, to determine the correct settings for the model before scaling up! << We know the correct settings for the 7B models (Mistral v1 & 2) << both correct settings for hidden layers etc. … (dimensions must be multiples of 8, ideally powers of 2) << bits and bytes >> (scaling laws).
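As a small sketch of sizing such a dedicated model before scaling up (the exact numbers are illustrative; the point is that the widths are multiples of 8 and of each other, which keeps the shapes friendly to modern kernels and quantized, bitsandbytes-style formats):

```python
# A sketch of sizing a small dedicated model before scaling up. The exact
# values are illustrative; the widths are chosen as multiples of 8.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,
    hidden_size=1024,            # 1024 = 8 * 128
    intermediate_size=4096,      # 4 * hidden_size, still a multiple of 8
    num_hidden_layers=12,
    num_attention_heads=8,       # 1024 / 8 = 128 dims per head
    max_position_embeddings=2048,
)
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```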

Scaling laws are important to understand when growing models, and for knowing when to grow the model. Why? It's not suggested to go past 7B models. So why are they building super-large models? CASH! < The models are lazy models and need to be specifically trained, otherwise they are just general models trained on large corpuses! (At stage one of training!!!) (They do not have methodologies because they are too large to train for (fine-tune)…)… The models have been aligned only, i.e. set to MESSAGE output / CHATQA output / INSTRUCT!! <<
So with smaller models you have the ability to create dedicated experts, and your models will grow by adding more experts to your MoE stack. Hence, for methodologies and dedicated tasks we can suggest using 1B models. So when you combine models, your growth will reflect the tasks and experts in your model! <<< Now we are creating real models >> We need to understand what makes the models have great thinking: the expert model routes information between different perceptions of the same query, enabling a collected response (see the toy sketch below) … in fact these are the thinking models. We have not yet figured out how to train each head to produce specific responses and the final collation head to collate these responses into a final output response! <<
So these models are very heavy, as they contain X number of generations for a single pass (very good, this is where we need to be) …
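And here is a toy sketch of that routing-and-collation mechanic: a gate scores the experts for each input and the weighted outputs of the chosen experts are collected into one response. Real MoE layers (Mixtral-style) are more involved; this only shows the mechanic:

```python
# A toy sketch of expert routing: a gate picks the top-k experts per input
# and their weighted outputs are collated into one response. Real MoE layers
# are more involved; this is illustrative only.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(num_experts)])
        self.gate = nn.Linear(hidden, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights, chosen = torch.topk(self.gate(x).softmax(dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):                 # collate the chosen experts per input
            for slot in range(self.top_k):
                idx = int(chosen[b, slot])
                out[b] += weights[b, slot] * self.experts[idx](x[b])
        return out

moe = ToyMoE(hidden=64, num_experts=4)
print(moe(torch.randn(2, 64)).shape)               # [2, 64]
```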

So by training for methodologies and creating expert models out of slim models, we can actually create the master models people are afraid of! ::

I will find some great links for dedicated models, as well as some great small models, and share them here … if anybody has perceptions, ideas, comments, reflections, great dedicated models, great methodologies, or datasets, please share…

Welcome! All comers, from newcomers to experts: any opinion or ideas on the topics raised here are welcome!
Check out my page for any models I have created lately, as I'm always testing these pesky papers and implementing the methods into my models!
So my latest model will contain all the previously trained methodologies!