Hi,
So I haven’t gotten to the LLM stage yet, though it at least seems interesting.
I was wondering if anyone could point me to a good open-source reference on using LLMs for ‘non-language’ applications. In the examples given, they are doing protein folding and gene/cell identification.
In particular I am curious how they structure their datasets for the training phase.
My use case is not necessarily related to biology/chemistry, but I am trying to get my head around how you train something that is ‘not obviously a language’ with these structures.
But I am not really sure where to look or learn more.